The call path of a switchdev VLAN addition to the bridge looks something
like this today:
nbp_vlan_init
| __br_vlan_set_default_pvid
| | |
| | br_afspec |
| | | |
| | v |
| | br_process_vlan_info |
| | | |
| | v |
| | br_vlan_info |
| | / \ /
| | / \ /
| | / \ /
| | / \ /
v v v v v
nbp_vlan_add br_vlan_add ------+
| ^ ^ | |
| / | | |
| / / / |
\ br_vlan_get_master/ / v
\ ^ / / br_vlan_add_existing
\ | / / |
\ | / / /
\ | / / /
\ | / / /
\ | / / /
v | | v /
__vlan_add /
/ | /
/ | /
v | /
__vlan_vid_add | /
\ | /
v v v
br_switchdev_port_vlan_add
The ranges UAPI was introduced to the bridge in commit bdced7ef78
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328bb1 ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:
br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink
Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b935 ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
In the meantime, commits such as 0944d6b5a2 ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.
Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c3d ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
(the latter coming from commit 9c86ce2c1a ("net: bridge: Notify about
bridge VLANs")).
That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path is a bit complicated, when
the bridge VLAN call path is already a bit complicated.
Let's go off and finish the job of commit 29ab586c3d by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60dc2
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (a VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change, drivers are
supposed to guess).
Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is - unwinding on errors and all of that.
When in reality, no one really calls this API with more than one VLAN.
It is all unnecessary complexity.
And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there who don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.
Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Introduced in commit 37b8da1a3c ("net: dsa: Move FDB add/del
implementation inside DSA") in net/dsa/legacy.c, these functions were
moved again to slave.c as part of commit 2a93c1a365 ("net: dsa: Allow
compiling out legacy support"), before actually deleting net/dsa/slave.c
in 93e86b3bc8 ("net: dsa: Remove legacy probing support"). Along with
that movement there should have been a deletion of the prototypes from
dsa_priv.h, they are not useful.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210108233054.1222278-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This effectively reverts commit 60724d4bae ("net: dsa: Add support for
DSA specific notifiers"). The reason is that since commit 2f1e8ea726
("net: dsa: link interfaces with the DSA master to get rid of lockdep
warnings"), it appears that there is a generic way to achieve the same
purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
converted to use the generic notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
they are enslaved to e.g. a bridge by calling netif_is_bridge_master().
Export this helper from DSA to get the equivalent functionality of
determining whether the upper interface of a CHANGEUPPER notifier is a
DSA switch interface or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some DSA switches (and not only) cannot learn source MAC addresses from
packets injected from the CPU. They only perform hardware address
learning from inbound traffic.
This can be problematic when we have a bridge spanning some DSA switch
ports and some non-DSA ports (which we'll call "foreign interfaces" from
DSA's perspective).
There are 2 classes of problems created by the lack of learning on
CPU-injected traffic:
- excessive flooding, due to the fact that DSA treats those addresses as
unknown
- the risk of stale routes, which can lead to temporary packet loss
To illustrate the second class, consider the following situation, which
is common in production equipment (wireless access points, where there
is a WLAN interface and an Ethernet switch, and these form a single
bridging domain).
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^ ^
| | |
| | |
| Client A Client B
|
|
|
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 will know that Clients A and B are reachable via wlan0
- the hardware fdb of a DSA switch driver today is not kept in sync with
the software entries on other bridge ports, so it will not know that
clients A and B are reachable via the CPU port UNLESS the hardware
switch itself performs SA learning from traffic injected from the CPU.
Nonetheless, a substantial number of switches don't.
- the hardware fdb of the DSA switch on AP 2 may autonomously learn that
Client A and B are reachable through swp0. Therefore, the software br0
of AP 2 also may or may not learn this. In the example we're
illustrating, some Ethernet traffic has been going on, and br0 from AP
2 has indeed learnt that it can reach Client B through swp0.
One of the wireless clients, say Client B, disconnects from AP 1 and
roams to AP 2. The topology now looks like this:
AP 1:
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
| ^
| |
| Client A
|
|
| Client B
| |
| v
+------------+ +------------+ +------------+ +------------+ +------------+
| swp0 | | swp1 | | swp2 | | swp3 | | wlan0 |
+------------+ +------------+ +------------+ +------------+ +------------+
+------------------------------------------------------------------------+
| br0 |
+------------------------------------------------------------------------+
AP 2
- br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
- br0 of AP 1 will (possibly) know that Client B has left wlan0. There
are cases where it might never find out though. Either way, DSA today
does not process that notification in any way.
- the hardware FDB of the DSA switch on AP 1 may learn autonomously that
Client B can be reached via swp0, if it receives any packet with
Client 1's source MAC address over Ethernet.
- the hardware FDB of the DSA switch on AP 2 still thinks that Client B
can be reached via swp0. It does not know that it has roamed to wlan0,
because it doesn't perform SA learning from the CPU port.
Now Client A contacts Client B.
AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
segment.
AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
Hairpinning is disabled => drop.
This problem comes from the fact that these switches have a 'blind spot'
for addresses coming from software bridging. The generic solution is not
to assume that hardware learning can be enabled somehow, but to listen
to more bridge learning events. It turns out that the bridge driver does
learn in software from all inbound frames, in __br_handle_local_finish.
A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
addresses serviced by the bridge on 'foreign' interfaces. The software
bridge also does the right thing on migration, by notifying that the old
entry is deleted, so that does not need to be special-cased in DSA. When
it is deleted, we just need to delete our static FDB entry towards the
CPU too, and wait.
The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
events received on its own interfaces, such as static FDB entries.
Luckily we can change that, and DSA can listen to all switchdev FDB
add/del events in the system and figure out if those events were emitted
by a bridge that spans at least one of DSA's own ports. In case that is
true, DSA will also offload that address towards its own CPU port, in
the eventuality that there might be bridge clients attached to the DSA
switch who want to talk to the station connected to the foreign
interface.
In terms of implementation, we need to keep the fdb_info->added_by_user
check for the case where the switchdev event was targeted directly at a
DSA switch port. But we don't need to look at that flag for snooped
events. So the check is currently too late, we need to move it earlier.
This also simplifies the code a bit, since we avoid uselessly allocating
and freeing switchdev_work.
We could probably do some improvements in the future. For example,
multi-bridge support is rudimentary at the moment. If there are two
bridges spanning a DSA switch's ports, and both of them need to service
the same MAC address, then what will happen is that the migration of one
of those stations will trigger the deletion of the FDB entry from the
CPU port while it is still used by other bridge. That could be improved
with reference counting but is left for another time.
This behavior needs to be enabled at driver level by setting
ds->assisted_learning_on_cpu_port = true. This is because we don't want
to inflict a potential performance penalty (accesses through
MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
because address learning on the CPU port works there.
Reported-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Right now, the following would happen for a switch driver that does not
implement .port_fdb_add or .port_fdb_del.
dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
-> dsa_slave_switchdev_event_work
-> dsa_port_fdb_add
-> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
-> dsa_switch_fdb_add
-> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
-> an error is printed with dev_dbg, and
dsa_fdb_offload_notify(switchdev_work) is not called.
We can avoid scheduling the worker for nothing and say NOTIFY_DONE.
Because we don't call dsa_fdb_offload_notify, the static FDB entry will
remain just in the software bridge.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
events even for interfaces where dsa_slave_dev_check returns false, so
we need that check inside the switch-case statement for SWITCHDEV_FDB_*.
This movement also avoids a useless allocation / free of switchdev_work
on the untreated "default event" case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently DSA doesn't add FDB entries on the CPU port, because it only
does so through switchdev, which is associated with a net_device, and
there are none of those for the CPU port.
But actually FDB addresses on the CPU port have some use cases of their
own, if the switchdev operations are initiated from within the DSA
layer. There is just one problem with the existing code: it passes a
structure in dsa_switchdev_event_work which was retrieved directly from
switchdev, so it contains a net_device. We need to generalize the
contents to something that covers the CPU port as well: the "ds, port"
tuple is fine for that.
Note that the new procedure for notifying the successful FDB offload is
inspired from the rocker model.
Also, nothing was being done if added_by_user was false. Let's check for
that a lot earlier, and don't actually bother to schedule the worker
for nothing.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dev_close() call was added in commit c9eb3e0f87 ("net: dsa: Add
support for learning FDB through notification") "to indicate inconsistent
situation" when we could not delete an FDB entry from the port.
bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master
It is a bit drastic and at the same time not helpful if the above fails
to only print with netdev_dbg log level, but on the other hand to bring
the interface down.
So increase the verbosity of the error message, and drop dev_close().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Looking for an -EINVAL all over the dsa code could take hours for
inexperienced DSA users.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210106090915.21439-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These warnings become somewhat more informative when they include the
MTU value that could not be set and not just the errno.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20201205133944.10182-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remove unused and add needed includes. No functional change.
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If dsa_switch_ops::port_txtstamp() returns false, clone will be freed
immediately. Shouldn't store a pointer to freed memory.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201119110906.25558-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use a consistent style of one-line/multi-line comments throughout the
file.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethertype DSA encodes exactly the same information in the DSA tag as
the non-ethertype variety. So refactor out the common parts and reuse
them for both protocols.
This is ensures tag parsing and generation is always consistent across
all mv88e6xxx chips.
While we are at it, explicitly deal with all possible CPU codes on
receive, making sure to set offload_fwd_mark as appropriate.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not
performed any forwarding of it. This means that we should not set the
offload_fwd_mark on the skb, in case a software bridge wants it
forwarded.
This is a port of:
1ed9ec9b08 ("dsa: Allow forwarding of redirected IGMP traffic")
Which corrected the issue for chips using EDSA tags, but not for those
using regular DSA tags.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use netdev->tstats instead of a member of dsa_slave_priv for storing
a pointer to the per-cpu counters. This allows us to use core
functionality for statistics handling.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.
These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
[Kurt: Pass info instead of ptr]
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.
It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: Per Forlin <per.forlin@axis.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
This one is interesting, the DSA tag is 8 bytes on RX and 4 bytes on TX.
Because DSA is unaware of asymmetrical tag lengths, the overhead/needed
headroom is declared as 8 bytes and therefore 4 bytes larger than it
needs to be. If this becomes a problem, and the GSWIP driver can't be
converted to a uniform header length, we might need to make DSA aware of
separate RX/TX overhead values.
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Similar to the EtherType DSA tagger, the old Marvell tagger can
transform an 802.1Q header if present into a DSA tag, so there is no
headroom required in that case. But we are ensuring that it exists,
regardless (practically speaking, the headroom must be 4 bytes larger
than it needs to be).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Note that the VLAN code path needs a smaller extra headroom than the
regular EtherType DSA path. That isn't a problem, because this tagger
declares the larger tag length (8 bytes vs 4) as the protocol overhead,
so we are covered in both cases.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that we have a central TX reallocation procedure that accounts for
the tagger's needed headroom in a generic way, we can remove the
skb_cow_head call.
Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The caller (dsa_slave_xmit) guarantees that the frame length is at least
ETH_ZLEN and that enough memory for tail tagging is available.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the moment, taggers are left with the task of ensuring that the skb
headers are writable (which they aren't, if the frames were cloned for
TX timestamping, for flooding by the bridge, etc), and that there is
enough space in the skb data area for the DSA tag to be pushed.
Moreover, the life of tail taggers is even harder, because they need to
ensure that short frames have enough padding, a problem that normal
taggers don't have.
The principle of the DSA framework is that everything except for the
most intimate hardware specifics (like in this case, the actual packing
of the DSA tag bits) should be done inside the core, to avoid having
code paths that are very rarely tested.
So provide a TX reallocation procedure that should cover the known needs
of DSA today.
Note that this patch also gives the network stack a good hint about the
headroom/tailroom it's going to need. Up till now it wasn't doing that.
So the reallocation procedure should really be there only for the
exceptional cases, and for cloned packets which need to be unshared.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.
Fixes: 7a6ffe764b ("net: dsa: point out the tail taggers")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simplify the code by using new function dev_fetch_sw_netstats().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the Extraction Frame Header contains a valid classified VLAN, use
that instead of the VLAN header present in the packet.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A driver may refuse to enable VLAN filtering for any reason beyond what
the DSA framework cares about, such as:
- having tc-flower rules that rely on the switch being VLAN-aware
- the particular switch does not support VLAN, even if the driver does
(the DSA framework just checks for the presence of the .port_vlan_add
and .port_vlan_del pointers)
- simply not supporting this configuration to be toggled at runtime
Currently, when a driver rejects a configuration it cannot support, it
does this from the commit phase, which triggers various warnings in
switchdev.
So propagate the prepare phase to drivers, to give them the ability to
refuse invalid configurations cleanly and avoid the warnings.
Since we need to modify all function prototypes and check for the
prepare phase from within the drivers, take that opportunity and move
the existing driver restrictions within the prepare phase where that is
possible and easy.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Landen Chao <Landen.Chao@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Jonathan McDowell <noodles@earth.li>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow DSA drivers to make use of devlink port regions, via simple
wrappers.
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA drivers want to create regions on devlink ports as well as the
devlink device instance, in order to export registers and other tables
per port. To keep all this code together in the drivers, have the
devlink ports registered early, so the setup() method can setup both
device and port devlink regions.
v3:
Remove dp->setup
Move common code out of switch statement.
Fix wrong goto
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a port is unused, still create a devlink port for it, but set the
flavour to unused. This allows us to attach devlink regions to the
port, etc.
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we are guaranteed that dsa_untag_bridge_pvid() is called after
eth_type_trans() we can utilize __vlan_find_dev_deep_rcu() which will
take care of finding an 802.1Q upper on top of a bridge master.
A common use case, prior to 12a1526d067 ("net: dsa: untag the bridge
pvid from rx skbs") was to configure a bridge 802.1Q upper like this:
ip link add name br0 type bridge vlan_filtering 0
ip link add link br0 name br0.1 type vlan id 1
in order to pop the default_pvid VLAN tag.
With this change we restore that behavior while still allowing the DSA
receive path to automatically pop the VLAN tag.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that dsa_untag_bridge_pvid() is called after eth_type_trans() we are
guaranteed that skb->protocol will be set to a correct value, thus
allowing us to avoid calling vlan_eth_hdr().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Indicate to the DSA receive path that we need to untage the bridge PVID,
this allows us to remove the dsa_untag_bridge_pvid() calls from
net/dsa/tag_brcm.c.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a DSA switch driver needs to call dsa_untag_bridge_pvid(), it can
set dsa_switch::untag_brige_pvid to indicate this is necessary.
This is a pre-requisite to making sure that we are always calling
dsa_untag_bridge_pvid() after eth_type_trans() has been called.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sja1105 is a bit of a special snowflake, in that not all frames are
transmitted/received in the same way. L2 link-local frames are received
with the source port/switch ID information put in the destination MAC
address. For the rest, a tag_8021q header is used. So only the latter
frames displace the rest of the headers and need to use the generic flow
dissector procedure.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.
Cc: John Crispin <john@phrozen.org>
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the .flow_dissect procedure, so the flow dissector will call the
generic variant which works for this tagging protocol.
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 2 Broadcom tags in use, one places the DSA tag before the
Ethernet destination MAC address, and the other before the EtherType.
Nonetheless, both displace the rest of the headers, so this tagger can
use the generic flow dissector procedure which accounts for that.
The ASCII art drawing is a good reference though, so keep it but move it
somewhere else.
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.
Tell that to the DSA core, since this makes a difference for the flow
dissector. Most switches break the parsing of frame headers, but these
ones don't, so no flow dissector adjustment needs to be done for them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no tagger that returns anything other than zero, so just change
the return type appropriately.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 2 goals that we follow:
- Reduce the header size
- Make the header size equal between RX and TX
The issue that required long prefix on RX was the fact that the ocelot
DSA tag, being put before Ethernet as it is, would overlap with the area
that a DSA master uses for RX filtering (destination MAC address
mainly).
Now that we can ask DSA to put the master in promiscuous mode, in theory
we could remove the prefix altogether and call it a day, but it looks
like we can't. Using no prefix on ingress, some packets (such as ICMP)
would be received, while others (such as PTP) would not be received.
This is because the DSA master we use (enetc) triggers parse errors
("MAC rx frame errors") presumably because it sees Ethernet frames with
a bad length. And indeed, when using no prefix, the EtherType (bytes
12-13 of the frame, bits 96-111) falls over the REW_VAL field from the
extraction header, aka the PTP timestamp.
When turning the short (32-bit) prefix on, the EtherType overlaps with
bits 64-79 of the extraction header, which are a reserved area
transmitted as zero by the switch. The packets are not dropped by the
DSA master with a short prefix. Actually, the frames look like this in
tcpdump (below is a PTP frame, with an extra dsa_8021q tag - dadb 0482 -
added by a downstream sja1105).
89:0c:a9:f2:01:00 > 88:80:00:0a:00:1d, 802.3, length 0: LLC, \
dsap Unknown (0x10) Individual, ssap ProWay NM (0x0e) Response, \
ctrl 0x0004: Information, send seq 2, rcv seq 0, \
Flags [Response], length 78
0x0000: 8880 000a 001d 890c a9f2 0100 0000 100f ................
0x0010: 0400 0000 0180 c200 000e 001f 7b63 0248 ............{c.H
0x0020: dadb 0482 88f7 1202 0036 0000 0000 0000 .........6......
0x0030: 0000 0000 0000 0000 0000 001f 7bff fe63 ............{..c
0x0040: 0248 0001 1f81 0500 0000 0000 0000 0000 .H..............
0x0050: 0000 0000 0000 0000 0000 0000 ............
So the short prefix is our new default: we've shortened our RX frames by
12 octets, increased TX by 4, and headers are now equal between RX and
TX. Note that we still need promiscuous mode for the DSA master to not
drop it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently PTP is broken when ports are in standalone mode (the tagger
keeps printing this message):
sja1105 spi0.1: Expected meta frame, is 01-80-c2-00-00-0e in the DSA master multicast filter?
Sure, one might say "simply add 01-80-c2-00-00-0e to the master's RX
filter" but things become more complicated because:
- Actually all frames in the 01-80-c2-xx-xx-xx and 01-1b-19-xx-xx-xx
range are trapped to the CPU automatically
- The switch mangles bytes 3 and 4 of the MAC address via the incl_srcpt
("include source port [in the DMAC]") option, which is how source port
and switch id identification is done for link-local traffic on RX. But
this means that an address installed to the RX filter would, at the
end of the day, not correspond to the final address seen by the DSA
master.
Assume RX filtering lists on DSA masters are typically too small to
include all necessary addresses for PTP to work properly on sja1105, and
just request promiscuous mode unconditionally.
Just an example:
Assuming the following addresses are trapped to the CPU:
01-80-c2-00-00-00 to 01-80-c2-00-00-ff
01-1b-19-00-00-00 to 01-1b-19-00-00-ff
These are 512 addresses.
Now let's say this is a board with 3 switches, and 4 ports per switch.
The 512 addresses become 6144 addresses that must be managed by the DSA
master's RX filtering lists.
This may be refined in the future, but for now, it is simply not worth
it to add the additional addresses to the master's RX filter, so simply
request it to become promiscuous as soon as the driver probes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently DSA assumes that taggers don't mess with the destination MAC
address of the frames on RX. That is not always the case. Some DSA
headers are placed before the Ethernet header (ocelot), and others
simply mangle random bytes from the destination MAC address (sja1105
with its incl_srcpt option).
Currently the DSA master goes to promiscuous mode automatically when the
slave devices go too (such as when enslaved to a bridge), but in
standalone mode this is a problem that needs to be dealt with.
So give drivers the possibility to signal that their tagging protocol
will get randomly dropped otherwise, and let DSA deal with fixing that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, ocelot switchdev passes the skb directly to the function that
enqueues it to the list of skb's awaiting a TX timestamp. Whereas the
felix DSA driver first clones the skb, then passes the clone to this
queue.
This matters because in the case of felix, the common IRQ handler, which
is ocelot_get_txtstamp(), currently clones the clone, and frees the
original clone. This is useless and can be simplified by using
skb_complete_tx_timestamp() instead of skb_tstamp_tx().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the B53 driver to support VLANs while not filtering. This
requires us to enable VLAN globally within the switch upon driver
initial configuration (dev->vlan_enabled).
We also need to remove the code that dealt with PVID re-configuration in
b53_vlan_filtering() since that function worked under the assumption
that it would only be called to make a bridge VLAN filtering, or not
filtering, and we would attempt to move the port's PVID accordingly.
Now that VLANs are programmed all the time, even in the case of a
non-VLAN filtering bridge, we would be programming a default_pvid for
the bridged switch ports.
We need the DSA receive path to pop the VLAN tag if it is the bridge's
default_pvid because the CPU port is always programmed tagged in the
programmed VLANs. In order to do so we utilize the
dsa_untag_bridge_pvid() helper introduced in the commit before within
net/dsa/tag_brcm.c.
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge untags VLANs present in its VLAN groups in
__allowed_ingress() only when VLAN filtering is enabled.
But when a skb is seen on the RX path as tagged with the bridge's pvid,
and that bridge has vlan_filtering=0, and there isn't any 8021q upper
with that VLAN either, then we have a problem. The bridge will not untag
it (since it is supposed to remain VLAN-unaware), and pvid-tagged
communication will be broken.
There are 2 situations where we can end up like that:
1. When installing a pvid in egress-tagged mode, like this:
ip link add dev br0 type bridge vlan_filtering 0
ip link set swp0 master br0
bridge vlan del dev swp0 vid 1
bridge vlan add dev swp0 vid 1 pvid
This happens because DSA configures the VLAN membership of the CPU port
using the same flags as swp0 (in this case "pvid and not untagged"), in
an attempt to copy the frame as-is from ingress to the CPU.
However, in this case, the packet may arrive untagged on ingress, it
will be pvid-tagged by the ingress port, and will be sent as
egress-tagged towards the CPU. Otherwise stated, the CPU will see a VLAN
tag where there was none to speak of on ingress.
When vlan_filtering is 1, this is not a problem, as stated in the first
paragraph, because __allowed_ingress() will pop it. But currently, when
vlan_filtering is 0 and we have such a VLAN configuration, we need an
8021q upper (br0.1) to be able to ping over that VLAN, which is not
symmetrical with the vlan_filtering=1 case, and therefore, confusing for
users.
Basically what DSA attempts to do is simply an approximation: try to
copy the skb with (or without) the same VLAN all the way up to the CPU.
But DSA drivers treat CPU port VLAN membership in various ways (which is
a good segue into situation 2). And some of those drivers simply tell
the CPU port to copy the frame unmodified, which is the golden standard
when it comes to VLAN processing (therefore, any driver which can
configure the hardware to do that, should do that, and discard the VLAN
flags requested by DSA on the CPU port).
2. Some DSA drivers always configure the CPU port as egress-tagged, in
an attempt to recover the classified VLAN from the skb. These drivers
cannot work at all with untagged traffic when bridged in
vlan_filtering=0 mode. And they can't go for the easy "just keep the
pvid as egress-untagged towards the CPU" route, because each front port
can have its own pvid, and that might require conflicting VLAN
membership settings on the CPU port (swp1 is pvid for VID 1 and
egress-tagged for VID 2; swp2 is egress-taggeed for VID 1 and pvid for
VID 2; with this simplistic approach, the CPU port, which is really a
separate hardware entity and has its own VLAN membership settings, would
end up being egress-untagged in both VID 1 and VID 2, therefore losing
the VLAN tags of ingress traffic).
So the only thing we can do is to create a helper function for resolving
the problematic case (that is, a function which untags the bridge pvid
when that is in vlan_filtering=0 mode), which taggers in need should
call. It isn't called from the generic DSA receive path because there
are drivers that fall neither in the first nor second category.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check whether there is any hwaccel VLAN tag on RX, and if there is,
treat it as the tag_8021q header.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole purpose of tag_8021q is to send VLAN-tagged traffic to the
CPU, from which the driver can decode the source port and switch id.
Currently this only works if the VLAN filtering on the master is
disabled. Change that by explicitly adding code to tag_8021q.c to add
the VLANs corresponding to the tags to the filter of the master
interface.
Because we now need to call vlan_vid_add, then we also need to hold the
RTNL mutex. Propagate that requirement to the callers of dsa_8021q_setup
and modify the existing call sites as appropriate. Note that one call
path, sja1105_best_effort_vlan_filtering_set -> sja1105_vlan_filtering
-> sja1105_setup_8021q_tagging, was already holding this lock.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most DSA switch tags shift the EtherType to the right, causing the
master to not parse the VLAN as VLAN.
However, not all switches do that (example: tail tags, tag_8021q etc),
and if the DSA master has "rx-vlan-filter: on" in ethtool -k, then we
have a problem.
Therefore, we could populate the VLAN table of the master, just in case
(for some switches it will not make a difference), so that network I/O
can work even with a VLAN filtering master.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the bridge has VLAN awareness disabled there isn't any duplication
of functionality, since the bridge does not process VLAN. Don't deny
adding 8021q uppers to DSA switch ports in that case. The switch is
supposed to simply pass traffic leaving the VLAN tag as-is, and the
stack will happily strip the VLAN tag for all 8021q uppers that exist.
We need to ensure that there are no 8021q uppers when the user attempts
to enable bridge vlan_filtering.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic beats me a little bit. The comment that "bridge skips
-EOPNOTSUPP, so skip the prepare phase" was introduced in commit
fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr").
I'm not sure:
(a) ok, the bridge skips -EOPNOTSUPP, but, so what, where are we
returning -EOPNOTSUPP?
(b) even if we are, and I'm just not seeing it, what is the causality
relationship between the bridge skipping -EOPNOTSUPP and DSA
skipping the prepare phase, and just returning zero?
One thing is certain beyond doubt though, and that is that DSA currently
refuses VLAN filtering from the "commit" phase instead of "prepare", and
that this is not a good thing:
ip link add br0 type bridge
ip link add br1 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br1
[ 3790.379389] 001: sja1105 spi0.1: VLAN filtering is a global setting
[ 3790.379399] 001: ------------[ cut here ]------------
[ 3790.379403] 001: WARNING: CPU: 1 PID: 515 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[ 3790.379420] 001: swp3: Commit of attribute (id=6) failed.
[ 3790.379533] 001: [<c11ff588>] (switchdev_port_attr_set_now) from [<c11b62e4>] (nbp_vlan_init+0x84/0x148)
[ 3790.379544] 001: [<c11b62e4>] (nbp_vlan_init) from [<c11a2ff0>] (br_add_if+0x514/0x670)
[ 3790.379554] 001: [<c11a2ff0>] (br_add_if) from [<c1031b5c>] (do_setlink+0x38c/0xab0)
[ 3790.379565] 001: [<c1031b5c>] (do_setlink) from [<c1036fe8>] (__rtnl_newlink+0x44c/0x748)
[ 3790.379573] 001: [<c1036fe8>] (__rtnl_newlink) from [<c1037328>] (rtnl_newlink+0x44/0x60)
[ 3790.379580] 001: [<c1037328>] (rtnl_newlink) from [<c10315fc>] (rtnetlink_rcv_msg+0x124/0x2f8)
[ 3790.379590] 001: [<c10315fc>] (rtnetlink_rcv_msg) from [<c10926b8>] (netlink_rcv_skb+0xb8/0x110)
[ 3790.379806] 001: ---[ end trace 0000000000000002 ]---
[ 3790.379819] 001: sja1105 spi0.1 swp3: failed to initialize vlan filtering on this port
So move the current logic that may fail (except ds->ops->port_vlan_filtering,
that is way harder) into the prepare stage of the switchdev transaction.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is checking for the following order of operations, and makes sure
to deny that configuration:
ip link add link swp2 name swp2.100 type vlan id 100
ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
bridge vlan add dev swp2 vid 100
Instead of using vlan_for_each(), which looks at the VLAN filters
installed with vlan_vid_add(), just track the 8021q uppers. This has the
advantage of freeing up the vlan_vid_add() call for actual VLAN
filtering.
There is another change in this patch. The check is moved in slave.c,
from switch.c. I don't think it makes sense to have this 8021q upper
check for each switch port that gets notified of that VLAN addition
(these include DSA links and CPU ports, we know those can't have 8021q
uppers because they don't have a net_device registered for them), so
just do it in slave.c, for that one slave interface.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA tries to prevent having a VLAN added by a bridge and by an 802.1Q
upper at the same time. It does that by checking the VID in
.ndo_vlan_rx_add_vid(), since that's something that the 8021q module
calls, via vlan_vid_add(). When a VLAN matches in both subsystems, this
check returns -EBUSY.
However the vlan_vid_add() function isn't specific to the 8021q module
in any way at all. It is simply the kernel's way to tell an interface to
add a VLAN to its RX filter and not drop that VLAN. So there's no reason
to return -EBUSY when somebody tries to call vlan_vid_add() for a VLAN
that was installed by the bridge. The proper behavior is to accept that
configuration.
So what's wrong is how DSA checks that it has an 8021q upper. It should
look at the actual uppers for that, not just assume that the 8021q
module was somewhere in the call stack of .ndo_vlan_rx_add_vid().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'll be adding a new check in the PRECHANGEUPPER notifier, where we'll
need to check some VLAN uppers. It is hard to do that when there is
already a function named dsa_slave_upper_vlan_check. So rename this one.
Not to mention that this function probably shouldn't have started with
"dsa_slave_" in the first place, since the struct net_device argument
isn't a DSA slave, but an 8021q upper of one.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There doesn't seem to be any strong technical reason for doing it this
way, but we'll be adding more checks for invalid upper device
configurations, and it will be easier to have them all grouped under
PRECHANGEUPPER.
Tested that it still works:
ip link set br0 type bridge vlan_filtering 1
ip link add link swp2 name swp2.100 type vlan id 100
ip link set swp2.100 master br0
[ 20.321312] br0: port 5(swp2.100) entered blocking state
[ 20.326711] br0: port 5(swp2.100) entered disabled state
Error: dsa_core: Cannot enslave VLAN device into VLAN aware bridge.
[ 20.346549] br0: port 5(swp2.100) entered blocking state
[ 20.351957] br0: port 5(swp2.100) entered disabled state
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the DSA drivers to implement the devlink call to get info info,
e.g. driver name, firmware version, ASIC ID, etc.
v2:
Combine declaration and the assignment on a single line.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow DSA drivers to make use of devlink regions, via simple wrappers.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given a devlink instance, return the dsa switch it is associated to.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot_port->ts_id is used to:
(a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ
with an skb.
(b) populate the REW_OP from the injection header of the ongoing skb.
Only then is ocelot_port->ts_id incremented.
This is a problem because, at least theoretically, another timestampable
skb might use the same ocelot_port->ts_id before that is incremented.
Normally all transmit calls are serialized by the netdev transmit
spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also
called by DSA, which has started declaring the NETIF_F_LLTX feature
since commit 2b86cb8299 ("net: dsa: declare lockless TX feature for
slave ports"). So the logic of using and incrementing the timestamp id
should be atomic per port.
The solution is to use the global ocelot_port->ts_id only while
protected by the associated ocelot_port->ts_id_lock. That's where we
populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb
is called for the actual skb, but for felix, it is called for the skb's
clone. That is something which will also be changed in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 314f76d7a6.
Citing that commit message, the call graph was:
dsa_slave_vlan_rx_add_vid dsa_port_setup_8021q_tagging
| |
| |
| +-------------+
| |
v v
dsa_port_vid_add dsa_slave_port_obj_add
| |
+-------+ +-------+
| |
v v
dsa_port_vlan_add
Now that tag_8021q has its own ops structure, it no longer relies on
dsa_port_vid_add, and therefore on the dsa_switch_ops to install its
VLANs.
So dsa_port_vid_add now only has one single caller. So we can simplify
the call graph to what it was before, aka:
dsa_slave_vlan_rx_add_vid dsa_slave_port_obj_add
| |
+-------+ +-------+
| |
v v
dsa_port_vlan_add
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on another tag_8021q driver implementation, some things
became apparent:
- It is not mandatory for a DSA driver to offload the tag_8021q VLANs by
using the VLAN table per se. For example, it can add custom TCAM rules
that simply encapsulate RX traffic, and redirect & decapsulate rules
for TX traffic. For such a driver, it makes no sense to receive the
tag_8021q configuration through the same callback as it receives the
VLAN configuration from the bridge and the 8021q modules.
- Currently, sja1105 (the only tag_8021q user) sets a
priv->expect_dsa_8021q variable to distinguish between the bridge
calling, and tag_8021q calling. That can be improved, to say the
least.
- The crosschip bridging operations are, in fact, stateful already. The
list of crosschip_links must be kept by the caller and passed to the
relevant tag_8021q functions.
So it would be nice if the tag_8021q configuration was more
self-contained. This patch attempts to do that.
Create a struct dsa_8021q_context which encapsulates a struct
dsa_switch, and has 2 function pointers for adding and deleting a VLAN.
These will replace the previous channel to the driver, which was through
the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops.
Also put the list of crosschip_links into this dsa_8021q_context.
Drivers that don't support cross-chip bridging can simply omit to
initialize this list, as long as they dont call any cross-chip function.
The sja1105_vlan_add and sja1105_vlan_del functions are refactored into
a smaller sja1105_vlan_add_one, which now has 2 entry points:
- sja1105_vlan_add, from struct dsa_switch_ops
- sja1105_dsa_8021q_vlan_add, from the tag_8021q ops
But even this change is fairly trivial. It just reflects the fact that
for sja1105, the VLANs from these 2 channels end up in the same hardware
table. However that is not necessarily true in the general sense (and
that's the reason for making this change).
The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The
dsa_8021q_context structure needs to be propagated because adding a VLAN
is now done through the ops function pointers inside of it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in calling dsa_port_setup_8021q_tagging for each
individual port. Additionally, it will become more difficult to do that
when we'll have a context structure to tag_8021q (next patch). So
refactor this now.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 845e0ebb44 ("net: change addr_list_lock back to static
key"), cascaded DSA setups (DSA switch port as DSA master for another
DSA switch port) are emitting this lockdep warning:
============================================
WARNING: possible recursive locking detected
5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
--------------------------------------------
dhcpcd/323 is trying to acquire lock:
ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
but task is already holding lock:
ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dsa_master_addr_list_lock_key/1);
lock(&dsa_master_addr_list_lock_key/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by dhcpcd/323:
#0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
#2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
stack backtrace:
Call trace:
dump_backtrace+0x0/0x1e0
show_stack+0x20/0x30
dump_stack+0xec/0x158
__lock_acquire+0xca0/0x2398
lock_acquire+0xe8/0x440
_raw_spin_lock_nested+0x64/0x90
dev_mc_sync+0x44/0x90
dsa_slave_set_rx_mode+0x34/0x50
__dev_set_rx_mode+0x60/0xa0
dev_mc_sync+0x84/0x90
dsa_slave_set_rx_mode+0x34/0x50
__dev_set_rx_mode+0x60/0xa0
dev_set_rx_mode+0x30/0x48
__dev_open+0x10c/0x180
__dev_change_flags+0x170/0x1c8
dev_change_flags+0x2c/0x70
devinet_ioctl+0x774/0x878
inet_ioctl+0x348/0x3b0
sock_do_ioctl+0x50/0x310
sock_ioctl+0x1f8/0x580
ksys_ioctl+0xb0/0xf0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common.constprop.0+0x7c/0x180
do_el0_svc+0x2c/0x98
el0_sync_handler+0x9c/0x1b8
el0_sync+0x158/0x180
Since DSA never made use of the netdev API for describing links between
upper devices and lower devices, the dev->lower_level value of a DSA
switch interface would be 1, which would warn when it is a DSA master.
We can use netdev_upper_dev_link() to describe the relationship between
a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
is an "upper" to a DSA "master" (host port). The relationship is "many
uppers to one lower", like in the case of VLAN. So, for that reason, we
use the same function as VLAN uses.
There might be a chance that somebody will try to take hold of this
interface and use it immediately after register_netdev() and before
netdev_upper_dev_link(). To avoid that, we do the registration and
linkage while holding the RTNL, and we use the RTNL-locked cousin of
register_netdev(), which is register_netdevice().
Since this warning was not there when lockdep was using dynamic keys for
addr_list_lock, we are blaming the lockdep patch itself. The network
stack _has_ been using static lockdep keys before, and it _is_ likely
that stacked DSA setups have been triggering these lockdep warnings
since forever, however I can't test very old kernels on this particular
stacked DSA setup, to ensure I'm not in fact introducing regressions.
Fixes: 845e0ebb44 ("net: change addr_list_lock back to static key")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 72579e14a1 ("net: dsa: don't fail to probe if we couldn't set
the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP"
check into a simple "err". This causes the MTU warning to be printed
even for drivers that don't have the MTU operations implemented.
Fix that.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
slave_dev->name is only populated at this stage if it was specified
through a label in the device tree. However that is not mandatory.
When it isn't, the error message looks like this:
[ 5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
which is especially confusing since the error gets printed on behalf of
the DSA master (fsl_enetc in this case).
Printing an error message that contains a valid reference to the DSA
port's name is difficult at this point in the initialization stage, so
at least we should print some info that is more reliable, even if less
user-friendly. That may be the driver name and the hardware port index.
After this change, the error is printed as:
[ 6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0
[ 6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1
[ 6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2
[ 6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The purpose of this override is to give the user an indication of what
the number of the CPU port is (in DSA, the CPU port is a hardware
implementation detail and not a network interface capable of traffic).
However, it has always failed (by design) at providing this information
to the user in a reliable fashion.
Prior to commit 3369afba1e ("net: Call into DSA netdevice_ops
wrappers"), the behavior was to only override this callback if it was
not provided by the DSA master.
That was its first failure: if the DSA master itself was a DSA port or a
switchdev, then the user would not see the number of the CPU port in
/sys/class/net/eth0/phys_port_name, but the number of the DSA master
port within its respective physical switch.
But that was actually ok in a way. The commit mentioned above changed
that behavior, and now overrides the master's ndo_get_phys_port_name
unconditionally. That comes with problems of its own, which are worse in
a way.
The idea is that it's typical for switchdev users to have udev rules for
consistent interface naming. These are based, among other things, on
the phys_port_name attribute. If we let the DSA switch at the bottom
to start randomly overriding ndo_get_phys_port_name with its own CPU
port, we basically lose any predictability in interface naming, or even
uniqueness, for that matter.
So, there are reasons to let DSA override the master's callback (to
provide a consistent interface, a number which has a clear meaning and
must not be interpreted according to context), and there are reasons to
not let DSA override it (it breaks udev matching for the DSA master).
But, there is an alternative method for users to retrieve the number of
the CPU port of each DSA switch in the system:
$ devlink port
pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0
pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2
pci/0000:00:00.5/4: type notset flavour cpu port 4
spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0
spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1
spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2
spi/spi2.0/4: type notset flavour cpu port 4
spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0
spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1
spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2
spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3
spi/spi2.1/4: type notset flavour cpu port 4
So remove this duplicated, unreliable and troublesome method. From this
patch on, the phys_port_name attribute of the DSA master will only
contain information about itself (if at all). If the users need reliable
information about the CPU port they're probably using devlink anyway.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to unified Ethernet Switch Device Tree Bindings allow for ethernet-ports as
encapsulating node as well.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that DSA supports MTU configuration, undo the effects of commit
8b1efc0f83 ("net: remove MTU limits on a few ether_setup callers") and
let DSA interfaces use the default min_mtu and max_mtu specified by
ether_setup(). This is more important for min_mtu: since DSA is
Ethernet, the minimum MTU is the same as of any other Ethernet
interface, and definitely not zero. For the max_mtu, we have a callback
through which drivers can override that, if they want to.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have all the infrastructure in place for calling into the
dsa_ptr->netdev_ops function pointers, install them when we configure
the DSA CPU/management interface and tear them down. The flow is
unchanged from before, but now we preserve equality of tests when
network device drivers do tests like dev->netdev_ops == &foo_ops which
was not the case before since we were allocating an entirely new
structure.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch we try to kill 2 birds with 1 stone.
First of all, some switches that use tag_ocelot.c don't have the exact
same bitfield layout for the DSA tags. The destination ports field is
different for Seville VSC9953 for example. So the choices are to either
duplicate tag_ocelot.c into a new tag_seville.c (sub-optimal) or somehow
take into account a supposed ocelot->dest_ports_offset when packing this
field into the DSA injection header (again not ideal).
Secondly, tag_ocelot.c already needs to memset a 128-bit area to zero
and call some packing() functions of dubious performance in the
fastpath. And most of the values it needs to pack are pretty much
constant (BYPASS=1, SRC_PORT=CPU, DEST=port index). So it would be good
if we could improve that.
The proposed solution is to allocate a memory area per port at probe
time, initialize that with the statically defined bits as per chip
hardware revision, and just perform a simpler memcpy in the fastpath.
Other alternatives have been analyzed, such as:
- Create a separate tag_seville.c: too much code duplication for just 1
bit field difference.
- Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
tag_brcm.c, which would have a separate .xmit function. Again, too
much code duplication for just 1 bit field difference.
- Allocate the template from the init function of the tag_ocelot.c
module, instead of from the driver: couldn't figure out a method of
accessing the correct port template corresponding to the correct
tagger in the .xmit function.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.
Use the devlink_port_attrs struct to replace the relevant parameters.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.
It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.
The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.
Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.
Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.
Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/dsa/tag_qca.c:48:15: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:48:15: expected unsigned short [usertype]
net/dsa/tag_qca.c:48:15: got restricted __be16 [usertype]
net/dsa/tag_qca.c:68:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:68:13: expected restricted __be16 [usertype] hdr
net/dsa/tag_qca.c:68:13: got int
net/dsa/tag_qca.c:71:16: warning: restricted __be16 degrades to integer
net/dsa/tag_qca.c:81:17: warning: restricted __be16 degrades to integer
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/dsa/tag_mtk.c:84:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_mtk.c:84:13: expected restricted __be16 [usertype] hdr
net/dsa/tag_mtk.c:84:13: got int
net/dsa/tag_mtk.c:94:17: warning: restricted __be16 degrades to integer
The result of a ntohs() is not __be16, but u16.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/dsa/tag_lan9303.c:76:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:76:24: expected unsigned short [usertype]
net/dsa/tag_lan9303.c:76:24: got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:80:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:80:24: expected unsigned short [usertype]
net/dsa/tag_lan9303.c:80:24: got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:106:31: warning: restricted __be16 degrades to integer
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
Make use of __be16 where appropriate to fix these warnings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cpu_to_be16 returns a __be16 value. So what it is assigned to needs to
have the same type to avoid warnings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/dsa/slave.c:505:13: warning: incorrect type in initializer (different address spaces)
net/dsa/slave.c:505:13: expected void const [noderef] <asn:3> *__vpp_verify
net/dsa/slave.c:505:13: got struct pcpu_sw_netstats *
Add the needed _percpu property to prevent this warning.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not very informative to know the DSA master device when a
subordinate network device fails to get its PHY setup. Provide the
device name and capitalize PHY while we are it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver for Marvell switches puts all ports in IGMP snooping mode
which results in all IGMP/MLD frames that ingress on the ports to be
forwarded to the CPU only.
The bridge code in the kernel can then interpret these frames and act
upon them, for instance by updating the mdb in the switch to reflect
multicast memberships of stations connected to the ports. However,
the IGMP/MLD frames must then also be forwarded to other ports of the
bridge so external IGMP queriers can track membership reports, and
external multicast clients can receive query reports from foreign IGMP
queriers.
Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
on the skb when it unwraps the tagged frames, and that will make the
switchdev layer prevent the skb from egressing on any other port of
the same switch.
To fix that, look at the To_CPU code in the DSA header and make
forwarding of the frame possible for trapped IGMP packets.
Introduce some #defines for the frame types to make the code a bit more
comprehensive.
This was tested on a Marvell 88E6352 variant.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix cfg80211 deadlock, from Johannes Berg.
2) RXRPC fails to send norigications, from David Howells.
3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
Geliang Tang.
4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.
5) The ucc_geth driver needs __netdev_watchdog_up exported, from
Valentin Longchamp.
6) Fix hashtable memory leak in dccp, from Wang Hai.
7) Fix how nexthops are marked as FDB nexthops, from David Ahern.
8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.
9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.
10) Fix link speed reporting in iavf driver, from Brett Creeley.
11) When a channel is used for XSK and then reused again later for XSK,
we forget to clear out the relevant data structures in mlx5 which
causes all kinds of problems. Fix from Maxim Mikityanskiy.
12) Fix memory leak in genetlink, from Cong Wang.
13) Disallow sockmap attachments to UDP sockets, it simply won't work.
From Lorenz Bauer.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: ethernet: ti: ale: fix allmulti for nu type ale
net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
net: atm: Remove the error message according to the atomic context
bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
libbpf: Support pre-initializing .bss global variables
tools/bpftool: Fix skeleton codegen
bpf: Fix memlock accounting for sock_hash
bpf: sockmap: Don't attach programs to UDP sockets
bpf: tcp: Recv() should return 0 when the peer socket is closed
ibmvnic: Flush existing work items before device removal
genetlink: clean up family attributes allocations
net: ipa: header pad field only valid for AP->modem endpoint
net: ipa: program upper nibbles of sequencer type
net: ipa: fix modem LAN RX endpoint id
net: ipa: program metadata mask differently
ionic: add pcie_print_link_status
rxrpc: Fix race between incoming ACK parser and retransmitter
net/mlx5: E-Switch, Fix some error pointer dereferences
net/mlx5: Don't fail driver on failure to create debugfs
net/mlx5e: CT: Fix ipv6 nat header rewrite actions
...
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.
This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.
There are a variety of indentation styles found.
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the
following commend:
$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The dynamic key update for addr_list_lock still causes troubles,
for example the following race condition still exists:
CPU 0: CPU 1:
(RCU read lock) (RTNL lock)
dev_mc_seq_show() netdev_update_lockdep_key()
-> lockdep_unregister_key()
-> netif_addr_lock_bh()
because lockdep doesn't provide an API to update it atomically.
Therefore, we have to move it back to static keys and use subclass
for nest locking like before.
In commit 1a33e10e4a ("net: partially revert dynamic lockdep key
changes"), I already reverted most parts of commit ab92d68fc2
("net: core: add generic lockdep keys").
This patch reverts the rest and also part of commit f3b0a18bb6
("net: remove unnecessary variables and callback"). After this
patch, addr_list_lock changes back to using static keys and
subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
not have to change back to ->ndo_get_lock_subclass().
And hopefully this reduces some syzbot lockdep noises too.
Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, our only tag_8021q user, sja1105, has the ability to restore
bridge VLANs on its own, so this logic is unnecessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be there a platform with the following layout:
Regular NIC
|
+----> DSA master for switch port
|
+----> DSA master for another switch port
After changing DSA back to static lockdep class keys in commit
1a33e10e4a ("net: partially revert dynamic lockdep key changes"), this
kernel splat can be seen:
[ 13.361198] ============================================
[ 13.366524] WARNING: possible recursive locking detected
[ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
[ 13.377874] --------------------------------------------
[ 13.383201] swapper/0/0 is trying to acquire lock:
[ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.397879]
[ 13.397879] but task is already holding lock:
[ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.413593]
[ 13.413593] other info that might help us debug this:
[ 13.420140] Possible unsafe locking scenario:
[ 13.420140]
[ 13.426075] CPU0
[ 13.428523] ----
[ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key);
[ 13.440924]
[ 13.440924] *** DEADLOCK ***
[ 13.440924]
[ 13.446860] May be due to missing lock nesting notation
[ 13.446860]
[ 13.453668] 6 locks held by swapper/0/0:
[ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
[ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
[ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
[ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
[ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
[ 13.512000]
[ 13.512000] stack backtrace:
[ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
[ 13.530421] Call trace:
[ 13.532871] dump_backtrace+0x0/0x1d8
[ 13.536539] show_stack+0x24/0x30
[ 13.539862] dump_stack+0xe8/0x150
[ 13.543271] __lock_acquire+0x1030/0x1678
[ 13.547290] lock_acquire+0xf8/0x458
[ 13.550873] _raw_spin_lock+0x44/0x58
[ 13.554543] __dev_queue_xmit+0x84c/0xbe0
[ 13.558562] dev_queue_xmit+0x24/0x30
[ 13.562232] dsa_slave_xmit+0xe0/0x128
[ 13.565988] dev_hard_start_xmit+0xf4/0x448
[ 13.570182] __dev_queue_xmit+0x808/0xbe0
[ 13.574200] dev_queue_xmit+0x24/0x30
[ 13.577869] neigh_resolve_output+0x15c/0x220
[ 13.582237] ip6_finish_output2+0x244/0xb10
[ 13.586430] __ip6_finish_output+0x1dc/0x298
[ 13.590709] ip6_output+0x84/0x358
[ 13.594116] mld_sendpack+0x2bc/0x560
[ 13.597786] mld_ifc_timer_expire+0x210/0x390
[ 13.602153] call_timer_fn+0xcc/0x400
[ 13.605822] run_timer_softirq+0x588/0x6e0
[ 13.609927] __do_softirq+0x118/0x590
[ 13.613597] irq_exit+0x13c/0x148
[ 13.616918] __handle_domain_irq+0x6c/0xc0
[ 13.621023] gic_handle_irq+0x6c/0x160
[ 13.624779] el1_irq+0xbc/0x180
[ 13.627927] cpuidle_enter_state+0xb4/0x4d0
[ 13.632120] cpuidle_enter+0x3c/0x50
[ 13.635703] call_cpuidle+0x44/0x78
[ 13.639199] do_idle+0x228/0x2c8
[ 13.642433] cpu_startup_entry+0x2c/0x48
[ 13.646363] rest_init+0x1ac/0x280
[ 13.649773] arch_call_rest_init+0x14/0x1c
[ 13.653878] start_kernel+0x490/0x4bc
Lockdep keys themselves were added in commit ab92d68fc2 ("net: core:
add generic lockdep keys"), and it's very likely that this splat existed
since then, but I have no real way to check, since this stacked platform
wasn't supported by mainline back then.
>From Taehee's own words:
This patch was considered that all stackable devices have LLTX flag.
But the dsa doesn't have LLTX, so this splat happened.
After this patch, dsa shares the same lockdep class key.
On the nested dsa interface architecture, which you illustrated,
the same lockdep class key will be used in __dev_queue_xmit() because
dsa doesn't have LLTX.
So that lockdep detects deadlock because the same lockdep class key is
used recursively although actually the different locks are used.
There are some ways to fix this problem.
1. using NETIF_F_LLTX flag.
If possible, using the LLTX flag is a very clear way for it.
But I'm so sorry I don't know whether the dsa could have LLTX or not.
2. using dynamic lockdep again.
It means that each interface uses a separate lockdep class key.
So, lockdep will not detect recursive locking.
But this way has a problem that it could consume lockdep class key
too many.
Currently, lockdep can have 8192 lockdep class keys.
- you can see this number with the following command.
cat /proc/lockdep_stats
lock-classes: 1251 [max: 8192]
...
The [max: 8192] means that the maximum number of lockdep class keys.
If too many lockdep class keys are registered, lockdep stops to work.
So, using a dynamic(separated) lockdep class key should be considered
carefully.
In addition, updating lockdep class key routine might have to be existing.
(lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
3. Using lockdep subclass.
A lockdep class key could have 8 subclasses.
The different subclass is considered different locks by lockdep
infrastructure.
But "lock-classes" is not counted by subclasses.
So, it could avoid stopping lockdep infrastructure by an overflow of
lockdep class keys.
This approach should also have an updating lockdep class key routine.
(lockdep_set_subclass())
4. Using nonvalidate lockdep class key.
The lockdep infrastructure supports nonvalidate lockdep class key type.
It means this lockdep is not validated by lockdep infrastructure.
So, the splat will not happen but lockdep couldn't detect real deadlock
case because lockdep really doesn't validate it.
I think this should be used for really special cases.
(lockdep_set_novalidate_class())
Further discussion here:
https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
There appears to be no negative side-effect to declaring lockless TX for
the DSA virtual interfaces, which means they handle their own locking.
So that's what we do to make the splat go away.
Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
Fixes: ab92d68fc2 ("net: core: add generic lockdep keys")
Suggested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MSCC bug fix in 'net' had to be slightly adjusted because the
register accesses are done slightly differently in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>