License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
   SPDX license identifier                             # files
   ---------------------------------------------------|-------
   GPL-2.0                                               11139
and resulted in the first patch in this series.
If the file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results were:
   SPDX license identifier                             # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
   SPDX license identifier                             # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                         270
   GPL-2.0+ WITH Linux-syscall-note                        169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
   LGPL-2.1+ WITH Linux-syscall-note                        15
   GPL-1.0+ WITH Linux-syscall-note                         14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
   LGPL-2.0+ WITH Linux-syscall-note                         4
   LGPL-2.1 WITH Linux-syscall-note                          3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
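The heuristics listed above can be sketched as a small decision function. This is only an illustration, not the tooling that was actually used: the names is_uapi_path() and conclude_license() are hypothetical, each scanner result is modelled as a plain string (NULL when no license traces were found), and the rule that appends the Linux-syscall-note to uapi files with an existing GPL-family license is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the path has a "/uapi/" component. */
static int is_uapi_path(const char *path)
{
	return strstr(path, "/uapi/") != NULL;
}

/*
 * scan_a and scan_b are the license expressions reported by the two
 * scanners, or NULL when a scanner found no license traces.
 * Returns the concluded identifier, or NULL when the scanners disagree
 * and the file has to go to manual inspection.
 */
static const char *conclude_license(const char *path,
				    const char *scan_a, const char *scan_b)
{
	/* No traces at all: the top level COPYING license applies. */
	if (!scan_a && !scan_b)
		return is_uapi_path(path) ?
			"GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	/* Both scanners agree: that is the concluded license. */
	if (scan_a && scan_b && strcmp(scan_a, scan_b) == 0)
		return scan_a;

	/* One-sided or conflicting detection: manual review. */
	return NULL;
}
```

A NULL return corresponds to the cases above that were inspected by hand or flagged for further research.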
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
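The commit message above notes that the tagging script had to use different comment types for header and source .c files. The script itself is not shown here, so the following is only a sketch of that one distinction: format_spdx_line() and has_suffix() are hypothetical names, and the header style (a block comment) is assumed from the kernel's documented SPDX convention, while the .c style matches the first line of this file.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if s ends with suffix. */
static int has_suffix(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Write the SPDX tag line for the file at path into buf.
 * Returns the snprintf() result, or -1 for a file type this sketch
 * does not handle.
 */
static int format_spdx_line(const char *path, const char *id,
			    char *buf, size_t len)
{
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return -1;
}
```

Other file types (Makefiles, assembly, scripts) would need their own comment syntax and are deliberately left out.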
// SPDX-License-Identifier: GPL-2.0
#include <linux/err.h>
#include <linux/igmp.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/rculist.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>
#include <net/ip.h>
#include <net/netlink.h>
#include <net/switchdev.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ipv6.h>
#include <net/addrconf.h>
#endif

#include "br_private.h"

static bool
br_ip4_rports_get_timer(struct net_bridge_mcast_port *pmctx,
			unsigned long *timer)
{
	*timer = br_timer_value(&pmctx->ip4_mc_router_timer);
	return !hlist_unhashed(&pmctx->ip4_rlist);
}

static bool
br_ip6_rports_get_timer(struct net_bridge_mcast_port *pmctx,
			unsigned long *timer)
{
#if IS_ENABLED(CONFIG_IPV6)
	*timer = br_timer_value(&pmctx->ip6_mc_router_timer);
	return !hlist_unhashed(&pmctx->ip6_rlist);
#else
	*timer = 0;
	return false;
#endif
}

static size_t __br_rports_one_size(void)
{
	return nla_total_size(sizeof(u32)) + /* MDBA_ROUTER_PORT */
	       nla_total_size(sizeof(u32)) + /* MDBA_ROUTER_PATTR_TIMER */
	       nla_total_size(sizeof(u8)) +  /* MDBA_ROUTER_PATTR_TYPE */
	       nla_total_size(sizeof(u32)) + /* MDBA_ROUTER_PATTR_INET_TIMER */
	       nla_total_size(sizeof(u32)) + /* MDBA_ROUTER_PATTR_INET6_TIMER */
	       nla_total_size(sizeof(u32));  /* MDBA_ROUTER_PATTR_VID */
}

size_t br_rports_size(const struct net_bridge_mcast *brmctx)
{
	struct net_bridge_mcast_port *pmctx;
	size_t size = nla_total_size(0); /* MDBA_ROUTER */

	rcu_read_lock();
	hlist_for_each_entry_rcu(pmctx, &brmctx->ip4_mc_router_list,
				 ip4_rlist)
		size += __br_rports_one_size();

#if IS_ENABLED(CONFIG_IPV6)
	hlist_for_each_entry_rcu(pmctx, &brmctx->ip6_mc_router_list,
				 ip6_rlist)
		size += __br_rports_one_size();
#endif
	rcu_read_unlock();

	return size;
}
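The size arithmetic in __br_rports_one_size() and br_rports_size() can be checked in user space with a small sketch. It assumes the usual netlink attribute layout (a 4-byte attribute header and 4-byte alignment); the sk_ prefixed names are hypothetical stand-ins written for this example, not the kernel helpers themselves.

```c
#include <stddef.h>

/* Assumed netlink attribute layout: 4-byte header, 4-byte alignment. */
#define SK_NLA_ALIGNTO	4
#define SK_NLA_ALIGN(len) \
	(((len) + SK_NLA_ALIGNTO - 1) & ~(size_t)(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN	SK_NLA_ALIGN(4)

/* Stand-in for nla_total_size(): header plus payload, padded. */
static size_t sk_nla_total_size(size_t payload)
{
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/* Mirrors __br_rports_one_size(): six attributes per router port. */
static size_t sk_rports_one_size(void)
{
	return sk_nla_total_size(4) +	/* MDBA_ROUTER_PORT */
	       sk_nla_total_size(4) +	/* MDBA_ROUTER_PATTR_TIMER */
	       sk_nla_total_size(1) +	/* MDBA_ROUTER_PATTR_TYPE */
	       sk_nla_total_size(4) +	/* MDBA_ROUTER_PATTR_INET_TIMER */
	       sk_nla_total_size(4) +	/* MDBA_ROUTER_PATTR_INET6_TIMER */
	       sk_nla_total_size(4);	/* MDBA_ROUTER_PATTR_VID */
}

/* Mirrors br_rports_size() for n4 IPv4 and n6 IPv6 router list entries. */
static size_t sk_rports_size(size_t n4, size_t n6)
{
	return sk_nla_total_size(0) + (n4 + n6) * sk_rports_one_size();
}
```

Because a port that sits on both router lists is counted once per list, the result reads as an upper bound on the space needed rather than an exact size.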

int br_rports_fill_info(struct sk_buff *skb,
			const struct net_bridge_mcast *brmctx)
{
	u16 vid = brmctx->vlan ? brmctx->vlan->vid : 0;
	bool have_ip4_mc_rtr, have_ip6_mc_rtr;
	unsigned long ip4_timer, ip6_timer;
	struct nlattr *nest, *port_nest;
	struct net_bridge_port *p;

	if (!brmctx->multicast_router || !br_rports_have_mc_router(brmctx))
		return 0;

	nest = nla_nest_start_noflag(skb, MDBA_ROUTER);
	if (nest == NULL)
		return -EMSGSIZE;

	list_for_each_entry_rcu(p, &brmctx->br->port_list, list) {
		struct net_bridge_mcast_port *pmctx;

		if (vid) {
			struct net_bridge_vlan *v;

			v = br_vlan_find(nbp_vlan_group(p), vid);
			if (!v)
				continue;
			pmctx = &v->port_mcast_ctx;
		} else {
			pmctx = &p->multicast_ctx;
		}

		have_ip4_mc_rtr = br_ip4_rports_get_timer(pmctx, &ip4_timer);
		have_ip6_mc_rtr = br_ip6_rports_get_timer(pmctx, &ip6_timer);

		if (!have_ip4_mc_rtr && !have_ip6_mc_rtr)
			continue;

		port_nest = nla_nest_start_noflag(skb, MDBA_ROUTER_PORT);
		if (!port_nest)
			goto fail;

		if (nla_put_nohdr(skb, sizeof(u32), &p->dev->ifindex) ||
		    nla_put_u32(skb, MDBA_ROUTER_PATTR_TIMER,
				max(ip4_timer, ip6_timer)) ||
		    nla_put_u8(skb, MDBA_ROUTER_PATTR_TYPE,
			       p->multicast_ctx.multicast_router) ||
		    (have_ip4_mc_rtr &&
		     nla_put_u32(skb, MDBA_ROUTER_PATTR_INET_TIMER,
				 ip4_timer)) ||
		    (have_ip6_mc_rtr &&
		     nla_put_u32(skb, MDBA_ROUTER_PATTR_INET6_TIMER,
				 ip6_timer)) ||
		    (vid && nla_put_u16(skb, MDBA_ROUTER_PATTR_VID, vid))) {
			nla_nest_cancel(skb, port_nest);
			goto fail;
		}
		nla_nest_end(skb, port_nest);
	}

	nla_nest_end(skb, nest);
	return 0;
fail:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}

static void __mdb_entry_fill_flags(struct br_mdb_entry *e, unsigned char flags)
{
	e->state = flags & MDB_PG_FLAGS_PERMANENT;
	e->flags = 0;
	if (flags & MDB_PG_FLAGS_OFFLOAD)
		e->flags |= MDB_FLAGS_OFFLOAD;
	if (flags & MDB_PG_FLAGS_FAST_LEAVE)
		e->flags |= MDB_FLAGS_FAST_LEAVE;
net: bridge: mcast: handle port group filter modes
We need to handle group filter mode transitions and initial state.
To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
a new port group in EXCLUDE mode) we need to add that port to all of
*,G ports' S,G entries for proper replication. When the EXCLUDE state is
changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
called after the source list processing because the assumption is that
all of the group's S,G entries will be created before transitioning to
EXCLUDE mode, i.e. most importantly its blocked entries will already be
added so it will not get automatically added to them.
The transition EXCLUDE -> INCLUDE happens only when a port group timer
expires, it requires us to remove that port from all of *,G ports' S,G
entries where it was automatically added previously.
Finally when we are adding a new S,G entry we must add all of *,G's
EXCLUDE ports to it.
In order to distinguish automatically added *,G EXCLUDE ports we have a
new port group flag - MDB_PG_FLAGS_STAR_EXCL.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 15:30:24 +08:00
	if (flags & MDB_PG_FLAGS_STAR_EXCL)
		e->flags |= MDB_FLAGS_STAR_EXCL;
	if (flags & MDB_PG_FLAGS_BLOCKED)
		e->flags |= MDB_FLAGS_BLOCKED;
}

static void __mdb_entry_to_br_ip(struct br_mdb_entry *entry, struct br_ip *ip,
				 struct nlattr **mdb_attrs)
{
	memset(ip, 0, sizeof(struct br_ip));
	ip->vid = entry->vid;
	ip->proto = entry->addr.proto;
	switch (ip->proto) {
	case htons(ETH_P_IP):
		ip->dst.ip4 = entry->addr.u.ip4;
		if (mdb_attrs && mdb_attrs[MDBE_ATTR_SOURCE])
			ip->src.ip4 = nla_get_in_addr(mdb_attrs[MDBE_ATTR_SOURCE]);
		break;
#if IS_ENABLED(CONFIG_IPV6)
	case htons(ETH_P_IPV6):
		ip->dst.ip6 = entry->addr.u.ip6;
		if (mdb_attrs && mdb_attrs[MDBE_ATTR_SOURCE])
			ip->src.ip6 = nla_get_in6_addr(mdb_attrs[MDBE_ATTR_SOURCE]);
		break;
#endif
	default:
		ether_addr_copy(ip->dst.mac_addr, entry->addr.u.mac_addr);
	}

}

static int __mdb_fill_srcs(struct sk_buff *skb,
			   struct net_bridge_port_group *p)
{
	struct net_bridge_group_src *ent;
	struct nlattr *nest, *nest_ent;

	if (hlist_empty(&p->src_list))
		return 0;

	nest = nla_nest_start(skb, MDBA_MDB_EATTR_SRC_LIST);
	if (!nest)
		return -EMSGSIZE;

	hlist_for_each_entry_rcu(ent, &p->src_list, node,
				 lockdep_is_held(&p->key.port->br->multicast_lock)) {
		nest_ent = nla_nest_start(skb, MDBA_MDB_SRCLIST_ENTRY);
		if (!nest_ent)
			goto out_cancel_err;
		switch (ent->addr.proto) {
		case htons(ETH_P_IP):
			if (nla_put_in_addr(skb, MDBA_MDB_SRCATTR_ADDRESS,
					    ent->addr.src.ip4)) {
				nla_nest_cancel(skb, nest_ent);
				goto out_cancel_err;
			}
			break;
#if IS_ENABLED(CONFIG_IPV6)
		case htons(ETH_P_IPV6):
			if (nla_put_in6_addr(skb, MDBA_MDB_SRCATTR_ADDRESS,
					     &ent->addr.src.ip6)) {
				nla_nest_cancel(skb, nest_ent);
				goto out_cancel_err;
			}
			break;
#endif
		default:
			nla_nest_cancel(skb, nest_ent);
			continue;
		}
		if (nla_put_u32(skb, MDBA_MDB_SRCATTR_TIMER,
				br_timer_value(&ent->timer))) {
			nla_nest_cancel(skb, nest_ent);
			goto out_cancel_err;
		}
		nla_nest_end(skb, nest_ent);
	}

	nla_nest_end(skb, nest);

	return 0;

out_cancel_err:
	nla_nest_cancel(skb, nest);
	return -EMSGSIZE;
}

static int __mdb_fill_info(struct sk_buff *skb,
			   struct net_bridge_mdb_entry *mp,
			   struct net_bridge_port_group *p)
{
	bool dump_srcs_mode = false;
	struct timer_list *mtimer;
	struct nlattr *nest_ent;
	struct br_mdb_entry e;
	u8 flags = 0;
	int ifindex;

	memset(&e, 0, sizeof(e));
	if (p) {
		ifindex = p->key.port->dev->ifindex;
		mtimer = &p->timer;
		flags = p->flags;
	} else {
		ifindex = mp->br->dev->ifindex;
		mtimer = &mp->timer;
	}

	__mdb_entry_fill_flags(&e, flags);
	e.ifindex = ifindex;
	e.vid = mp->addr.vid;
	if (mp->addr.proto == htons(ETH_P_IP))
		e.addr.u.ip4 = mp->addr.dst.ip4;
#if IS_ENABLED(CONFIG_IPV6)
	else if (mp->addr.proto == htons(ETH_P_IPV6))
		e.addr.u.ip6 = mp->addr.dst.ip6;
#endif
	else
		ether_addr_copy(e.addr.u.mac_addr, mp->addr.dst.mac_addr);
	e.addr.proto = mp->addr.proto;
	nest_ent = nla_nest_start_noflag(skb,
					 MDBA_MDB_ENTRY_INFO);
	if (!nest_ent)
		return -EMSGSIZE;

	if (nla_put_nohdr(skb, sizeof(e), &e) ||
	    nla_put_u32(skb,
			MDBA_MDB_EATTR_TIMER,
			br_timer_value(mtimer)))
		goto nest_err;

	switch (mp->addr.proto) {
	case htons(ETH_P_IP):
		dump_srcs_mode = !!(mp->br->multicast_ctx.multicast_igmp_version == 3);
		if (mp->addr.src.ip4) {
			if (nla_put_in_addr(skb, MDBA_MDB_EATTR_SOURCE,
					    mp->addr.src.ip4))
				goto nest_err;
			break;
		}
		break;
#if IS_ENABLED(CONFIG_IPV6)
	case htons(ETH_P_IPV6):
		dump_srcs_mode = !!(mp->br->multicast_ctx.multicast_mld_version == 2);
		if (!ipv6_addr_any(&mp->addr.src.ip6)) {
			if (nla_put_in6_addr(skb, MDBA_MDB_EATTR_SOURCE,
					     &mp->addr.src.ip6))
				goto nest_err;
			break;
		}
		break;
#endif
	default:
		ether_addr_copy(e.addr.u.mac_addr, mp->addr.dst.mac_addr);
	}
	if (p) {
		if (nla_put_u8(skb, MDBA_MDB_EATTR_RTPROT, p->rt_protocol))
			goto nest_err;
		if (dump_srcs_mode &&
		    (__mdb_fill_srcs(skb, p) ||
		     nla_put_u8(skb, MDBA_MDB_EATTR_GROUP_MODE,
				p->filter_mode)))
			goto nest_err;
	}
	nla_nest_end(skb, nest_ent);

	return 0;

nest_err:
	nla_nest_cancel(skb, nest_ent);
	return -EMSGSIZE;
}

static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
			    struct net_device *dev)
{
	int idx = 0, s_idx = cb->args[1], err = 0, pidx = 0, s_pidx = cb->args[2];
	struct net_bridge *br = netdev_priv(dev);
	struct net_bridge_mdb_entry *mp;
	struct nlattr *nest, *nest2;

	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
		return 0;

	nest = nla_nest_start_noflag(skb, MDBA_MDB);
	if (nest == NULL)
		return -EMSGSIZE;

	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
		struct net_bridge_port_group *p;
		struct net_bridge_port_group __rcu **pp;

		if (idx < s_idx)
			goto skip;

		nest2 = nla_nest_start_noflag(skb, MDBA_MDB_ENTRY);
		if (!nest2) {
			err = -EMSGSIZE;
			break;
		}

		if (!s_pidx && mp->host_joined) {
			err = __mdb_fill_info(skb, mp, NULL);
			if (err) {
				nla_nest_cancel(skb, nest2);
				break;
			}
		}

		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
		      pp = &p->next) {
			if (!p->key.port)
				continue;
			if (pidx < s_pidx)
				goto skip_pg;

			err = __mdb_fill_info(skb, mp, p);
			if (err) {
				nla_nest_end(skb, nest2);
				goto out;
			}
skip_pg:
			pidx++;
		}
		pidx = 0;
		s_pidx = 0;
		nla_nest_end(skb, nest2);
skip:
		idx++;
	}

out:
	cb->args[1] = idx;
	cb->args[2] = pidx;
	nla_nest_end(skb, nest);
	return err;
}
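br_mdb_fill_info() above resumes a partially completed dump from two saved cursors: cb->args[1] for the MDB entry and cb->args[2] for the port group inside that entry. The following user-space sketch shows the same two-level resume idea in isolation. Everything in it is hypothetical (struct entry, struct dump_state, dump_ports(), a fixed-capacity output array standing in for the skb); it is not the kernel's netlink dump machinery.

```c
#include <stddef.h>

#define MAX_PORTS 4

struct entry {
	int nports;
	int ports[MAX_PORTS];
};

struct dump_state {
	int idx;	/* like cb->args[1]: entry to resume at */
	int pidx;	/* like cb->args[2]: port within that entry */
};

/*
 * Copy up to cap port ids into out, continuing from *st, and save the
 * position reached so that the next call picks up where this one stopped.
 * Returns the number of ids written.
 */
static int dump_ports(const struct entry *tbl, int n,
		      struct dump_state *st, int *out, int cap)
{
	int s_idx = st->idx, s_pidx = st->pidx;
	int idx, pidx = 0, used = 0;

	for (idx = 0; idx < n; idx++) {
		if (idx < s_idx)
			continue;	/* fully dumped by an earlier call */
		for (pidx = 0; pidx < tbl[idx].nports; pidx++) {
			if (pidx < s_pidx)
				continue;	/* already dumped */
			if (used == cap)
				goto out;	/* buffer full, remember position */
			out[used++] = tbl[idx].ports[pidx];
		}
		/* Only the first resumed entry starts mid-way. */
		pidx = 0;
		s_pidx = 0;
	}
out:
	st->idx = idx;
	st->pidx = pidx;
	return used;
}
```

Resetting s_pidx after the first resumed entry corresponds to the `pidx = 0; s_pidx = 0;` pair in the kernel loop: later entries must be dumped from their first port group.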

static int br_mdb_valid_dump_req(const struct nlmsghdr *nlh,
				 struct netlink_ext_ack *extack)
{
	struct br_port_msg *bpm;

	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) {
		NL_SET_ERR_MSG_MOD(extack, "Invalid header for mdb dump request");
		return -EINVAL;
	}

	bpm = nlmsg_data(nlh);
	if (bpm->ifindex) {
		NL_SET_ERR_MSG_MOD(extack, "Filtering by device index is not supported for mdb dump request");
		return -EINVAL;
	}
	if (nlmsg_attrlen(nlh, sizeof(*bpm))) {
		NL_SET_ERR_MSG(extack, "Invalid data after header in mdb dump request");
		return -EINVAL;
	}

	return 0;
}

static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct net_device *dev;
	struct net *net = sock_net(skb->sk);
	struct nlmsghdr *nlh = NULL;
	int idx = 0, s_idx;

	if (cb->strict_check) {
		int err = br_mdb_valid_dump_req(cb->nlh, cb->extack);

		if (err < 0)
			return err;
	}

	s_idx = cb->args[0];

	rcu_read_lock();

	cb->seq = net->dev_base_seq;

	for_each_netdev_rcu(net, dev) {
		if (dev->priv_flags & IFF_EBRIDGE) {
			struct net_bridge *br = netdev_priv(dev);
			struct br_port_msg *bpm;

			if (idx < s_idx)
				goto skip;

			nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid,
					cb->nlh->nlmsg_seq, RTM_GETMDB,
					sizeof(*bpm), NLM_F_MULTI);
			if (nlh == NULL)
				break;

			bpm = nlmsg_data(nlh);
			memset(bpm, 0, sizeof(*bpm));
			bpm->ifindex = dev->ifindex;
			if (br_mdb_fill_info(skb, cb, dev) < 0)
				goto out;
			if (br_rports_fill_info(skb, &br->multicast_ctx) < 0)
				goto out;

			cb->args[1] = 0;
			nlmsg_end(skb, nlh);
		skip:
			idx++;
		}
	}

out:
	if (nlh)
		nlmsg_end(skb, nlh);
	rcu_read_unlock();
	cb->args[0] = idx;
	return skb->len;
}

static int nlmsg_populate_mdb_fill(struct sk_buff *skb,
				   struct net_device *dev,
				   struct net_bridge_mdb_entry *mp,
				   struct net_bridge_port_group *pg,
				   int type)
{
	struct nlmsghdr *nlh;
	struct br_port_msg *bpm;
	struct nlattr *nest, *nest2;

	nlh = nlmsg_put(skb, 0, 0, type, sizeof(*bpm), 0);
	if (!nlh)
		return -EMSGSIZE;

	bpm = nlmsg_data(nlh);
	memset(bpm, 0, sizeof(*bpm));
	bpm->family  = AF_BRIDGE;
	bpm->ifindex = dev->ifindex;
	nest = nla_nest_start_noflag(skb, MDBA_MDB);
	if (nest == NULL)
		goto cancel;
	nest2 = nla_nest_start_noflag(skb, MDBA_MDB_ENTRY);
	if (nest2 == NULL)
		goto end;

	if (__mdb_fill_info(skb, mp, pg))
		goto end;

	nla_nest_end(skb, nest2);
	nla_nest_end(skb, nest);
	nlmsg_end(skb, nlh);
	return 0;

end:
	nla_nest_end(skb, nest);
cancel:
	nlmsg_cancel(skb, nlh);
	return -EMSGSIZE;
}

static size_t rtnl_mdb_nlmsg_size(struct net_bridge_port_group *pg)
{
	size_t nlmsg_size = NLMSG_ALIGN(sizeof(struct br_port_msg)) +
			    nla_total_size(sizeof(struct br_mdb_entry)) +
			    nla_total_size(sizeof(u32));
	struct net_bridge_group_src *ent;
	size_t addr_size = 0;

	if (!pg)
		goto out;

	/* MDBA_MDB_EATTR_RTPROT */
	nlmsg_size += nla_total_size(sizeof(u8));

	switch (pg->key.addr.proto) {
	case htons(ETH_P_IP):
		/* MDBA_MDB_EATTR_SOURCE */
		if (pg->key.addr.src.ip4)
			nlmsg_size += nla_total_size(sizeof(__be32));
		if (pg->key.port->br->multicast_ctx.multicast_igmp_version == 2)
			goto out;
		addr_size = sizeof(__be32);
		break;
#if IS_ENABLED(CONFIG_IPV6)
	case htons(ETH_P_IPV6):
		/* MDBA_MDB_EATTR_SOURCE */
		if (!ipv6_addr_any(&pg->key.addr.src.ip6))
			nlmsg_size += nla_total_size(sizeof(struct in6_addr));
		if (pg->key.port->br->multicast_ctx.multicast_mld_version == 1)
			goto out;
		addr_size = sizeof(struct in6_addr);
		break;
#endif
	}

	/* MDBA_MDB_EATTR_GROUP_MODE */
	nlmsg_size += nla_total_size(sizeof(u8));

	/* MDBA_MDB_EATTR_SRC_LIST nested attr */
	if (!hlist_empty(&pg->src_list))
		nlmsg_size += nla_total_size(0);

	hlist_for_each_entry(ent, &pg->src_list, node) {
		/* MDBA_MDB_SRCLIST_ENTRY nested attr +
		 * MDBA_MDB_SRCATTR_ADDRESS + MDBA_MDB_SRCATTR_TIMER
		 */
		nlmsg_size += nla_total_size(0) +
			      nla_total_size(addr_size) +
			      nla_total_size(sizeof(u32));
	}
out:
	return nlmsg_size;
}

struct br_mdb_complete_info {
	struct net_bridge_port *port;
	struct br_ip ip;
};

static void br_mdb_complete(struct net_device *dev, int err, void *priv)
{
	struct br_mdb_complete_info *data = priv;
	struct net_bridge_port_group __rcu **pp;
	struct net_bridge_port_group *p;
	struct net_bridge_mdb_entry *mp;
	struct net_bridge_port *port = data->port;
	struct net_bridge *br = port->br;

	if (err)
		goto err;

	spin_lock_bh(&br->multicast_lock);
	mp = br_mdb_ip_get(br, &data->ip);
	if (!mp)
		goto out;
	for (pp = &mp->ports; (p = mlock_dereference(*pp, br)) != NULL;
	     pp = &p->next) {
		if (p->key.port != port)
			continue;
		p->flags |= MDB_PG_FLAGS_OFFLOAD;
	}
out:
	spin_unlock_bh(&br->multicast_lock);
err:
	kfree(priv);
}

net: bridge: add helper to replay port and host-joined mdb entries
I have a system with DSA ports, and udhcpcd is configured to bring
interfaces up as soon as they are created.
I create a bridge as follows:
    ip link add br0 type bridge
As soon as I create the bridge and udhcpcd brings it up, I also have
avahi which automatically starts sending IPv6 packets to advertise some
local services, and because of that, the br0 bridge joins the following
IPv6 groups due to the code path detailed below:
    33:33:ff:6d:c1:9c vid 0
    33:33:00:00:00:6a vid 0
    33:33:00:00:00:fb vid 0
    br_dev_xmit
    -> br_multicast_rcv
       -> br_ip6_multicast_add_group
          -> __br_multicast_add_group
             -> br_multicast_host_join
                -> br_mdb_notify
This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.
Then when we add a port to br0:
    ip link set swp0 master br0
the bridge doesn't replay the host-joined MDB entries from br_add_if,
and eventually the host joined addresses expire, and a switchdev
notification for deleting it is emitted, but surprise, the original
addition was already completely missed.
The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However there are 2 possibilities: the addresses can be 'pushed' by the
bridge into the port, or the port can 'pull' them from the bridge.
Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.
To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only call them once we're out
of the RCU read-side critical section.
There was some opportunity for reuse between br_mdb_switchdev_host_port,
br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
mdb object is created, so a helper was created.
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23 07:51:44 +08:00
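
The queue-then-notify strategy from the commit message can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel implementation: `struct entry`, `struct queued`, `replay()`, `notify()` and the `in_atomic_section` flag are hypothetical names, and a plain boolean stands in for the RCU read-side critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bridge MDB entry. */
struct entry {
	int group;
	bool host_joined;
};

/* Hypothetical queued event, kept on a singly linked list. */
struct queued {
	int group;
	struct queued *next;
};

static bool in_atomic_section;	/* models rcu_read_lock()/rcu_read_unlock() */
static int notified_count;	/* number of notifications delivered */
static int notified_in_atomic;	/* notifications wrongly sent while "atomic" */

/* Hypothetical notifier callback; it is allowed to block. */
static void notify(int group)
{
	(void)group;
	if (in_atomic_section)
		notified_in_atomic++;
	notified_count++;
}

/* Phase 1: walk the entries inside the "atomic" section and only queue.
 * Phase 2: leave the section and deliver the queued events.
 * Phase 3: free the queue whether or not delivery happened.
 */
static int replay(const struct entry *entries, size_t n)
{
	struct queued *head = NULL, **tail = &head, *q, *next;
	int err = 0;
	size_t i;

	in_atomic_section = true;
	for (i = 0; i < n; i++) {
		if (!entries[i].host_joined)
			continue;
		q = malloc(sizeof(*q));
		if (!q) {
			err = -1;
			break;
		}
		q->group = entries[i].group;
		q->next = NULL;
		*tail = q;
		tail = &q->next;
	}
	in_atomic_section = false;

	for (q = head; q && !err; q = q->next)
		notify(q->group);

	for (q = head; q; q = next) {
		next = q->next;
		free(q);
	}
	return err;
}
```

The point of the two passes is that nothing which may sleep runs while the list is being walked; the cost is one allocation per queued entry.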
static void br_switchdev_mdb_populate(struct switchdev_obj_port_mdb *mdb,
				      const struct net_bridge_mdb_entry *mp)
{
	if (mp->addr.proto == htons(ETH_P_IP))
		ip_eth_mc_map(mp->addr.dst.ip4, mdb->addr);
#if IS_ENABLED(CONFIG_IPV6)
	else if (mp->addr.proto == htons(ETH_P_IPV6))
		ipv6_eth_mc_map(&mp->addr.dst.ip6, mdb->addr);
#endif
	else
		ether_addr_copy(mdb->addr, mp->addr.dst.mac_addr);

	mdb->vid = mp->addr.vid;
}
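
For readers unfamiliar with the address mapping that the helper above relies on, here is a small user-space sketch of the standard multicast IP to Ethernet MAC mapping (RFC 1112 for IPv4, RFC 2464 for IPv6). The function names `ipv4_mc_to_mac()` and `ipv6_mc_to_mac()` are hypothetical and are not the kernel helpers; in particular, this sketch takes the IPv4 group in host byte order, which is an assumption made for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IPv4: fixed prefix 01:00:5e followed by the low 23 bits of the group.
 * 'group' is in host byte order here (an assumption of this sketch).
 */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/* IPv6: fixed prefix 33:33 followed by the last four octets of the group. */
static void ipv6_mc_to_mac(const uint8_t addr[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &addr[12], 4);
}
```

With this mapping, the IPv6 group ff02::fb yields 33:33:00:00:00:fb, which is one of the addresses listed in the commit message.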

static int br_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
			     const struct switchdev_obj_port_mdb *mdb,
			     unsigned long action, const void *ctx,
			     struct netlink_ext_ack *extack)
{
	struct switchdev_notifier_port_obj_info obj_info = {
		.info = {
			.dev = dev,
			.extack = extack,
			.ctx = ctx,
		},
		.obj = &mdb->obj,
	};
	int err;

	err = nb->notifier_call(nb, action, &obj_info);
	return notifier_to_errno(err);
}
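
The function above converts the notifier return value back into an errno. A rough user-space sketch of that encoding is shown below. The `SK_`-prefixed constants and the `sk_` functions are hypothetical names; their values and logic follow my reading of include/linux/notifier.h and should be treated as an assumption, with the header remaining the authoritative definition.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to match the kernel's NOTIFY_DONE, NOTIFY_OK and NOTIFY_STOP_MASK. */
#define SK_NOTIFY_DONE		0x0000
#define SK_NOTIFY_OK		0x0001
#define SK_NOTIFY_STOP_MASK	0x8000

/* Encode a negative errno into a notifier return value that also stops
 * further callbacks; zero maps to "OK".
 */
static int sk_notifier_from_errno(int err)
{
	if (err)
		return SK_NOTIFY_STOP_MASK | (SK_NOTIFY_OK - err);
	return SK_NOTIFY_OK;
}

/* Decode: strip the stop bit, then anything above "OK" is an errno. */
static int sk_notifier_to_errno(int ret)
{
	ret &= ~SK_NOTIFY_STOP_MASK;
	return ret > SK_NOTIFY_OK ? SK_NOTIFY_OK - ret : 0;
}
```

Under this encoding, a callback that reports "done" or "OK" decodes to 0, so the replay loop only stops when a driver actually returned an error.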

static int br_mdb_queue_one(struct list_head *mdb_list,
			    enum switchdev_obj_id id,
			    const struct net_bridge_mdb_entry *mp,
			    struct net_device *orig_dev)
{
	struct switchdev_obj_port_mdb *mdb;

	mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
	if (!mdb)
		return -ENOMEM;

	mdb->obj.id = id;
	mdb->obj.orig_dev = orig_dev;
	br_switchdev_mdb_populate(mdb, mp);
	list_add_tail(&mdb->obj.list, mdb_list);

	return 0;
}

int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
		  const void *ctx, bool adding, struct notifier_block *nb,
		  struct netlink_ext_ack *extack)
{
	const struct net_bridge_mdb_entry *mp;
	struct switchdev_obj *obj, *tmp;
	struct net_bridge *br;
	unsigned long action;
	LIST_HEAD(mdb_list);
	int err = 0;

	ASSERT_RTNL();

	if (!nb)
		return 0;

	if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
		return -EINVAL;

	br = netdev_priv(br_dev);

	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
		return 0;

	/* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
	 * because the write-side protection is br->multicast_lock. But we
	 * need to emulate the [ blocking ] calling context of a regular
	 * switchdev event, so since both br->multicast_lock and RCU read side
	 * critical sections are atomic, we have no choice but to pick the RCU
	 * read side lock, queue up all our events, leave the critical section
	 * and notify switchdev from blocking context.
	 */
	rcu_read_lock();

	hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
		struct net_bridge_port_group __rcu * const *pp;
		const struct net_bridge_port_group *p;

		if (mp->host_joined) {
			err = br_mdb_queue_one(&mdb_list,
					       SWITCHDEV_OBJ_ID_HOST_MDB,
					       mp, br_dev);
			if (err) {
				rcu_read_unlock();
				goto out_free_mdb;
			}
		}

		for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
		     pp = &p->next) {
			if (p->key.port->dev != dev)
				continue;

			err = br_mdb_queue_one(&mdb_list,
					       SWITCHDEV_OBJ_ID_PORT_MDB,
					       mp, dev);
			if (err) {
				rcu_read_unlock();
				goto out_free_mdb;
			}
		}
	}

	rcu_read_unlock();

	if (adding)
		action = SWITCHDEV_PORT_OBJ_ADD;
	else
		action = SWITCHDEV_PORT_OBJ_DEL;

net: bridge: add helper to replay port and host-joined mdb entries
I have a system with DSA ports, and udhcpcd is configured to bring
interfaces up as soon as they are created.
I create a bridge as follows:
ip link add br0 type bridge
As soon as I create the bridge and udhcpcd brings it up, I also have
avahi which automatically starts sending IPv6 packets to advertise some
local services, and because of that, the br0 bridge joins the following
IPv6 groups due to the code path detailed below:
33:33:ff:6d:c1:9c vid 0
33:33:00:00:00:6a vid 0
33:33:00:00:00:fb vid 0
br_dev_xmit
-> br_multicast_rcv
-> br_ip6_multicast_add_group
-> __br_multicast_add_group
-> br_multicast_host_join
-> br_mdb_notify
This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
hooked up, and switchdev will attempt to offload the host joined groups
to an empty list of ports. Of course nobody offloads them.
Then when we add a port to br0:
ip link set swp0 master br0
the bridge doesn't replay the host-joined MDB entries from br_add_if,
and eventually the host joined addresses expire, and a switchdev
notification for deleting it is emitted, but surprise, the original
addition was already completely missed.
The strategy to address this problem is to replay the MDB entries (both
the port ones and the host joined ones) when the new port joins the
bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
be populated and only then attached to a bridge that you offload).
However there are 2 possibilities: the addresses can be 'pushed' by the
bridge into the port, or the port can 'pull' them from the bridge.
Considering that in the general case, the new port can be really late to
the party, and there may have been many other switchdev ports that
already received the initial notification, we would like to avoid
delivering duplicate events to them, since they might misbehave. And
currently, the bridge calls the entire switchdev notifier chain, whereas
for replaying it should just call the notifier block of the new guy.
But the bridge doesn't know what is the new guy's notifier block, it
just knows where the switchdev notifier chain is. So for simplification,
we make this a driver-initiated pull for now, and the notifier block is
passed as an argument.
To emulate the calling context for mdb objects (deferred and put on the
blocking notifier chain), we must iterate under RCU protection through
the bridge's mdb entries, queue them, and only call them once we're out
of the RCU read-side critical section.
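The queue-then-call idea in the paragraph above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `mdb_entry`, `queued_obj`, `replay_cb`, `mdb_replay` and `count_cb` are hypothetical names, and the `in_read_section` flag merely models an RCU read-side critical section; none of this is the kernel's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one bridge MDB entry. */
struct mdb_entry {
	unsigned char addr[6];
	unsigned short vid;
	struct mdb_entry *next;
};

/* Private copy queued while the (modelled) read-side section is held. */
struct queued_obj {
	unsigned char addr[6];
	unsigned short vid;
	struct queued_obj *next;
};

/* Hypothetical callback type; assumed to be allowed to block. */
typedef int (*replay_cb)(const struct queued_obj *obj, void *ctx);

static int in_read_section; /* models being inside the read-side section */

/* Example callback: counts how many objects were replayed. */
int count_cb(const struct queued_obj *obj, void *ctx)
{
	(void)obj;
	(*(int *)ctx)++;
	return 0;
}

int mdb_replay(const struct mdb_entry *head, replay_cb cb, void *ctx)
{
	struct queued_obj *queue = NULL, **tail = &queue, *obj, *tmp;
	const struct mdb_entry *e;
	int err = 0;

	/* Phase 1: snapshot the entries; no callback runs in here. */
	in_read_section = 1;
	for (e = head; e; e = e->next) {
		obj = malloc(sizeof(*obj));
		if (!obj) {
			err = -1;
			break;
		}
		memcpy(obj->addr, e->addr, sizeof(obj->addr));
		obj->vid = e->vid;
		obj->next = NULL;
		*tail = obj;
		tail = &obj->next;
	}
	in_read_section = 0;

	/* Phase 2: deliver the queued copies outside the section. */
	for (obj = queue; obj && !err; obj = obj->next) {
		assert(!in_read_section);
		err = cb(obj, ctx);
	}

	/* Phase 3: free the queue whether or not delivery succeeded. */
	for (obj = queue; obj; obj = tmp) {
		tmp = obj->next;
		free(obj);
	}
	return err;
}
```

Because the callback only ever sees private copies, it can take as long as it needs without holding up the list walk, which is the property the commit message is after.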
There was some opportunity for reuse between br_mdb_switchdev_host_port,
br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
mdb object is created, so a helper was created.
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23 07:51:44 +08:00
|
|
|
list_for_each_entry(obj, &mdb_list, list) {
|
|
|
|
err = br_mdb_replay_one(nb, dev, SWITCHDEV_OBJ_PORT_MDB(obj),
|
2021-06-27 19:54:27 +08:00
|
|
|
action, ctx, extack);
|
2021-03-23 07:51:44 +08:00
		if (err)
			goto out_free_mdb;
	}

out_free_mdb:
	list_for_each_entry_safe(obj, tmp, &mdb_list, list) {
		list_del(&obj->list);
		kfree(SWITCHDEV_OBJ_PORT_MDB(obj));
	}

	return err;
}
2017-11-10 06:10:59 +08:00
|
|
|
static void br_mdb_switchdev_host_port(struct net_device *dev,
|
|
|
|
struct net_device *lower_dev,
|
2020-09-07 17:56:12 +08:00
|
|
|
struct net_bridge_mdb_entry *mp,
|
|
|
|
int type)
|
2017-11-10 06:10:59 +08:00
{
	struct switchdev_obj_port_mdb mdb = {
		.obj = {
			.id = SWITCHDEV_OBJ_ID_HOST_MDB,
			.flags = SWITCHDEV_F_DEFER,
2021-03-23 07:51:44 +08:00
|
|
|
.orig_dev = dev,
|
2017-11-10 06:10:59 +08:00
|
|
|
},
|
|
|
|
};
|
|
|
|
|
2021-03-23 07:51:44 +08:00
|
|
|
br_switchdev_mdb_populate(&mdb, mp);
|
2017-11-10 06:10:59 +08:00
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case RTM_NEWMDB:
|
2018-12-13 01:02:52 +08:00
|
|
|
switchdev_port_obj_add(lower_dev, &mdb.obj, NULL);
|
2017-11-10 06:10:59 +08:00
		break;
	case RTM_DELMDB:
		switchdev_port_obj_del(lower_dev, &mdb.obj);
		break;
	}
}

static void br_mdb_switchdev_host(struct net_device *dev,
2020-09-07 17:56:12 +08:00
|
|
|
struct net_bridge_mdb_entry *mp, int type)
|
2017-11-10 06:10:59 +08:00
{
	struct net_device *lower_dev;
	struct list_head *iter;

	netdev_for_each_lower_dev(dev, lower_dev, iter)
2020-09-07 17:56:12 +08:00
|
|
|
br_mdb_switchdev_host_port(dev, lower_dev, mp, type);
|
2017-11-10 06:10:59 +08:00
|
|
|
}
|
|
|
|
|
2020-09-07 17:56:12 +08:00
|
|
|
void br_mdb_notify(struct net_device *dev,
|
|
|
|
struct net_bridge_mdb_entry *mp,
|
|
|
|
struct net_bridge_port_group *pg,
|
|
|
|
int type)
|
2016-04-21 18:52:45 +08:00
|
|
|
{
|
|
|
|
struct br_mdb_complete_info *complete_info;
|
2016-01-11 04:06:23 +08:00
	struct switchdev_obj_port_mdb mdb = {
		.obj = {
			.id = SWITCHDEV_OBJ_ID_PORT_MDB,
			.flags = SWITCHDEV_F_DEFER,
		},
	};
2012-12-12 06:23:07 +08:00
|
|
|
struct net *net = dev_net(dev);
|
|
|
|
struct sk_buff *skb;
|
|
|
|
int err = -ENOBUFS;
|
|
|
|
|
2020-09-07 17:56:12 +08:00
|
|
|
if (pg) {
|
2021-03-23 07:51:44 +08:00
|
|
|
br_switchdev_mdb_populate(&mdb, mp);
|
2020-10-29 07:38:31 +08:00
|
|
|
|
2020-09-22 15:30:22 +08:00
|
|
|
mdb.obj.orig_dev = pg->key.port->dev;
|
2020-09-07 17:56:12 +08:00
		switch (type) {
		case RTM_NEWMDB:
			complete_info = kmalloc(sizeof(*complete_info), GFP_ATOMIC);
			if (!complete_info)
				break;
2020-09-22 15:30:22 +08:00
|
|
|
complete_info->port = pg->key.port;
|
2020-09-07 17:56:12 +08:00
|
|
|
complete_info->ip = mp->addr;
|
2016-04-21 18:52:45 +08:00
|
|
|
mdb.obj.complete_priv = complete_info;
|
|
|
|
mdb.obj.complete = br_mdb_complete;
|
2020-09-22 15:30:22 +08:00
|
|
|
if (switchdev_port_obj_add(pg->key.port->dev, &mdb.obj, NULL))
|
2017-07-12 05:55:12 +08:00
|
|
|
kfree(complete_info);
|
2020-09-07 17:56:12 +08:00
|
|
|
break;
|
|
|
|
case RTM_DELMDB:
|
2020-09-22 15:30:22 +08:00
|
|
|
switchdev_port_obj_del(pg->key.port->dev, &mdb.obj);
|
2020-09-07 17:56:12 +08:00
|
|
|
break;
|
2016-04-21 18:52:45 +08:00
|
|
|
}
|
2020-09-07 17:56:12 +08:00
|
|
|
} else {
|
|
|
|
br_mdb_switchdev_host(dev, mp, type);
|
2016-02-03 16:57:06 +08:00
|
|
|
}
|
2016-01-11 04:06:23 +08:00
|
|
|
|
2020-09-07 17:56:12 +08:00
|
|
|
skb = nlmsg_new(rtnl_mdb_nlmsg_size(pg), GFP_ATOMIC);
|
2012-12-12 06:23:07 +08:00
|
|
|
if (!skb)
|
|
|
|
goto errout;
|
|
|
|
|
2020-09-07 17:56:12 +08:00
|
|
|
err = nlmsg_populate_mdb_fill(skb, dev, mp, pg, type);
|
2012-12-12 06:23:07 +08:00
	if (err < 0) {
		kfree_skb(skb);
		goto errout;
	}

	rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
	return;
errout:
	rtnl_set_sk_err(net, RTNLGRP_MDB, err);
}
2015-07-23 20:00:53 +08:00
|
|
|
static int nlmsg_populate_rtr_fill(struct sk_buff *skb,
|
|
|
|
struct net_device *dev,
|
2021-07-20 01:06:33 +08:00
|
|
|
int ifindex, u16 vid, u32 pid,
|
2015-07-23 20:00:53 +08:00
|
|
|
u32 seq, int type, unsigned int flags)
|
|
|
|
{
|
2021-07-20 01:06:33 +08:00
|
|
|
struct nlattr *nest, *port_nest;
|
2015-07-23 20:00:53 +08:00
|
|
|
struct br_port_msg *bpm;
|
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
|
2019-09-06 17:47:02 +08:00
|
|
|
nlh = nlmsg_put(skb, pid, seq, type, sizeof(*bpm), 0);
|
2015-07-23 20:00:53 +08:00
	if (!nlh)
		return -EMSGSIZE;

	bpm = nlmsg_data(nlh);
	memset(bpm, 0, sizeof(*bpm));
	bpm->family = AF_BRIDGE;
	bpm->ifindex = dev->ifindex;
2019-04-26 17:13:06 +08:00
|
|
|
nest = nla_nest_start_noflag(skb, MDBA_ROUTER);
|
2015-07-23 20:00:53 +08:00
|
|
|
if (!nest)
|
|
|
|
goto cancel;
|
|
|
|
|
2021-07-20 01:06:33 +08:00
	port_nest = nla_nest_start_noflag(skb, MDBA_ROUTER_PORT);
	if (!port_nest)
		goto end;
	if (nla_put_nohdr(skb, sizeof(u32), &ifindex)) {
		nla_nest_cancel(skb, port_nest);
2015-07-23 20:00:53 +08:00
|
|
|
goto end;
|
2021-07-20 01:06:33 +08:00
	}
	if (vid && nla_put_u16(skb, MDBA_ROUTER_PATTR_VID, vid)) {
		nla_nest_cancel(skb, port_nest);
		goto end;
	}
	nla_nest_end(skb, port_nest);
2015-07-23 20:00:53 +08:00

	nla_nest_end(skb, nest);
	nlmsg_end(skb, nlh);
	return 0;

end:
	nla_nest_end(skb, nest);
cancel:
	nlmsg_cancel(skb, nlh);
	return -EMSGSIZE;
}

static inline size_t rtnl_rtr_nlmsg_size(void)
{
	return NLMSG_ALIGN(sizeof(struct br_port_msg))
2021-07-20 01:06:33 +08:00
|
|
|
+ nla_total_size(sizeof(__u32))
|
|
|
|
+ nla_total_size(sizeof(u16));
|
2015-07-23 20:00:53 +08:00
|
|
|
}
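The size computed by rtnl_rtr_nlmsg_size() above can be worked through with a small userspace model. This is a sketch, not kernel code: the `MODEL_*` macros and `model_*` names are hypothetical, the 4-byte alignment and 4-byte attribute header are the usual netlink values, and `struct model_br_port_msg` is assumed to mirror the layout of struct br_port_msg (one u8 family followed by one u32 ifindex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the netlink sizing helpers; 4-byte alignment assumed. */
#define MODEL_ALIGNTO ((size_t)4)
#define MODEL_ALIGN(len) (((len) + MODEL_ALIGNTO - 1) & ~(MODEL_ALIGNTO - 1))
#define MODEL_NLA_HDRLEN ((size_t)4) /* attribute header: u16 length + u16 type */

/* Assumed to mirror struct br_port_msg. */
struct model_br_port_msg {
	uint8_t family;
	uint32_t ifindex;
};

/* Attribute header plus payload, padded up to the alignment. */
static size_t model_nla_total_size(size_t payload)
{
	return MODEL_ALIGN(MODEL_NLA_HDRLEN + payload);
}

/* Same three terms as the function above: header struct, u32 attr, u16 attr. */
size_t model_rtr_nlmsg_size(void)
{
	return MODEL_ALIGN(sizeof(struct model_br_port_msg))
		+ model_nla_total_size(sizeof(uint32_t))
		+ model_nla_total_size(sizeof(uint16_t));
}
```

On common ABIs the struct occupies 8 bytes, the u32 attribute 4 + 4 = 8 bytes, and the u16 attribute 4 + 2 = 6 bytes padded to 8, so the model yields 24 bytes.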
|
|
|
|
|
2021-07-20 01:06:33 +08:00
|
|
|
void br_rtr_notify(struct net_device *dev, struct net_bridge_mcast_port *pmctx,
|
2015-07-23 20:00:53 +08:00
		   int type)
{
	struct net *net = dev_net(dev);
	struct sk_buff *skb;
	int err = -ENOBUFS;
	int ifindex;
2021-07-20 01:06:33 +08:00
|
|
|
u16 vid;
|
2015-07-23 20:00:53 +08:00
|
|
|
|
2021-07-20 01:06:33 +08:00
|
|
|
ifindex = pmctx ? pmctx->port->dev->ifindex : 0;
|
|
|
|
vid = pmctx && br_multicast_port_ctx_is_vlan(pmctx) ? pmctx->vlan->vid :
|
|
|
|
0;
|
2015-07-23 20:00:53 +08:00
|
|
|
skb = nlmsg_new(rtnl_rtr_nlmsg_size(), GFP_ATOMIC);
|
|
|
|
if (!skb)
|
|
|
|
goto errout;
|
|
|
|
|
2021-07-20 01:06:33 +08:00
|
|
|
err = nlmsg_populate_rtr_fill(skb, dev, ifindex, vid, 0, 0, type,
|
|
|
|
NTF_SELF);
|
2015-07-23 20:00:53 +08:00
	if (err < 0) {
		kfree_skb(skb);
		goto errout;
	}

	rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
	return;

errout:
	rtnl_set_sk_err(net, RTNLGRP_MDB, err);
}
2020-09-22 15:30:12 +08:00
|
|
|
static bool is_valid_mdb_entry(struct br_mdb_entry *entry,
|
|
|
|
struct netlink_ext_ack *extack)
|
2012-12-12 06:23:08 +08:00
|
|
|
{
|
2020-09-22 15:30:12 +08:00
|
|
|
if (entry->ifindex == 0) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Zero entry ifindex is not allowed");
|
2012-12-12 06:23:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
2012-12-12 06:23:08 +08:00
|
|
|
|
|
|
|
if (entry->addr.proto == htons(ETH_P_IP)) {
|
2020-09-22 15:30:12 +08:00
|
|
|
if (!ipv4_is_multicast(entry->addr.u.ip4)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is not multicast");
|
2012-12-12 06:23:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
|
|
|
if (ipv4_is_local_multicast(entry->addr.u.ip4)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "IPv4 entry group address is local multicast");
|
2012-12-12 06:23:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
2012-12-12 06:23:08 +08:00
|
|
|
#if IS_ENABLED(CONFIG_IPV6)
|
|
|
|
} else if (entry->addr.proto == htons(ETH_P_IPV6)) {
|
2020-09-22 15:30:12 +08:00
|
|
|
if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "IPv6 entry group address is link-local all nodes");
|
2012-12-12 06:23:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
2012-12-12 06:23:08 +08:00
|
|
|
#endif
|
2020-10-29 07:38:31 +08:00
	} else if (entry->addr.proto == 0) {
		/* L2 mdb */
		if (!is_multicast_ether_addr(entry->addr.u.mac_addr)) {
			NL_SET_ERR_MSG_MOD(extack, "L2 entry group is not multicast");
			return false;
		}
2020-09-22 15:30:12 +08:00
|
|
|
} else {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Unknown entry protocol");
|
2012-12-12 06:23:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (entry->state != MDB_PERMANENT && entry->state != MDB_TEMPORARY) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Unknown entry state");
|
2012-12-15 06:09:51 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
|
|
|
if (entry->vid >= VLAN_VID_MASK) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Invalid entry VLAN id");
|
2015-07-10 23:02:08 +08:00
|
|
|
return false;
|
2020-09-22 15:30:12 +08:00
|
|
|
}
|
2012-12-12 06:23:08 +08:00
|
|
|
|
|
|
|
return true;
|
|
|
|
}
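The address and VLAN checks in is_valid_mdb_entry() above can be approximated in userspace as follows. This is a sketch under assumptions: the `model_*` helpers are hypothetical, they take IPv4 addresses in host byte order (the kernel helpers operate on network byte order values), and 0x0fff is assumed to be the value of VLAN_VID_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4; addr is in host byte order here. */
bool model_ipv4_is_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Link-local IPv4 multicast is 224.0.0.0/24; host byte order. */
bool model_ipv4_is_local_multicast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/* An Ethernet address is multicast when the low bit of its first octet is set. */
bool model_is_multicast_ether_addr(const uint8_t mac[6])
{
	return (mac[0] & 0x01) != 0;
}

/* Mirrors the IPv4 branch: must be multicast, must not be local multicast. */
bool model_ipv4_group_ok(uint32_t addr)
{
	return model_ipv4_is_multicast(addr) &&
	       !model_ipv4_is_local_multicast(addr);
}

/* Mirrors the final check: vid >= VLAN_VID_MASK (assumed 0x0fff) is rejected. */
bool model_vid_ok(uint16_t vid)
{
	return vid < 0x0fff;
}
```

For instance, 239.1.1.1 passes the IPv4 check, while 224.0.0.1 (local multicast) and 192.168.0.1 (not multicast) are both rejected.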
|
|
|
|
|
2020-09-22 15:30:19 +08:00
static bool is_valid_mdb_source(struct nlattr *attr, __be16 proto,
				struct netlink_ext_ack *extack)
{
	switch (proto) {
	case htons(ETH_P_IP):
		if (nla_len(attr) != sizeof(struct in_addr)) {
			NL_SET_ERR_MSG_MOD(extack, "IPv4 invalid source address length");
			return false;
		}
		if (ipv4_is_multicast(nla_get_in_addr(attr))) {
			NL_SET_ERR_MSG_MOD(extack, "IPv4 multicast source address is not allowed");
			return false;
		}
		break;
#if IS_ENABLED(CONFIG_IPV6)
	case htons(ETH_P_IPV6): {
		struct in6_addr src;

		if (nla_len(attr) != sizeof(struct in6_addr)) {
			NL_SET_ERR_MSG_MOD(extack, "IPv6 invalid source address length");
			return false;
		}
		src = nla_get_in6_addr(attr);
		if (ipv6_addr_is_multicast(&src)) {
			NL_SET_ERR_MSG_MOD(extack, "IPv6 multicast source address is not allowed");
			return false;
		}
		break;
	}
#endif
	default:
		NL_SET_ERR_MSG_MOD(extack, "Invalid protocol used with source address");
		return false;
	}

	return true;
}
2020-09-22 15:30:18 +08:00
|
|
|
static const struct nla_policy br_mdbe_attrs_pol[MDBE_ATTR_MAX + 1] = {
|
2020-09-22 15:30:19 +08:00
|
|
|
[MDBE_ATTR_SOURCE] = NLA_POLICY_RANGE(NLA_BINARY,
|
|
|
|
sizeof(struct in_addr),
|
|
|
|
sizeof(struct in6_addr)),
|
2020-09-22 15:30:18 +08:00
|
|
|
};
|
|
|
|
|
2012-12-12 06:23:08 +08:00
|
|
|
static int br_mdb_parse(struct sk_buff *skb, struct nlmsghdr *nlh,
|
2020-09-22 15:30:12 +08:00
|
|
|
struct net_device **pdev, struct br_mdb_entry **pentry,
|
2020-09-22 15:30:18 +08:00
|
|
|
struct nlattr **mdb_attrs, struct netlink_ext_ack *extack)
|
2012-12-12 06:23:08 +08:00
{
	struct net *net = sock_net(skb->sk);
	struct br_mdb_entry *entry;
	struct br_port_msg *bpm;
	struct nlattr *tb[MDBA_SET_ENTRY_MAX+1];
	struct net_device *dev;
	int err;
netlink: make validation more configurable for future strictness
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
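The four strictness options described above compose naturally as bit flags. The following userspace sketch illustrates that composition only; the `MODEL_VALIDATE_*` names and the two `model_attr_*` helpers are hypothetical and are not claimed to match the kernel's definitions or its parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag set modelled on the four options in the message. */
enum model_validate_flags {
	MODEL_VALIDATE_LIBERAL      = 0,
	MODEL_VALIDATE_TRAILING     = 1 << 0, /* reject trailing data */
	MODEL_VALIDATE_MAXTYPE      = 1 << 1, /* reject attrs > max known type */
	MODEL_VALIDATE_UNSPEC       = 1 << 2, /* reject NLA_UNSPEC policy entries */
	MODEL_VALIDATE_STRICT_ATTRS = 1 << 3, /* require exact attribute size */
};

/* The old *_strict() behaviour: TRAILING plus MAXTYPE only. */
#define MODEL_VALIDATE_DEPRECATED_STRICT \
	(MODEL_VALIDATE_TRAILING | MODEL_VALIDATE_MAXTYPE)

/* "The default for future things should be *everything*." */
#define MODEL_VALIDATE_STRICT \
	(MODEL_VALIDATE_TRAILING | MODEL_VALIDATE_MAXTYPE | \
	 MODEL_VALIDATE_UNSPEC | MODEL_VALIDATE_STRICT_ATTRS)

/* Length check: exact match when STRICT_ATTRS is set, otherwise ">= expected". */
bool model_attr_len_ok(unsigned int flags, size_t expected, size_t actual)
{
	if (flags & MODEL_VALIDATE_STRICT_ATTRS)
		return actual == expected;
	return actual >= expected;
}

/* Type check: unknown types are tolerated unless MAXTYPE is set. */
bool model_attr_type_ok(unsigned int flags, int type, int maxtype)
{
	if (type <= maxtype)
		return true;
	return !(flags & MODEL_VALIDATE_MAXTYPE);
}
```

Under this model, an oversized attribute is accepted by the liberal and deprecated-strict modes but rejected once STRICT_ATTRS is set, which matches the "attribute length >= expected accepted" lines in the two legacy levels listed above.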
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26 20:07:28 +08:00
|
|
|
err = nlmsg_parse_deprecated(nlh, sizeof(*bpm), tb,
|
|
|
|
MDBA_SET_ENTRY_MAX, NULL, NULL);
|
2012-12-12 06:23:08 +08:00
	if (err < 0)
		return err;

	bpm = nlmsg_data(nlh);
	if (bpm->ifindex == 0) {
2020-09-22 15:30:12 +08:00
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Invalid bridge ifindex");
|
2012-12-12 06:23:08 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
dev = __dev_get_by_index(net, bpm->ifindex);
|
|
|
|
if (dev == NULL) {
|
2020-09-22 15:30:12 +08:00
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Bridge device doesn't exist");
|
2012-12-12 06:23:08 +08:00
|
|
|
return -ENODEV;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!(dev->priv_flags & IFF_EBRIDGE)) {
|
2020-09-22 15:30:12 +08:00
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Device is not a bridge");
|
2012-12-12 06:23:08 +08:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
*pdev = dev;
|
|
|
|
|
2020-09-22 15:30:12 +08:00
|
|
|
if (!tb[MDBA_SET_ENTRY]) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Missing MDBA_SET_ENTRY attribute");
|
2012-12-12 06:23:08 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
2020-09-22 15:30:12 +08:00
|
|
|
if (nla_len(tb[MDBA_SET_ENTRY]) != sizeof(struct br_mdb_entry)) {
|
|
|
|
NL_SET_ERR_MSG_MOD(extack, "Invalid MDBA_SET_ENTRY attribute length");
|
2012-12-12 06:23:08 +08:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2020-09-22 15:30:12 +08:00
|
|
|
entry = nla_data(tb[MDBA_SET_ENTRY]);
|
|
|
|
if (!is_valid_mdb_entry(entry, extack))
|
|
|
|
return -EINVAL;
|
2012-12-12 06:23:08 +08:00
|
|
|
*pentry = entry;
|
2020-09-22 15:30:12 +08:00
|
|
|
|
2020-09-22 15:30:18 +08:00
	if (tb[MDBA_SET_ENTRY_ATTRS]) {
		err = nla_parse_nested(mdb_attrs, MDBE_ATTR_MAX,
				       tb[MDBA_SET_ENTRY_ATTRS],
				       br_mdbe_attrs_pol, extack);
		if (err)
			return err;
2020-09-22 15:30:19 +08:00
		if (mdb_attrs[MDBE_ATTR_SOURCE] &&
		    !is_valid_mdb_source(mdb_attrs[MDBE_ATTR_SOURCE],
					 entry->addr.proto, extack))
			return -EINVAL;
2020-09-22 15:30:18 +08:00
	} else {
		memset(mdb_attrs, 0,
		       sizeof(struct nlattr *) * (MDBE_ATTR_MAX + 1));
	}
2012-12-12 06:23:08 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2021-07-21 22:01:26 +08:00
static struct net_bridge_mcast *
__br_mdb_choose_context(struct net_bridge *br,
			const struct br_mdb_entry *entry,
			struct netlink_ext_ack *extack)
{
	struct net_bridge_mcast *brmctx = NULL;
	struct net_bridge_vlan *v;

	if (!br_opt_get(br, BROPT_MCAST_VLAN_SNOOPING_ENABLED)) {
		brmctx = &br->multicast_ctx;
		goto out;
	}

	if (!entry->vid) {
		NL_SET_ERR_MSG_MOD(extack, "Cannot add an entry without a vlan when vlan snooping is enabled");
		goto out;
	}

	v = br_vlan_find(br_vlan_group(br), entry->vid);
	if (!v) {
		NL_SET_ERR_MSG_MOD(extack, "Vlan is not configured");
		goto out;
	}
	if (br_multicast_ctx_vlan_global_disabled(&v->br_mcast_ctx)) {
		NL_SET_ERR_MSG_MOD(extack, "Vlan's multicast processing is disabled");
		goto out;
	}
	brmctx = &v->br_mcast_ctx;
out:
	return brmctx;
}
2012-12-12 06:23:08 +08:00
|
|
|
static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
|
2020-09-22 15:30:21 +08:00
|
|
|
struct br_mdb_entry *entry,
|
|
|
|
struct nlattr **mdb_attrs,
|
2020-09-22 15:30:14 +08:00
|
|
|
struct netlink_ext_ack *extack)
|
2012-12-12 06:23:08 +08:00
|
|
|
{
|
net: bridge: mcast: handle port group filter modes
We need to handle group filter mode transitions and initial state.
To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
a new port group in EXCLUDE mode) we need to add that port to all of
*,G ports' S,G entries for proper replication. When the EXCLUDE state is
changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
called after the source list processing because the assumption is that
all of the group's S,G entries will be created before transitioning to
EXCLUDE mode, i.e. most importantly its blocked entries will already be
added so it will not get automatically added to them.
The transition EXCLUDE -> INCLUDE happens only when a port group timer
expires, it requires us to remove that port from all of *,G ports' S,G
entries where it was automatically added previously.
Finally when we are adding a new S,G entry we must add all of *,G's
EXCLUDE ports to it.
In order to distinguish automatically added *,G EXCLUDE ports we have a
new port group flag - MDB_PG_FLAGS_STAR_EXCL.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 15:30:24 +08:00
	struct net_bridge_mdb_entry *mp, *star_mp;
2012-12-12 06:23:08 +08:00
	struct net_bridge_port_group __rcu **pp;
2021-07-21 22:01:26 +08:00
	struct net_bridge_port_group *p;
	struct net_bridge_mcast *brmctx;
2020-09-22 15:30:24 +08:00
	struct br_ip group, star_group;
2015-07-06 20:53:35 +08:00
	unsigned long now = jiffies;
2020-10-29 07:48:15 +08:00
	unsigned char flags = 0;
2020-09-22 15:30:19 +08:00
	u8 filter_mode;
2012-12-12 06:23:08 +08:00
	int err;

2020-09-22 15:30:21 +08:00
	__mdb_entry_to_br_ip(entry, &group, mdb_attrs);

2021-07-21 22:01:26 +08:00
	brmctx = __br_mdb_choose_context(br, entry, extack);
	if (!brmctx)
		return -EINVAL;

2020-09-22 15:30:19 +08:00
	/* host join errors which can happen before creating the group */
	if (!port) {
		/* don't allow any flags for host-joined groups */
		if (entry->state) {
			NL_SET_ERR_MSG_MOD(extack, "Flags are not allowed for host groups");
			return -EINVAL;
		}
2020-09-22 15:30:21 +08:00
		if (!br_multicast_is_star_g(&group)) {
2020-09-22 15:30:19 +08:00
			NL_SET_ERR_MSG_MOD(extack, "Groups with sources cannot be manually host joined");
			return -EINVAL;
		}
	}

2020-10-29 07:38:31 +08:00
	if (br_group_is_l2(&group) && entry->state != MDB_PERMANENT) {
		NL_SET_ERR_MSG_MOD(extack, "Only permanent L2 entries allowed");
		return -EINVAL;
	}

2020-09-22 15:30:21 +08:00
	mp = br_mdb_ip_get(br, &group);
2012-12-12 06:23:08 +08:00
	if (!mp) {
2020-09-22 15:30:21 +08:00
		mp = br_multicast_new_group(br, &group);
2016-02-10 23:09:02 +08:00
		err = PTR_ERR_OR_ZERO(mp);
		if (err)
2012-12-12 06:23:08 +08:00
			return err;
	}

2019-08-17 19:22:13 +08:00
	/* host join */
	if (!port) {
2020-09-22 15:30:14 +08:00
		if (mp->host_joined) {
			NL_SET_ERR_MSG_MOD(extack, "Group is already joined by host");
2019-08-17 19:22:13 +08:00
			return -EEXIST;
2020-09-22 15:30:14 +08:00
		}
2019-08-17 19:22:13 +08:00

2021-07-21 22:01:27 +08:00
		br_multicast_host_join(brmctx, mp, false);
2020-09-07 17:56:12 +08:00
		br_mdb_notify(br->dev, mp, NULL, RTM_NEWMDB);
2019-08-17 19:22:13 +08:00

		return 0;
	}

2012-12-12 06:23:08 +08:00
	for (pp = &mp->ports;
	     (p = mlock_dereference(*pp, br)) != NULL;
	     pp = &p->next) {
2020-09-22 15:30:22 +08:00
		if (p->key.port == port) {
2020-09-22 15:30:14 +08:00
			NL_SET_ERR_MSG_MOD(extack, "Group is already joined by port");
2012-12-12 06:23:08 +08:00
			return -EEXIST;
2020-09-22 15:30:14 +08:00
		}
2020-09-22 15:30:22 +08:00
		if ((unsigned long)p->key.port < (unsigned long)port)
2012-12-12 06:23:08 +08:00
			break;
	}

2020-09-22 15:30:21 +08:00
	filter_mode = br_multicast_is_star_g(&group) ? MCAST_EXCLUDE :
							MCAST_INCLUDE;
2020-09-22 15:30:19 +08:00

2020-10-29 07:48:15 +08:00
	if (entry->state == MDB_PERMANENT)
		flags |= MDB_PG_FLAGS_PERMANENT;

	p = br_multicast_new_port_group(port, &group, *pp, flags, NULL,
2020-09-22 15:30:21 +08:00
					filter_mode, RTPROT_STATIC);
2020-09-22 15:30:14 +08:00
	if (unlikely(!p)) {
		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
2012-12-12 06:23:08 +08:00
		return -ENOMEM;
2020-09-22 15:30:14 +08:00
	}
2012-12-12 06:23:08 +08:00
	rcu_assign_pointer(*pp, p);
2020-09-07 17:56:11 +08:00
	if (entry->state == MDB_TEMPORARY)
2021-07-20 01:06:24 +08:00
		mod_timer(&p->timer,
2021-07-21 22:01:26 +08:00
			  now + brmctx->multicast_membership_interval);
2020-09-07 17:56:12 +08:00
	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB);
2020-09-22 15:30:24 +08:00
	/* if we are adding a new EXCLUDE port group (*,G) it needs to be also
	 * added to all S,G entries for proper replication, if we are adding
	 * a new INCLUDE port (S,G) then all of *,G EXCLUDE ports need to be
	 * added to it for proper replication
	 */
2021-07-21 22:01:26 +08:00
	if (br_multicast_should_handle_mode(brmctx, group.proto)) {
2020-09-22 15:30:24 +08:00
		switch (filter_mode) {
		case MCAST_EXCLUDE:
			br_multicast_star_g_handle_mode(p, MCAST_EXCLUDE);
			break;
		case MCAST_INCLUDE:
			star_group = p->key.addr;
			memset(&star_group.src, 0, sizeof(star_group.src));
			star_mp = br_mdb_ip_get(br, &star_group);
			if (star_mp)
				br_multicast_sg_add_exclude_ports(star_mp, p);
			break;
		}
	}
2012-12-12 06:23:08 +08:00

	return 0;
}
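
The port-group walk in br_mdb_add_group() uses a pointer-to-pointer cursor so that the new entry can be linked in without tracking a separate "previous" node, and it keeps the list ordered by descending port address. The following is a minimal userspace sketch of that idea only. The names `pg_node` and `pg_insert_sorted` are hypothetical and not part of the bridge code, and the plain pointer store stands in for the kernel's rcu_assign_pointer().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_bridge_port_group: one node per
 * port, linked through ->next.
 */
struct pg_node {
	const void *port;
	struct pg_node *next;
};

/* Link node into the list at *head, keeping it sorted by descending port
 * address. Returns 0 on success or -EEXIST if the port is already listed.
 */
static int pg_insert_sorted(struct pg_node **head, struct pg_node *node)
{
	struct pg_node **pp, *p;

	for (pp = head; (p = *pp) != NULL; pp = &p->next) {
		if (p->port == node->port)
			return -EEXIST;
		/* first entry with a smaller address marks the slot */
		if ((uintptr_t)p->port < (uintptr_t)node->port)
			break;
	}

	node->next = *pp;	/* new node takes over the rest of the list */
	*pp = node;		/* plain store here; not RCU-safe */
	return 0;
}
```

Because `pp` always points at the link that leads to the current node, the same two stores handle insertion at the head, in the middle, and at the tail.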

static int __br_mdb_add(struct net *net, struct net_bridge *br,
2020-09-22 15:30:13 +08:00
			struct net_bridge_port *p,
2020-09-22 15:30:14 +08:00
			struct br_mdb_entry *entry,
2020-09-22 15:30:19 +08:00
			struct nlattr **mdb_attrs,
2020-09-22 15:30:14 +08:00
			struct netlink_ext_ack *extack)
2012-12-12 06:23:08 +08:00
{
	int ret;

	spin_lock_bh(&br->multicast_lock);
2020-09-22 15:30:21 +08:00
	ret = br_mdb_add_group(br, p, entry, mdb_attrs, extack);
2012-12-12 06:23:08 +08:00
	spin_unlock_bh(&br->multicast_lock);
2020-09-22 15:30:13 +08:00

2012-12-12 06:23:08 +08:00
	return ret;
}

2017-04-17 00:48:24 +08:00
static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
		      struct netlink_ext_ack *extack)
2012-12-12 06:23:08 +08:00
{
2020-09-22 15:30:18 +08:00
	struct nlattr *mdb_attrs[MDBE_ATTR_MAX + 1];
2012-12-12 06:23:08 +08:00
	struct net *net = sock_net(skb->sk);
bridge: vlan: add per-vlan struct and move to rhashtables
This patch changes the bridge vlan implementation to use rhashtables
instead of bitmaps. The main motivation behind this change is that we
need extensible per-vlan structures (both per-port and global) so more
advanced features can be introduced and the vlan support can be
extended. I've tried to break this up but the moment net_port_vlans is
changed and the whole API goes away, thus this is a larger patch.
A few short goals of this patch are:
- Extensible per-vlan structs stored in rhashtables and a sorted list
- Keep user-visible behaviour (compressed vlans etc)
- Keep fastpath ingress/egress logic the same (optimizations to come
later)
Here's a brief list of some of the new features we'd like to introduce:
- per-vlan counters
- vlan ingress/egress mapping
- per-vlan igmp configuration
- vlan priorities
- avoid fdb entries replication (e.g. local fdb scaling issues)
The structure is kept single for both global and per-port entries so to
avoid code duplication where possible and also because we'll soon introduce
"port0 / aka bridge as port" which should simplify things further
(thanks to Vlad for the suggestion!).
Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
rhashtable, if an entry is added to a port it'll get a pointer to its
global context so it can be quickly accessed later. There's also a
sorted vlan list which is used for stable walks and some user-visible
behaviour such as the vlan ranges, also for error paths.
VLANs are stored in a "vlan group" which currently contains the
rhashtable, sorted vlan list and the number of "real" vlan entries.
A good side-effect of this change is that it resembles how hw keeps
per-vlan data.
One important note after this change is that if a VLAN is being looked up
in the bridge's rhashtable for filtering purposes (or to check if it's an
existing usable entry, not just a global context) then the new helper
br_vlan_should_use() needs to be used if the vlan is found. In case the
lookup is done only with a port's vlan group, then this check can be
skipped.
Things tested so far:
- basic vlan ingress/egress
- pvids
- untagged vlans
- undef CONFIG_BRIDGE_VLAN_FILTERING
- adding/deleting vlans in different scenarios (with/without global ctx,
while transmitting traffic, in ranges etc)
- loading/removing the module while having/adding/deleting vlans
- extracting bridge vlan information (user ABI), compressed requests
- adding/deleting fdbs on vlans
- bridge mac change, promisc mode
- default pvid change
- kmemleak ON during the whole time
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-26 01:00:11 +08:00
	struct net_bridge_vlan_group *vg;
2019-08-17 19:22:13 +08:00
	struct net_bridge_port *p = NULL;
2015-08-03 19:29:16 +08:00
	struct net_device *dev, *pdev;
2012-12-12 06:23:08 +08:00
	struct br_mdb_entry *entry;
2015-09-26 01:00:11 +08:00
	struct net_bridge_vlan *v;
2012-12-12 06:23:08 +08:00
	struct net_bridge *br;
	int err;

2020-09-22 15:30:18 +08:00
	err = br_mdb_parse(skb, nlh, &dev, &entry, mdb_attrs, extack);
2012-12-12 06:23:08 +08:00
	if (err < 0)
		return err;

	br = netdev_priv(dev);

2020-09-22 15:30:14 +08:00
	if (!netif_running(br->dev)) {
		NL_SET_ERR_MSG_MOD(extack, "Bridge device is not running");
2020-09-22 15:30:13 +08:00
		return -EINVAL;
2020-09-22 15:30:14 +08:00
	}

	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED)) {
		NL_SET_ERR_MSG_MOD(extack, "Bridge's multicast processing is disabled");
		return -EINVAL;
	}
2020-09-22 15:30:13 +08:00

2019-08-17 19:22:13 +08:00
	if (entry->ifindex != br->dev->ifindex) {
		pdev = __dev_get_by_index(net, entry->ifindex);
2020-09-22 15:30:14 +08:00
		if (!pdev) {
			NL_SET_ERR_MSG_MOD(extack, "Port net device doesn't exist");
2019-08-17 19:22:13 +08:00
			return -ENODEV;
2020-09-22 15:30:14 +08:00
		}
2015-08-03 19:29:16 +08:00

2019-08-17 19:22:13 +08:00
		p = br_port_get_rtnl(pdev);
2020-09-22 15:30:14 +08:00
		if (!p) {
			NL_SET_ERR_MSG_MOD(extack, "Net device is not a bridge port");
			return -EINVAL;
		}

		if (p->br != br) {
			NL_SET_ERR_MSG_MOD(extack, "Port belongs to a different bridge device");
			return -EINVAL;
		}
		if (p->state == BR_STATE_DISABLED) {
			NL_SET_ERR_MSG_MOD(extack, "Port is in disabled state");
2019-08-17 19:22:13 +08:00
			return -EINVAL;
2020-09-22 15:30:14 +08:00
		}
2019-08-17 19:22:13 +08:00
		vg = nbp_vlan_group(p);
	} else {
		vg = br_vlan_group(br);
	}
2015-08-03 19:29:16 +08:00

2019-08-17 19:22:10 +08:00
	/* If vlan filtering is enabled and VLAN is not specified
	 * install mdb entry on all vlans configured on the port.
	 */
2017-05-26 14:37:23 +08:00
	if (br_vlan_enabled(br->dev) && vg && entry->vid == 0) {
2015-09-26 01:00:11 +08:00
		list_for_each_entry(v, &vg->vlan_list, vlist) {
			entry->vid = v->vid;
2020-09-22 15:30:19 +08:00
			err = __br_mdb_add(net, br, p, entry, mdb_attrs, extack);
2015-08-03 19:29:16 +08:00
			if (err)
				break;
		}
	} else {
2020-09-22 15:30:19 +08:00
		err = __br_mdb_add(net, br, p, entry, mdb_attrs, extack);
2015-08-03 19:29:16 +08:00
	}

2012-12-12 06:23:08 +08:00
	return err;
}
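
The tail of br_mdb_add() fans a request out over every configured VLAN when filtering is enabled and no VLAN id was given, stopping at the first failure; as written above, the loop does not undo entries installed for earlier VLANs. Below is a small userspace sketch of that control flow under stated assumptions: `mdb_add_fn`, `mdb_add_fanout`, `add_log` and `demo_add` are hypothetical names, a plain array replaces the kernel's vlan list, and `err` is initialised to 0 as a choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback standing in for __br_mdb_add(): installs one entry
 * for the given VLAN id and returns 0 or a negative errno.
 */
typedef int (*mdb_add_fn)(unsigned short vid, void *ctx);

/* With filtering on and vid == 0, try every configured VLAN in order and
 * stop at the first error; otherwise make a single call with vid as given.
 */
static int mdb_add_fanout(int vlan_filtering, const unsigned short *vids,
			  size_t n_vids, unsigned short vid,
			  mdb_add_fn add, void *ctx)
{
	int err = 0;	/* sketch choice: defined result for an empty list */
	size_t i;

	if (vlan_filtering && vids && vid == 0) {
		for (i = 0; i < n_vids; i++) {
			err = add(vids[i], ctx);
			if (err)
				break;
		}
	} else {
		err = add(vid, ctx);
	}

	return err;
}

/* Demo callback for experimenting: counts calls and rejects VLAN 30. */
struct add_log {
	int calls;
};

static int demo_add(unsigned short vid, void *ctx)
{
	struct add_log *log = ctx;

	log->calls++;
	return vid == 30 ? -EINVAL : 0;
}
```

Calling `mdb_add_fanout(1, vids, 3, 0, demo_add, &log)` with `vids = {10, 30, 40}` invokes the callback for 10 and 30 and then returns the error from 30 without trying 40.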

2020-09-22 15:30:19 +08:00
static int __br_mdb_del(struct net_bridge *br, struct br_mdb_entry *entry,
			struct nlattr **mdb_attrs)
2012-12-12 06:23:08 +08:00
{
	struct net_bridge_mdb_entry *mp;
	struct net_bridge_port_group *p;
	struct net_bridge_port_group __rcu **pp;
	struct br_ip ip;
	int err = -EINVAL;

2018-09-26 22:01:03 +08:00
	if (!netif_running(br->dev) || !br_opt_get(br, BROPT_MULTICAST_ENABLED))
2012-12-12 06:23:08 +08:00
		return -EINVAL;

2020-09-22 15:30:19 +08:00
	__mdb_entry_to_br_ip(entry, &ip, mdb_attrs);
2012-12-12 06:23:08 +08:00

	spin_lock_bh(&br->multicast_lock);
2018-12-05 21:14:24 +08:00
	mp = br_mdb_ip_get(br, &ip);
2012-12-12 06:23:08 +08:00
	if (!mp)
		goto unlock;

2019-08-17 19:22:13 +08:00
	/* host leave */
	if (entry->ifindex == mp->br->dev->ifindex && mp->host_joined) {
		br_multicast_host_leave(mp, false);
		err = 0;
2020-09-07 17:56:12 +08:00
		br_mdb_notify(br->dev, mp, NULL, RTM_DELMDB);
2019-08-17 19:22:13 +08:00
		if (!mp->ports && netif_running(br->dev))
			mod_timer(&mp->timer, jiffies);
		goto unlock;
	}

2012-12-12 06:23:08 +08:00
	for (pp = &mp->ports;
	     (p = mlock_dereference(*pp, br)) != NULL;
	     pp = &p->next) {
2020-09-22 15:30:22 +08:00
		if (!p->key.port || p->key.port->dev->ifindex != entry->ifindex)
2012-12-12 06:23:08 +08:00
			continue;

2020-09-22 15:30:22 +08:00
		if (p->key.port->state == BR_STATE_DISABLED)
2012-12-12 06:23:08 +08:00
			goto unlock;

2020-09-07 17:56:06 +08:00
		br_multicast_del_pg(mp, p, pp);
2012-12-12 06:23:08 +08:00
		err = 0;
		break;
	}

unlock:
	spin_unlock_bh(&br->multicast_lock);
	return err;
}
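
The delete path walks the same list with the same pointer-to-pointer cursor and hands `pp` to br_multicast_del_pg() so the matching entry can be unlinked in place. The sketch below shows only that unlink step in userspace, with the same return convention as __br_mdb_del() (-EINVAL unless an entry was removed). `pg_entry` and `pg_remove_by_ifindex` are hypothetical names, the `disabled` flag is an assumed simplification of the port state check, and no locking or RCU handling is modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a port group entry keyed by interface index. */
struct pg_entry {
	int ifindex;
	int disabled;		/* stands in for BR_STATE_DISABLED */
	struct pg_entry *next;
};

/* Unlink the first entry matching ifindex. Returns 0 if an entry was
 * removed, -EINVAL if none matched or the match belongs to a disabled port.
 */
static int pg_remove_by_ifindex(struct pg_entry **head, int ifindex)
{
	struct pg_entry **pp, *p;

	for (pp = head; (p = *pp) != NULL; pp = &p->next) {
		if (p->ifindex != ifindex)
			continue;
		if (p->disabled)
			return -EINVAL;

		*pp = p->next;	/* bypass p in whichever link led to it */
		p->next = NULL;
		return 0;
	}

	return -EINVAL;
}
```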

2017-04-17 00:48:24 +08:00
static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
		      struct netlink_ext_ack *extack)
2012-12-12 06:23:08 +08:00
{
2020-09-22 15:30:18 +08:00
	struct nlattr *mdb_attrs[MDBE_ATTR_MAX + 1];
2015-08-03 19:29:16 +08:00
	struct net *net = sock_net(skb->sk);
2015-09-26 01:00:11 +08:00
	struct net_bridge_vlan_group *vg;
2019-08-17 19:22:13 +08:00
	struct net_bridge_port *p = NULL;
2015-08-03 19:29:16 +08:00
	struct net_device *dev, *pdev;
2012-12-12 06:23:08 +08:00
	struct br_mdb_entry *entry;
2015-09-26 01:00:11 +08:00
	struct net_bridge_vlan *v;
2012-12-12 06:23:08 +08:00
	struct net_bridge *br;
	int err;

2020-09-22 15:30:18 +08:00
	err = br_mdb_parse(skb, nlh, &dev, &entry, mdb_attrs, extack);
2012-12-12 06:23:08 +08:00
	if (err < 0)
		return err;

	br = netdev_priv(dev);

2019-08-17 19:22:13 +08:00
	if (entry->ifindex != br->dev->ifindex) {
		pdev = __dev_get_by_index(net, entry->ifindex);
		if (!pdev)
			return -ENODEV;
2015-08-03 19:29:16 +08:00

2019-08-17 19:22:13 +08:00
		p = br_port_get_rtnl(pdev);
		if (!p || p->br != br || p->state == BR_STATE_DISABLED)
			return -EINVAL;
		vg = nbp_vlan_group(p);
	} else {
		vg = br_vlan_group(br);
	}
2015-08-03 19:29:16 +08:00

2019-08-17 19:22:10 +08:00
	/* If vlan filtering is enabled and VLAN is not specified
	 * delete mdb entry on all vlans configured on the port.
	 */
2017-05-26 14:37:23 +08:00
	if (br_vlan_enabled(br->dev) && vg && entry->vid == 0) {
2015-09-26 01:00:11 +08:00
		list_for_each_entry(v, &vg->vlan_list, vlist) {
			entry->vid = v->vid;
2020-09-22 15:30:19 +08:00
			err = __br_mdb_del(br, entry, mdb_attrs);
2015-08-03 19:29:16 +08:00
		}
	} else {
2020-09-22 15:30:19 +08:00
		err = __br_mdb_del(br, entry, mdb_attrs);
2015-08-03 19:29:16 +08:00
	}

2012-12-12 06:23:08 +08:00
	return err;
}

2012-12-07 08:04:48 +08:00
void br_mdb_init(void)
{
2017-12-03 04:44:07 +08:00
	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, 0);
	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_NEWMDB, br_mdb_add, NULL, 0);
	rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_DELMDB, br_mdb_del, NULL, 0);
2012-12-07 08:04:48 +08:00
}
2012-12-19 17:13:48 +08:00

void br_mdb_uninit(void)
{
	rtnl_unregister(PF_BRIDGE, RTM_GETMDB);
	rtnl_unregister(PF_BRIDGE, RTM_NEWMDB);
	rtnl_unregister(PF_BRIDGE, RTM_DELMDB);
}