linux/net/8021q/vlan_dev.c

838 lines
22 KiB
C
Raw Normal View History

/* -*- linux-c -*-
* INET 802.1Q VLAN
* Ethernet-type device handling.
*
* Authors: Ben Greear <greearb@candelatech.com>
* Please send support related email to: netdev@vger.kernel.org
* VLAN Home Page: http://www.candelatech.com/~greear/vlan.html
*
* Fixes: Mar 22 2001: Martin Bokaemper <mbokaemper@unispherenetworks.com>
* - reset skb->pkt_type on incoming packets when MAC was changed
* - see that changed MAC is saddr for outgoing packets
* Oct 20, 2001: Ard van Breeman:
* - Fix MC-list, finally.
* - Flush MC-list on VLAN destroy.
*
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/net_tstamp.h>
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
#include <linux/phy.h>
#include <net/arp.h>
#include <net/switchdev.h>
#include "vlan.h"
#include "vlanproc.h"
#include <linux/if_vlan.h>
#include <linux/netpoll.h>
/*
* Create the VLAN header for an arbitrary protocol layer
*
* saddr=NULL means use device source address
* daddr=NULL means leave destination address (eg unresolved arp)
*
* This is called when the SKB is moving down the stack towards the
* physical devices.
*/
static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
unsigned short type,
const void *daddr, const void *saddr,
unsigned int len)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct vlan_hdr *vhdr;
unsigned int vhdrlen = 0;
u16 vlan_tci = 0;
int rc;
if (!(vlan->flags & VLAN_FLAG_REORDER_HDR)) {
vhdr = skb_push(skb, VLAN_HLEN);
vlan_tci = vlan->vlan_id;
vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
vhdr->h_vlan_TCI = htons(vlan_tci);
/*
* Set the protocol type. For a packet of type ETH_P_802_3/2 we
* put the length in here instead.
*/
if (type != ETH_P_802_3 && type != ETH_P_802_2)
vhdr->h_vlan_encapsulated_proto = htons(type);
else
vhdr->h_vlan_encapsulated_proto = htons(len);
skb->protocol = vlan->vlan_proto;
type = ntohs(vlan->vlan_proto);
vhdrlen = VLAN_HLEN;
}
/* Before delegating work to the lower layer, enter our MAC-address */
if (saddr == NULL)
saddr = dev->dev_addr;
/* Now make the underlying real hard header */
dev = vlan->real_dev;
rc = dev_hard_header(skb, dev, type, daddr, saddr, len + vhdrlen);
if (rc > 0)
rc += vhdrlen;
return rc;
}
static inline netdev_tx_t vlan_netpoll_send_skb(struct vlan_dev_priv *vlan, struct sk_buff *skb)
{
#ifdef CONFIG_NET_POLL_CONTROLLER
if (vlan->netpoll)
netpoll_send_skb(vlan->netpoll, skb);
#else
BUG();
#endif
return NETDEV_TX_OK;
}
static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct vlan_ethhdr *veth = (struct vlan_ethhdr *)(skb->data);
unsigned int len;
int ret;
/* Handle non-VLAN frames if they are sent to us, for example by DHCP.
*
* NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING
* OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs...
*/
if (veth->h_vlan_proto != vlan->vlan_proto ||
vlan->flags & VLAN_FLAG_REORDER_HDR) {
u16 vlan_tci;
vlan_tci = vlan->vlan_id;
vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority);
__vlan_hwaccel_put_tag(skb, vlan->vlan_proto, vlan_tci);
}
skb->dev = vlan->real_dev;
len = skb->len;
if (unlikely(netpoll_tx_running(dev)))
return vlan_netpoll_send_skb(vlan, skb);
ret = dev_queue_xmit(skb);
if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
struct vlan_pcpu_stats *stats;
stats = this_cpu_ptr(vlan->vlan_pcpu_stats);
u64_stats_update_begin(&stats->syncp);
stats->tx_packets++;
stats->tx_bytes += len;
u64_stats_update_end(&stats->syncp);
} else {
this_cpu_inc(vlan->vlan_pcpu_stats->tx_dropped);
}
return ret;
}
static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
unsigned int max_mtu = real_dev->mtu;
if (netif_reduces_vlan_mtu(real_dev))
max_mtu -= VLAN_HLEN;
if (max_mtu < new_mtu)
return -ERANGE;
dev->mtu = new_mtu;
return 0;
}
void vlan_dev_set_ingress_priority(const struct net_device *dev,
u32 skb_prio, u16 vlan_prio)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
if (vlan->ingress_priority_map[vlan_prio & 0x7] && !skb_prio)
vlan->nr_ingress_mappings--;
else if (!vlan->ingress_priority_map[vlan_prio & 0x7] && skb_prio)
vlan->nr_ingress_mappings++;
vlan->ingress_priority_map[vlan_prio & 0x7] = skb_prio;
}
int vlan_dev_set_egress_priority(const struct net_device *dev,
u32 skb_prio, u16 vlan_prio)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct vlan_priority_tci_mapping *mp = NULL;
struct vlan_priority_tci_mapping *np;
u32 vlan_qos = (vlan_prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK;
/* See if a priority mapping exists.. */
mp = vlan->egress_priority_map[skb_prio & 0xF];
while (mp) {
if (mp->priority == skb_prio) {
if (mp->vlan_qos && !vlan_qos)
vlan->nr_egress_mappings--;
else if (!mp->vlan_qos && vlan_qos)
vlan->nr_egress_mappings++;
mp->vlan_qos = vlan_qos;
return 0;
}
mp = mp->next;
}
/* Create a new mapping then. */
mp = vlan->egress_priority_map[skb_prio & 0xF];
np = kmalloc(sizeof(struct vlan_priority_tci_mapping), GFP_KERNEL);
if (!np)
return -ENOBUFS;
np->next = mp;
np->priority = skb_prio;
np->vlan_qos = vlan_qos;
/* Before inserting this element in hash table, make sure all its fields
* are committed to memory.
* coupled with smp_rmb() in vlan_dev_get_egress_qos_mask()
*/
smp_wmb();
vlan->egress_priority_map[skb_prio & 0xF] = np;
if (vlan_qos)
vlan->nr_egress_mappings++;
return 0;
}
/* Flags are defined in the vlan_flags enum in
* include/uapi/linux/if_vlan.h file.
*/
int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
u32 old_flags = vlan->flags;
if (mask & ~(VLAN_FLAG_REORDER_HDR | VLAN_FLAG_GVRP |
VLAN_FLAG_LOOSE_BINDING | VLAN_FLAG_MVRP))
return -EINVAL;
vlan->flags = (old_flags & ~mask) | (flags & mask);
if (netif_running(dev) && (vlan->flags ^ old_flags) & VLAN_FLAG_GVRP) {
if (vlan->flags & VLAN_FLAG_GVRP)
vlan_gvrp_request_join(dev);
else
vlan_gvrp_request_leave(dev);
}
if (netif_running(dev) && (vlan->flags ^ old_flags) & VLAN_FLAG_MVRP) {
if (vlan->flags & VLAN_FLAG_MVRP)
vlan_mvrp_request_join(dev);
else
vlan_mvrp_request_leave(dev);
}
return 0;
}
void vlan_dev_get_realdev_name(const struct net_device *dev, char *result)
{
strncpy(result, vlan_dev_priv(dev)->real_dev->name, 23);
}
bool vlan_dev_inherit_address(struct net_device *dev,
struct net_device *real_dev)
{
if (dev->addr_assign_type != NET_ADDR_STOLEN)
return false;
ether_addr_copy(dev->dev_addr, real_dev->dev_addr);
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
return true;
}
static int vlan_dev_open(struct net_device *dev)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct net_device *real_dev = vlan->real_dev;
int err;
if (!(real_dev->flags & IFF_UP) &&
!(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
return -ENETDOWN;
if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr) &&
!vlan_dev_inherit_address(dev, real_dev)) {
err = dev_uc_add(real_dev, dev->dev_addr);
if (err < 0)
goto out;
}
if (dev->flags & IFF_ALLMULTI) {
err = dev_set_allmulti(real_dev, 1);
if (err < 0)
goto del_unicast;
}
if (dev->flags & IFF_PROMISC) {
err = dev_set_promiscuity(real_dev, 1);
if (err < 0)
goto clear_allmulti;
}
ether_addr_copy(vlan->real_dev_addr, real_dev->dev_addr);
if (vlan->flags & VLAN_FLAG_GVRP)
vlan_gvrp_request_join(dev);
if (vlan->flags & VLAN_FLAG_MVRP)
vlan_mvrp_request_join(dev);
if (netif_carrier_ok(real_dev))
netif_carrier_on(dev);
return 0;
clear_allmulti:
if (dev->flags & IFF_ALLMULTI)
dev_set_allmulti(real_dev, -1);
del_unicast:
if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
dev_uc_del(real_dev, dev->dev_addr);
out:
netif_carrier_off(dev);
return err;
}
static int vlan_dev_stop(struct net_device *dev)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct net_device *real_dev = vlan->real_dev;
dev_mc_unsync(real_dev, dev);
dev_uc_unsync(real_dev, dev);
if (dev->flags & IFF_ALLMULTI)
dev_set_allmulti(real_dev, -1);
if (dev->flags & IFF_PROMISC)
dev_set_promiscuity(real_dev, -1);
if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
dev_uc_del(real_dev, dev->dev_addr);
netif_carrier_off(dev);
return 0;
}
static int vlan_dev_set_mac_address(struct net_device *dev, void *p)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
struct sockaddr *addr = p;
int err;
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
if (!(dev->flags & IFF_UP))
goto out;
if (!ether_addr_equal(addr->sa_data, real_dev->dev_addr)) {
err = dev_uc_add(real_dev, addr->sa_data);
if (err < 0)
return err;
}
if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr))
dev_uc_del(real_dev, dev->dev_addr);
out:
ether_addr_copy(dev->dev_addr, addr->sa_data);
return 0;
}
static int vlan_dev_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
struct ifreq ifrr;
int err = -EOPNOTSUPP;
strncpy(ifrr.ifr_name, real_dev->name, IFNAMSIZ);
ifrr.ifr_ifru = ifr->ifr_ifru;
switch (cmd) {
case SIOCGMIIPHY:
case SIOCGMIIREG:
case SIOCSMIIREG:
case SIOCSHWTSTAMP:
case SIOCGHWTSTAMP:
if (netif_device_present(real_dev) && ops->ndo_do_ioctl)
err = ops->ndo_do_ioctl(real_dev, &ifrr, cmd);
break;
}
if (!err)
ifr->ifr_ifru = ifrr.ifr_ifru;
return err;
}
static int vlan_dev_neigh_setup(struct net_device *dev, struct neigh_parms *pa)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int err = 0;
if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
err = ops->ndo_neigh_setup(real_dev, pa);
return err;
}
#if IS_ENABLED(CONFIG_FCOE)
static int vlan_dev_fcoe_ddp_setup(struct net_device *dev, u16 xid,
struct scatterlist *sgl, unsigned int sgc)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int rc = 0;
if (ops->ndo_fcoe_ddp_setup)
rc = ops->ndo_fcoe_ddp_setup(real_dev, xid, sgl, sgc);
return rc;
}
static int vlan_dev_fcoe_ddp_done(struct net_device *dev, u16 xid)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int len = 0;
if (ops->ndo_fcoe_ddp_done)
len = ops->ndo_fcoe_ddp_done(real_dev, xid);
return len;
}
static int vlan_dev_fcoe_enable(struct net_device *dev)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int rc = -EINVAL;
if (ops->ndo_fcoe_enable)
rc = ops->ndo_fcoe_enable(real_dev);
return rc;
}
static int vlan_dev_fcoe_disable(struct net_device *dev)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int rc = -EINVAL;
if (ops->ndo_fcoe_disable)
rc = ops->ndo_fcoe_disable(real_dev);
return rc;
}
static int vlan_dev_fcoe_get_wwn(struct net_device *dev, u64 *wwn, int type)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int rc = -EINVAL;
if (ops->ndo_fcoe_get_wwn)
rc = ops->ndo_fcoe_get_wwn(real_dev, wwn, type);
return rc;
}
static int vlan_dev_fcoe_ddp_target(struct net_device *dev, u16 xid,
struct scatterlist *sgl, unsigned int sgc)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
const struct net_device_ops *ops = real_dev->netdev_ops;
int rc = 0;
if (ops->ndo_fcoe_ddp_target)
rc = ops->ndo_fcoe_ddp_target(real_dev, xid, sgl, sgc);
return rc;
}
#endif
static void vlan_dev_change_rx_flags(struct net_device *dev, int change)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
if (dev->flags & IFF_UP) {
if (change & IFF_ALLMULTI)
dev_set_allmulti(real_dev, dev->flags & IFF_ALLMULTI ? 1 : -1);
if (change & IFF_PROMISC)
dev_set_promiscuity(real_dev, dev->flags & IFF_PROMISC ? 1 : -1);
}
}
static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
{
dev_mc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev);
dev_uc_sync(vlan_dev_priv(vlan_dev)->real_dev, vlan_dev);
}
/*
* vlan network devices have devices nesting below it, and are a special
* "super class" of normal network devices; split their locks off into a
* separate class since they always nest.
*/
static struct lock_class_key vlan_netdev_xmit_lock_key;
static struct lock_class_key vlan_netdev_addr_lock_key;
static void vlan_dev_set_lockdep_one(struct net_device *dev,
struct netdev_queue *txq,
void *_subclass)
{
lockdep_set_class_and_subclass(&txq->_xmit_lock,
&vlan_netdev_xmit_lock_key,
*(int *)_subclass);
}
static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass)
{
lockdep_set_class_and_subclass(&dev->addr_list_lock,
&vlan_netdev_addr_lock_key,
subclass);
netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, &subclass);
}
static int vlan_dev_get_lock_subclass(struct net_device *dev)
{
return vlan_dev_priv(dev)->nest_level;
}
static const struct header_ops vlan_header_ops = {
.create = vlan_dev_hard_header,
.parse = eth_header_parse,
};
static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev,
unsigned short type,
const void *daddr, const void *saddr,
unsigned int len)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct net_device *real_dev = vlan->real_dev;
if (saddr == NULL)
saddr = dev->dev_addr;
return dev_hard_header(skb, real_dev, type, daddr, saddr, len);
}
static const struct header_ops vlan_passthru_header_ops = {
.create = vlan_passthru_hard_header,
.parse = eth_header_parse,
};
static struct device_type vlan_type = {
.name = "vlan",
};
static const struct net_device_ops vlan_netdev_ops;
static int vlan_dev_init(struct net_device *dev)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
netif_carrier_off(dev);
/* IFF_BROADCAST|IFF_MULTICAST; ??? */
dev->flags = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI |
IFF_MASTER | IFF_SLAVE);
dev->state = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
(1<<__LINK_STATE_DORMANT))) |
(1<<__LINK_STATE_PRESENT);
dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
NETIF_F_GSO_ENCAP_ALL |
NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
NETIF_F_ALL_FCOE;
net/8021q: create device with all possible features in wanted_features wanted_features is a set of features which have to be enabled if a hardware allows that. Currently when a vlan device is created, its wanted_features is set to current features of its base device. The problem is that the base device can get new features and they are not propagated to vlan-s of this device. If we look at bonding devices, they doesn't have this problem and this patch suggests to fix this issue by the same way how it works for bonding devices. We meet this problem, when we try to create a vlan device over a bonding device. When a system are booting, real devices require time to be initialized, so bonding devices created without slaves, then vlan devices are created and only then ethernet devices are added to the bonding device. As a result we have vlan devices with disabled scatter-gather. * create a bonding device $ ip link add bond0 type bond $ ethtool -k bond0 | grep scatter scatter-gather: off tx-scatter-gather: off [requested on] tx-scatter-gather-fraglist: off [requested on] * create a vlan device $ ip link add link bond0 name bond0.10 type vlan id 10 $ ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off * Add a slave device to bond0 $ ip link set dev eth0 master bond0 And now we can see that the bond0 device has got the scatter-gather feature, but the bond0.10 hasn't got it. [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: on [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off With this patch the vlan device will get all new features from the bonding device. Here is a call trace how features which are set in this patch reach dev->wanted_features. register_netdevice vlan_dev_init ... dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE | NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC | NETIF_F_ALL_FCOE; dev->features |= dev->hw_features; ... dev->wanted_features = dev->features & dev->hw_features; __netdev_update_features(dev); vlan_dev_fix_features ... Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 08:41:14 +08:00
dev->features |= dev->hw_features | NETIF_F_LLTX;
dev->gso_max_size = real_dev->gso_max_size;
dev->gso_max_segs = real_dev->gso_max_segs;
if (dev->features & NETIF_F_VLAN_FEATURES)
netdev_warn(real_dev, "VLAN features are set incorrectly. Q-in-Q configurations may not work correctly.\n");
dev->vlan_features = real_dev->vlan_features & ~NETIF_F_ALL_FCOE;
dev->hw_enc_features = vlan_tnl_features(real_dev);
/* ipv6 shared card related stuff */
dev->dev_id = real_dev->dev_id;
if (is_zero_ether_addr(dev->dev_addr)) {
ether_addr_copy(dev->dev_addr, real_dev->dev_addr);
dev->addr_assign_type = NET_ADDR_STOLEN;
}
if (is_zero_ether_addr(dev->broadcast))
memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len);
#if IS_ENABLED(CONFIG_FCOE)
dev->fcoe_ddp_xid = real_dev->fcoe_ddp_xid;
#endif
dev->needed_headroom = real_dev->needed_headroom;
if (vlan_hw_offload_capable(real_dev->features,
vlan_dev_priv(dev)->vlan_proto)) {
dev->header_ops = &vlan_passthru_header_ops;
dev->hard_header_len = real_dev->hard_header_len;
} else {
dev->header_ops = &vlan_header_ops;
dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
}
dev->netdev_ops = &vlan_netdev_ops;
SET_NETDEV_DEVTYPE(dev, &vlan_type);
vlan_dev_set_lockdep_class(dev, vlan_dev_get_lock_subclass(dev));
vlan_dev_priv(dev)->vlan_pcpu_stats = netdev_alloc_pcpu_stats(struct vlan_pcpu_stats);
if (!vlan_dev_priv(dev)->vlan_pcpu_stats)
return -ENOMEM;
return 0;
}
static void vlan_dev_uninit(struct net_device *dev)
{
struct vlan_priority_tci_mapping *pm;
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
int i;
for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
while ((pm = vlan->egress_priority_map[i]) != NULL) {
vlan->egress_priority_map[i] = pm->next;
kfree(pm);
}
}
}
static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
netdev_features_t features)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
netdev_features_t old_features = features;
netdev_features_t lower_features;
lower_features = netdev_intersect_features((real_dev->vlan_features |
NETIF_F_RXCSUM),
real_dev->features);
/* Add HW_CSUM setting to preserve user ability to control
* checksum offload on the vlan device.
*/
if (lower_features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
lower_features |= NETIF_F_HW_CSUM;
features = netdev_intersect_features(features, lower_features);
features |= old_features & (NETIF_F_SOFT_FEATURES | NETIF_F_GSO_SOFTWARE);
features |= NETIF_F_LLTX;
return features;
}
static int vlan_ethtool_get_link_ksettings(struct net_device *dev,
struct ethtool_link_ksettings *cmd)
{
const struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
return __ethtool_get_link_ksettings(vlan->real_dev, cmd);
}
static void vlan_ethtool_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
{
strlcpy(info->driver, vlan_fullname, sizeof(info->driver));
strlcpy(info->version, vlan_version, sizeof(info->version));
strlcpy(info->fw_version, "N/A", sizeof(info->fw_version));
}
static int vlan_ethtool_get_ts_info(struct net_device *dev,
struct ethtool_ts_info *info)
{
const struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
const struct ethtool_ops *ops = vlan->real_dev->ethtool_ops;
struct phy_device *phydev = vlan->real_dev->phydev;
if (phydev && phydev->drv && phydev->drv->ts_info) {
return phydev->drv->ts_info(phydev, info);
} else if (ops->get_ts_info) {
return ops->get_ts_info(vlan->real_dev, info);
} else {
info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
SOF_TIMESTAMPING_SOFTWARE;
info->phc_index = -1;
}
return 0;
}
static void vlan_dev_get_stats64(struct net_device *dev,
struct rtnl_link_stats64 *stats)
{
struct vlan_pcpu_stats *p;
u32 rx_errors = 0, tx_dropped = 0;
int i;
for_each_possible_cpu(i) {
u64 rxpackets, rxbytes, rxmulticast, txpackets, txbytes;
unsigned int start;
p = per_cpu_ptr(vlan_dev_priv(dev)->vlan_pcpu_stats, i);
do {
start = u64_stats_fetch_begin_irq(&p->syncp);
rxpackets = p->rx_packets;
rxbytes = p->rx_bytes;
rxmulticast = p->rx_multicast;
txpackets = p->tx_packets;
txbytes = p->tx_bytes;
} while (u64_stats_fetch_retry_irq(&p->syncp, start));
stats->rx_packets += rxpackets;
stats->rx_bytes += rxbytes;
stats->multicast += rxmulticast;
stats->tx_packets += txpackets;
stats->tx_bytes += txbytes;
/* rx_errors & tx_dropped are u32 */
rx_errors += p->rx_errors;
tx_dropped += p->tx_dropped;
}
stats->rx_errors = rx_errors;
stats->tx_dropped = tx_dropped;
}
#ifdef CONFIG_NET_POLL_CONTROLLER
static void vlan_dev_poll_controller(struct net_device *dev)
{
return;
}
netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit c4cdef9b7183159c23c7302aaf270d64c549f557 Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 06:36:38 +08:00
static int vlan_dev_netpoll_setup(struct net_device *dev, struct netpoll_info *npinfo)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
struct net_device *real_dev = vlan->real_dev;
struct netpoll *netpoll;
int err = 0;
netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit c4cdef9b7183159c23c7302aaf270d64c549f557 Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 06:36:38 +08:00
netpoll = kzalloc(sizeof(*netpoll), GFP_KERNEL);
err = -ENOMEM;
if (!netpoll)
goto out;
netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit c4cdef9b7183159c23c7302aaf270d64c549f557 Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 06:36:38 +08:00
err = __netpoll_setup(netpoll, real_dev);
if (err) {
kfree(netpoll);
goto out;
}
vlan->netpoll = netpoll;
out:
return err;
}
static void vlan_dev_netpoll_cleanup(struct net_device *dev)
{
struct vlan_dev_priv *vlan= vlan_dev_priv(dev);
struct netpoll *netpoll = vlan->netpoll;
if (!netpoll)
return;
vlan->netpoll = NULL;
__netpoll_free(netpoll);
}
#endif /* CONFIG_NET_POLL_CONTROLLER */
static int vlan_dev_get_iflink(const struct net_device *dev)
{
struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
return real_dev->ifindex;
}
static const struct ethtool_ops vlan_ethtool_ops = {
.get_link_ksettings = vlan_ethtool_get_link_ksettings,
.get_drvinfo = vlan_ethtool_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ts_info = vlan_ethtool_get_ts_info,
};
static const struct net_device_ops vlan_netdev_ops = {
.ndo_change_mtu = vlan_dev_change_mtu,
.ndo_init = vlan_dev_init,
.ndo_uninit = vlan_dev_uninit,
.ndo_open = vlan_dev_open,
.ndo_stop = vlan_dev_stop,
.ndo_start_xmit = vlan_dev_hard_start_xmit,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = vlan_dev_set_mac_address,
.ndo_set_rx_mode = vlan_dev_set_rx_mode,
.ndo_change_rx_flags = vlan_dev_change_rx_flags,
.ndo_do_ioctl = vlan_dev_ioctl,
.ndo_neigh_setup = vlan_dev_neigh_setup,
.ndo_get_stats64 = vlan_dev_get_stats64,
#if IS_ENABLED(CONFIG_FCOE)
.ndo_fcoe_ddp_setup = vlan_dev_fcoe_ddp_setup,
.ndo_fcoe_ddp_done = vlan_dev_fcoe_ddp_done,
.ndo_fcoe_enable = vlan_dev_fcoe_enable,
.ndo_fcoe_disable = vlan_dev_fcoe_disable,
.ndo_fcoe_get_wwn = vlan_dev_fcoe_get_wwn,
.ndo_fcoe_ddp_target = vlan_dev_fcoe_ddp_target,
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = vlan_dev_poll_controller,
.ndo_netpoll_setup = vlan_dev_netpoll_setup,
.ndo_netpoll_cleanup = vlan_dev_netpoll_cleanup,
#endif
.ndo_fix_features = vlan_dev_fix_features,
.ndo_get_lock_subclass = vlan_dev_get_lock_subclass,
.ndo_get_iflink = vlan_dev_get_iflink,
};
vlan: free percpu stats in device destructor Madalin-Cristian reported crashs happening after a recent commit (5a4ae5f6e7d4 "vlan: unnecessary to check if vlan_pcpu_stats is NULL") ----------------------------------------------------------------------- root@p5040ds:~# vconfig add eth8 1 root@p5040ds:~# vconfig rem eth8.1 Unable to handle kernel paging request for data at address 0x2bc88028 Faulting instruction address: 0xc058e950 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: CPU: 3 PID: 2167 Comm: vconfig Tainted: G W 3.16.0-rc3-00346-g65e85bf #2 task: e7264d90 ti: e2c2c000 task.ti: e2c2c000 NIP: c058e950 LR: c058ea30 CTR: c058e900 REGS: e2c2db20 TRAP: 0300 Tainted: G W (3.16.0-rc3-00346-g65e85bf) MSR: 00029002 <CE,EE,ME> CR: 48000428 XER: 20000000 DEAR: 2bc88028 ESR: 00000000 GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000 GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000 GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000 GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48 NIP [c058e950] vlan_dev_get_stats64+0x50/0x170 LR [c058ea30] vlan_dev_get_stats64+0x130/0x170 Call Trace: [e2c2dbd0] [ffffffea] 0xffffffea (unreliable) [e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140 [e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960 [e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110 [e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0 [e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50 [e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0 [e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160 [e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550 [e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0 [e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0 [e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80 [e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c Fix this problem by freeing percpu stats from dev->destructor() instead of ndo_uninit() Reported-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com> Fixes: 5a4ae5f6e7d4 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL") Cc: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 17:25:15 +08:00
static void vlan_dev_free(struct net_device *dev)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
free_percpu(vlan->vlan_pcpu_stats);
vlan->vlan_pcpu_stats = NULL;
}
void vlan_setup(struct net_device *dev)
{
ether_setup(dev);
dev->priv_flags |= IFF_802_1Q_VLAN | IFF_NO_QUEUE;
dev->priv_flags |= IFF_UNICAST_FLT;
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
netif_keep_dst(dev);
dev->netdev_ops = &vlan_netdev_ops;
net: Fix inconsistent teardown and release of private netdev state. Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 00:52:56 +08:00
dev->needs_free_netdev = true;
dev->priv_destructor = vlan_dev_free;
dev->ethtool_ops = &vlan_ethtool_ops;
net: use core MTU range checking in core net infra geneve: - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu - This one isn't quite as straight-forward as others, could use some closer inspection and testing macvlan: - set min/max_mtu tun: - set min/max_mtu, remove tun_net_change_mtu vxlan: - Merge __vxlan_change_mtu back into vxlan_change_mtu - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function - This one is also not as straight-forward and could use closer inspection and testing from vxlan folks bridge: - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported sch_teql: - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535) macsec: - min_mtu = 0, max_mtu = 65535 macvlan: - min_mtu = 0, max_mtu = 65535 ntb_netdev: - min_mtu = 0, max_mtu = 65535 veth: - min_mtu = 68, max_mtu = 65535 8021q: - min_mtu = 0, max_mtu = 65535 CC: netdev@vger.kernel.org CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Tom Herbert <tom@herbertland.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Paolo Abeni <pabeni@redhat.com> CC: Jiri Benc <jbenc@redhat.com> CC: WANG Cong <xiyou.wangcong@gmail.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Pravin B Shelar <pshelar@ovn.org> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Pravin Shelar <pshelar@nicira.com> CC: Maxim Krasnyansky <maxk@qti.qualcomm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 01:55:20 +08:00
dev->min_mtu = 0;
dev->max_mtu = ETH_MAX_MTU;
eth_zero_addr(dev->broadcast);
}