This greatly simplifies testing to verify I have fixed the problems
with a tun device disappearing when the tun file descriptor is still
held open.
Further it allows removal network namespace operations for the tun
driver. Reducing the network namespace handling in the driver to the
minimum. i.e. When we are creating a tun device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the awkward case between free_netdev and dev_chr_close fixed
there is no longer any need to limit tun and tap devices to the
network namespace they were created in. So remove the
NETIF_F_NETNS_LOCAL flag on the network device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tun code does not cope gracefully if the network device goes away before
the tun file descriptor is closed. It looks like we can trigger this with
rmmod, and moving tun devices between network namespaces will allow this
to be triggered when network namespaces exit.
To fix this I introduce an intermediate data structure tun_file which
holds a count of users and a pointer to the struct tun_struct. tun_get
increments that reference count if it is greater than 0. tun_put decrements
that reference count and detaches from the network device if the count is 0.
While we have a file attached to the network device I hold a reference
to the network device keeping it from going away completely.
When a network device is unregistered I decrement the count of the
attached tun_file and if that was the last user I detach the tun_file,
and all processes on read_wait are woken up to ensure they do not
sleep indefinitely. As some of those sleeps happen with the count on
the tun device elevated waking up the read waiters ensures that
tun_file will be detached in a timely manner.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The poll interface requires that the waitqueue exist while the struct
file is open. In the rare case when a tun device disappears before
the tun file closes we fail to provide this property, so move
read_wait.
This is safe now that tun_net_xmit is atomic with tun_detach.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this small race allows for a packet to be received when we
detach from an tun device and still be enqueued. Not especially
important but not what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grabbing namespaces in open, and putting them in close always seems to
be the cleanest approach with the fewest surprises.
So now that we have tun_file so we have somepleace to put the network
namespace, let's grab the network namespace on file open and put on
file close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun code suffers from only having a single word of
data that exists for the entire life of the tun file descriptor.
This results in peculiar holding of references to the network namespace
as well as races between free_netdevice and tun_chr_close.
Fix this by introducing tun_file which will hold the per file state.
For the moment it still holds just a single word so the differences
are all logic changes with no changes in semantics.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EBADF is meaningless in the context of a poll mask so use POLLERR
instead.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for two different tasks with access to the same file
descriptor to call tun_set_iff on it at the same time and race to
attach to a tap device. Prevent this by placing all of the logic to
attach to a file descriptor in one function and testing the file
descriptor to be certain it is not already attached to another tun
device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun driver keeps a private list of tun devices for what
appears to be a small gain in performance when reconnecting a file
descriptor to an existing tun or tap device. So simplify the code by
removing it.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_pernet_gen_device() expects 'int*', found via sparse.
CHECK drivers/net/tun.c
drivers/net/tun.c:1245:36: warning: incorrect type in argument 1 (different signedness)
drivers/net/tun.c:1245:36: expected int *id
drivers/net/tun.c:1245:36: got unsigned int static [toplevel] *<noident>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the TUN/TAP tunnel driver to net_device_ops.
Split the ops in two, and retain compatability.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap current->cred and a few other accessors to hide their actual
implementation.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.
Drivers need not do it any more.
Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts pretty much everything to print_mac. There were
a few things that had conflicts which I have just dropped for
now, no harm done.
I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_alloc produces linear packets (using kmalloc()). That can fail,
so should we fall back to making paged skbs.
My original version of this patch always allocate paged skbs for big
packets. But that made performance drop from 8.4 seconds to 8.8
seconds on 1G lguest->Host TCP xmit. So now we only do that as a
fallback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a TUNGETIFF interface so that userspace can query a
tun/tap descriptor for its name and flags.
This is needed because it is common for one app to create
a tap interface, exec another app and pass it the file
descriptor for the interface. Without TUNGETIFF the spawned
app has no way of detecting wheter the interface has e.g.
IFF_VNET_HDR set.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Please see the following thread to get some context on this
http://marc.info/?l=linux-netdev&m=121564433018903&w=2
Basically the issue is that current multi-cast filtering stuff in
the TUN/TAP driver is seriously broken.
Original patch went in without proper review and ACK. It was broken and
confusing to start with and subsequent patches broke it completely.
To give you an idea of what's broken here are some of the issues:
- Very confusing comments throughout the code that imply that the
character device is a network interface in its own right, and that packets
are passed between the two nics. Which is completely wrong.
- Wrong set of ioctls is used for setting up filters. They look like
shortcuts for manipulating state of the tun/tap network interface but
in reality manipulate the state of the TX filter.
- ioctls that were originally used for setting address of the the TX filter
got "fixed" and now set the address of the network interface itself. Which
made filter totaly useless.
- Filtering is done too late. Instead of filtering early on, to avoid
unnecessary wakeups, filtering is done in the read() call.
The list goes on and on :)
So the patch cleans all that up. It introduces simple and clean interface for
setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering
before enqueuing the packets.
TX filtering is useful in the scenarios where TAP is part of a bridge, in
which case it gets all broadcast, multicast and potentially other packets when
the bridge is learning. So for example Ethernet tunnelling app may want to
setup TX filters to avoid tunnelling multicast traffic. QEMU and other
hypervisors can push RX filtering that is currently done in the guest into the
host context therefore saving wakeups and unnecessary data transfer.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scenario goes like this. App stops reading from tun/tap.
TX queue gets full and driver does netif_stop_queue().
App closes fd and TX queue gets flushed as part of the cleanup.
Next time the app opens tun/tap and starts reading from it but
the xoff state is not cleared. We're stuck.
Normally xoff state is cleared when netdev is brought up. But
in the case of persistent devices this happens only during
initial setup.
The fix is trivial. If device is already up when an app opens
it we clear xoff state and that gets things moving again.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a IFF_VNET_HDR flag. This uses the same ABI as virtio_net
(ie. prepending struct virtio_net_hdr to packets) to indicate GSO and
checksum information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool is useful for setting (some) device fields, but it's
root-only. Finer feature control is available through a tun-specific
ioctl.
(Includes Mark McLoughlin <markmc@redhat.com>'s fix to hold rtnl sem).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem with introducing checksum offload and gso to tun is they
need to set dev->features to enable GSO and/or checksumming, which is
supposed to be done before register_netdevice(), ie. as part of
TUNSETIFF.
Unfortunately, TUNSETIFF has always just ignored flags it doesn't
understand, so there's no good way of detecting whether the kernel
supports new IFF_ flags.
This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF
flags. It could be extended later to include other features.
Here's an example program which uses it:
#include <linux/if_tun.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>
#include <stdio.h>
static struct {
unsigned int flag;
const char *name;
} known_flags[] = {
{ IFF_TUN, "TUN" },
{ IFF_TAP, "TAP" },
{ IFF_NO_PI, "NO_PI" },
{ IFF_ONE_QUEUE, "ONE_QUEUE" },
};
int main()
{
unsigned int features, i;
int netfd = open("/dev/net/tun", O_RDWR);
if (netfd < 0)
err(1, "Opening /dev/net/tun");
if (ioctl(netfd, TUNGETFEATURES, &features) != 0) {
printf("Kernel does not support TUNGETFEATURES, guessing\n");
features = (IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE);
}
printf("Available features are: ");
for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) {
if (features & known_flags[i].flag) {
features &= ~known_flags[i].flag;
printf("%s ", known_flags[i].name);
}
}
if (features)
printf("(UNKNOWN %#x)", features);
printf("\n");
return 0;
}
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
assumed that the driver will check the first nibble of packet,
determine IP version and set the appropriate protocol.
Signed-off-by: Ang Way Chuang <wcang@nav6.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since these operations don't go through the normal
device calls, we have to ensure we synchronize with
those paths.
Noticed by Alan Cox.
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed by Alan Cox.
The IFF_UP test is a bit racey, because other entities
outside of this driver's ioctl handler can modify that
state, even though this ioctl handler runs under
lock_kernel().
Signed-off-by: David S. Miller <davem@davemloft.net>
This is basically means that a net is set for a new device, but
actually also involves two more steps:
1. mark the tun device as "local", i.e. do not allow for it to
move across namespaces.
This is done so, since tun device is most often associated to some
file (and thus to some process) and moving the device alone is not
valid while keeping the file and the process outside. The need in
ability to move a detached persistent device is to be investigated
later.
2. get the tun device's net when tun becomes attached and put one
when it becomes detached.
This is needed to handle the case when a task owning the tun dies,
but a files lives for some more time - in this case we must not
allow for net to be freed, since its exit hook will spoil that file's
private data by unregistering the tun from under tun_chr_close.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the static tun_dev_list and replace its occurrences in
driver with per-net one.
It is used in two places - in tun_set_iff and tun_cleanup. In
the first case it's legal to use current net_ns. In the cleanup
call - move the loop, that unregisters all devices in net exit
hook.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the first step in making tuntap devices work in net
namespaces. The structure mentioned is pointed by generic
net pointer with tun_net_id id, and tun driver fills one on
its load. It will contain only the tun devices list.
So declare this structure and introduce net init and exit hooks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the user gives a packet under 14 bytes, we'll end up reading off the end
of the skb (not oopsing, just reading off the end).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyanskiy <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason for this to be in the header, and it just hurts
recompile time.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyanskiy <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current tun/tap driver sets also net device's hw address when asked to
change character device's hw address. This is a good idea, but it
misses RTLN-locking, resulting following error message in 2.6.25-rc3's
inetdev_event() function:
RTNL: assertion failed at net/ipv4/devinet.c (1050)
Attached patch fixes this problem.
Signed-off-by: Kim B. Heino <Kim.Heino@bluegiga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: "Nathaniel Filardo" <nwfilardo@gmail.com>
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=9806
The TUN/TAP driver only permits one-way transitions of IFF_NO_PI or
IFF_ONE_QUEUE during the lifetime of a tap/tun interface. Note that
tun_set_iff contains
541 if (ifr->ifr_flags & IFF_NO_PI)
542 tun->flags |= TUN_NO_PI;
543
544 if (ifr->ifr_flags & IFF_ONE_QUEUE)
545 tun->flags |= TUN_ONE_QUEUE;
This is easily fixed by adding else branches which clear these bits.
Steps to reproduce:
This is easily reproduced by setting an interface persistant using tunctl then
attempting to open it as IFF_TAP or IFF_TUN, without asserting the IFF_NO_PI
flag. The ioctl() will succeed and the ifr.flags word is not modified, but the
interface remains in IFF_NO_PI mode (as it was set by tunctl).
Acked-by: Maxim Krasnyansky <maxk@qualcomm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use iov_length() instead of tun's homemade iov_total().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a trivial fix of debug message.
When a persist flag is set, the message should say "enabled".
Signed-off-by: Toyo Abe <tabe@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now have struct net_device_stats embedded in struct net_device,
and the default ->get_stats() hook does the obvious thing for us.
Run through drivers/net/* and remove the driver-local storage of
statistics, and driver-local ->get_stats() hook where applicable.
This was just the low-hanging fruit in drivers/net; plenty more drivers
remain to be updated.
[ Resolved conflicts with napi_struct changes and fix sunqe build
regression... -DaveM ]
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>