linux

Commit Graph

Author	SHA1	Message	Date
stephen hemminger	a676847b39	tun: allow setting ethernet addresss while running This is a pure software device, and ok with live address change. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-11 12:49:53 -05:00
Paul Moore	b3943aef7e	tun: correctly report an error in tun_flow_init() On error, the error code from tun_flow_init() is lost inside tun_set_iff(), this patch fixes this by assigning the tun_flow_init() error code to the "err" variable which is returned by the tun_flow_init() function on error. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 13:20:46 -05:00
Michael S. Tsirkin	5d09710925	tun: only queue packets on device Historically tun supported two modes of operation: - in default mode, a small number of packets would get queued at the device, the rest would be queued in qdisc - in one queue mode, all packets would get queued at the device This might have made sense up to a point where we made the queue depth for both modes the same and set it to a huge value (500) so unless the consumer is stuck the chance of losing packets is small. Thus in practice both modes behave the same, but the default mode has some problems: - if packets are never consumed, fragments are never orphaned which cases a DOS for sender using zero copy transmit - overrun errors are hard to diagnose: fifo error is incremented only once so you can not distinguish between userspace that is stuck and a transient failure, tcpdump on the device does not show any traffic Userspace solves this simply by enabling IFF_ONE_QUEUE but there seems to be little point in not doing the right thing for everyone, by default. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 15:07:36 -05:00
Jason Wang	eb0fb363f9	tuntap: attach queue 0 before registering netdevice We attach queue 0 after registering netdevice currently. This leads to call netif_set_real_num_{tx\|rx}_queues() after registering the netdevice. Since we allow tun/tap has a maximum of 1024 queues, this may lead a huge number of uevents to be injected to userspace since we create 2048 kobjects and then remove 2046. Solve this problem by attaching queue 0 and set the real number of queues before registering netdevice. Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 13:47:57 -05:00
Rami Rosen	3872baf618	tun: put correct method name in a debug message. This patch puts the correct method name, tun_do_read, in a debug message. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:22:10 -05:00
Rami Rosen	36fe8c0973	vtun: fix typos. This patch fixes four typos in drivers/net/vtun.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:22:09 -05:00
Rami Rosen	9ce99cf6dc	tun: change tun_get_iff() prototype. This patch changes tun_get_iff() prototype to return void as it never fails. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-23 14:24:46 -05:00
Eric W. Biederman	c260b7722f	net: Allow userns root to control tun and tap devices Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) calls to ns_capable(net->user_ns,CAP_NET_ADMIN) calls. Allow setting of the tun iff flags. Allow creating of tun devices. Allow adding a new queue to a tun device. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-19 14:15:54 -05:00
Michael S. Tsirkin	149d36f718	tun: report orphan frags errors to zero copy callback When tun transmits a zero copy skb, it orphans the frags which might need to allocate extra memory, in atomic context. If that fails, notify ubufs callback before freeing the skb as a hint that device should disable zerocopy mode. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Jason Wang	96442e4242	tuntap: choose the txq based on rxq This patch implements a simple multiqueue flow steering policy - tx follows rx for tun/tap. The idea is simple, it just choose the txq based on which rxq it comes. The flow were identified through the rxhash of a skb, and the hash to queue mapping were recorded in a hlist with an ageing timer to retire the mapping. The mapping were created when tun receives packet from userspace, and was quired in .ndo_select_queue(). I run co-current TCP_CRR test and didn't see any mapping manipulation helpers in perf top, so the overhead could be negelected. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:09 -04:00
Jason Wang	cde8b15f1a	tuntap: add ioctl to attach or detach a file form tuntap device Sometimes usespace may need to active/deactive a queue, this could be done by detaching and attaching a file from tuntap device. This patch introduces a new ioctls - TUNSETQUEUE which could be used to do this. Flag IFF_ATTACH_QUEUE were introduced to do attaching while IFF_DETACH_QUEUE were introduced to do the detaching. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:08 -04:00
Jason Wang	c8d68e6be1	tuntap: multiqueue support This patch converts tun/tap to a multiqueue devices and expose the multiqueue queues as multiple file descriptors to userspace. Internally, each tun_file were abstracted as a queue, and an array of pointers to tun_file structurs were stored in tun_structure device, so multiple tun_files were allowed to be attached to the device as multiple queues. When choosing txq, we first try to identify a flow through its rxhash, if it does not have such one, we could try recorded rxq and then use them to choose the transmit queue. This policy may be changed in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:08 -04:00
Jason Wang	6e914fc707	tuntap: RCUify dereferencing between tun_struct and tun_file RCU were introduced in this patch to synchronize the dereferences between tun_struct and tun_file. All tun_{get\|put} were replaced with RCU, the dereference from one to other must be done under rtnl lock or rcu read critical region. This is needed for the following patches since the one of the goal of multiqueue tuntap is to allow adding or removing queues during workload. Without RCU, control path would hold tx locks when adding or removing queues (which may cause sme delay) and it's hard to change the number of queues without stopping the net device. With the help of rcu, there's also no need for tun_file hold an refcnt to tun_struct. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:07 -04:00
Jason Wang	54f968d6ef	tuntap: move socket to tun_file Current tuntap makes use of the socket receive queue as its tx queue. To implement multiple tx queues for tuntap and enable the ability of adding and removing queues during workload, the first step is to move the socket related structures to tun_file. Then we could let multiple fds/sockets to be attached to the tuntap. This patch removes tun_sock and moves socket related structures from tun_sock or tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog, they are still kept in tun_structure since they are used to filter packets for the net device instead of per transmit queue (at least I see no requirements for them). After those changes, socket were created and destroyed during file open and close (instead of device creation and destroy), the socket structures could be dereferenced from tun_file instead of the file of tun_struct structure itself. For persisent device, since we purge during datching and wouldn't queue any packets when no interface were attached, there's no behaviod changes before and after this patch, so the changes were transparent to the userspace. To keep the attributes such as sndbuf, socket filter and vnet header, those would be re-initialize after a new interface were attached to an persist device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:07 -04:00
Jason Wang	1e5883382c	tuntap: log the unsigned informaiton with %u Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:06 -04:00
Daniel Wagner	6a328d8c6f	cgroup: net_cls: Rework update socket logic The cgroup logic part of net_cls is very similar as the one in net_prio. Let's stream line the net_cls logic with the net_prio one. The net_prio update logic was changed by following commit (note there were some changes necessary later on) commit `406a3c638c` Author: John Fastabend <john.r.fastabend@intel.com> Date: Fri Jul 20 10:39:25 2012 +0000 net: netprio_cgroup: rework update socket logic Instead of updating the sk_cgrp_prioidx struct field on every send this only updates the field when a task is moved via cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed. For example in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set but as soon as data is sent the kernel worker thread isssues a send and sk_cgrp_prioidx is updated with the kernel worker threads value which is the default case. It seems more correct to only update the field when the user explicitly sets it via control group infrastructure. This allows the users to manage sockets that may be used with other threads. Since classid is now updated when the task is moved between the cgroups, we don't have to call sock_update_classid() from various places to ensure we always using the latest classid value. [v2: Use iterate_fd() instead of open coding] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Li Zefan <lizefan@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Joe Perches <joe@perches.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 03:40:51 -04:00
Daniel Wagner	fd9a08a7b8	cgroup: net_cls: Pass in task to sock_update_classid() sock_update_classid() assumes that the update operation always are applied on the current task. sock_update_classid() needs to know on which tasks to work on in order to be able to migrate task between cgroups using the struct cgroup_subsys attach() callback. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Joe Perches <joe@perches.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 03:40:50 -04:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Daniel Wagner	f341980771	cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h The only user of sock_update_classid() is net/socket.c which happens to include cls_cgroup.h directly. tj: Fix build breakage due to missing cls_cgroup.h inclusion in drivers/net/tun.c reported in linux-next by Stephen. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org	2012-09-14 09:55:57 -07:00
Eric W. Biederman	0625c883bc	userns: Convert tun/tap to use kuid and kgid where appropriate Cc: Maxim Krasnyansky <maxk@qualcomm.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:31 -07:00
Stanislav Kinsbursky	66d1b9263a	tun: don't zeroize sock->file on detach This is a fix for bug, introduced in 3.4 kernel by commit `1ab5ecb90c` ("tun: don't hold network namespace by tun sockets"), which, among other things, replaced simple sock_put() by sk_release_kernel(). Below is sequence, which leads to oops for non-persistent devices: tun_chr_close() tun_detach() <== tun->socket.file = NULL tun_free_netdev() sk_release_sock() sock_release(sock->file == NULL) iput(SOCK_INODE(sock)) <== dereference on NULL pointer This patch just removes zeroing of socket's file from __tun_detach(). sock_release() will do this. Cc: stable@vger.kernel.org Reported-by: Ruan Zhijie <ruanzhijie@hotmail.com> Tested-by: Ruan Zhijie <ruanzhijie@hotmail.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:16:14 -07:00
David S. Miller	8bbb181308	tun: Fix formatting. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:52:48 -07:00
Mathias Krause	a117dacde0	net/tun: fix ioctl() based info leaks The tun module leaks up to 36 bytes of memory by not fully initializing a structure located on the stack that gets copied to user memory by the TUNGETIFF and SIOCGIFHWADDR ioctl()s. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:31 -07:00
Michael S. Tsirkin	0690899b4d	tun: experimental zero copy tx support Let vhost-net utilize zero copy tx when used with tun. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	868eefeb17	tun: orphan frags on xmit tun xmit is actually receive of the internal tun socket. Orphan the frags same as we do for normal rx path. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Mikulas Patocka	b09e786bd1	tun: fix a crash bug and a memory leak This patch fixes a crash tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel -> sock_release -> iput(SOCK_INODE(sock)) introduced by commit `1ab5ecb90c` The problem is that this socket is embedded in struct tun_struct, it has no inode, iput is called on invalid inode, which modifies invalid memory and optionally causes a crash. sock_release also decrements sockets_in_use, this causes a bug that "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when creating and closing tun devices. This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs sock_release to not free the inode and not decrement sockets_in_use, fixing both memory corruption and sockets_in_use underflow. It should be backported to 3.3 an 3.4 stabke. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:21:06 -07:00
Joe Perches	344dc8ede1	drivers/net: Use eth_random_addr Convert the existing uses of random_ether_addr to the new eth_random_addr. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:38:28 -07:00
Joe Perches	2e42e4747e	drivers/net: Convert compare_ether_addr to ether_addr_equal Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
David S. Miller	4da0bd7365	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-03-18 23:29:41 -04:00
Stanislav Kinsbursky	1ab5ecb90c	tun: don't hold network namespace by tun sockets v3: added previously removed sock_put() to the tun_release() callback, because sk_release_kernel() doesn't drop the socket reference. v2: sk_release_kernel() used for socket release. Dummy tun_release() is required for sk_release_kernel() ---> sock_release() ---> sock->ops->release() call. TUN was designed to destroy it's socket on network namesapce shutdown. But this will never happen for persistent device, because it's socket holds network namespace. This patch removes of holding network namespace by TUN socket and replaces it by creating socket in init_net and then changing it's net it to desired one. On shutdown socket is moved back to init_net prior to final put. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-12 17:14:00 -07:00
Danny Kukawka	f2cedb63df	net: replace random_ether_addr() with eth_hw_addr_random() Replace usage of random_ether_addr() with eth_hw_addr_random() to set addr_assign_type correctly to NET_ADDR_RANDOM. Change the trivial cases. v2: adapt to renamed eth_hw_addr_random() Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-15 15:34:16 -05:00
Rick Jones	84b4050111	Sweep away N/A fw_version dustbunnies from the .get_drvinfo routine of a number of drivers Per discussion with Ben Hutchings and David Miller, go through and remove assignments of "N/A" to fw_version in various drivers' .get_drvinfo routines. While there clean-up some use of bare constants and such. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
Michał Mirosław	c8f44affb7	net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:10 -05:00
Rick Jones	33a5ba144e	net: sweep-up some straglers in strlcpy conversion of .get_drvinfo routines Convert some remaining straglers' .get_drvinfo routines to use strlcpy rather than strcpy/strncpy. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:38:55 -05:00
Jiri Pirko	afc4b13df1	net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Neil Horman	550fd08c2c	net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-27 22:39:30 -07:00
David S. Miller	9f6ec8d697	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c	2011-06-20 22:29:08 -07:00
Neil Horman	bebd097a0a	tun: teach the tun/tap driver to support netpoll Commit `8d8fc29d02` changed the behavior of slave devices in regards to netpoll. Specifically it created a mutually exclusive relationship between being a slave and a netpoll-capable device. This creates problems for KVM because guests relied on needing netconsole active on a slave device to a bridge. Ideally libvirtd could just attach netconsole to the bridge device instead, but thats currently infeasible, because while the bridge device supports netpoll, it requires that all slave interface also support it, but the tun/tap driver currently does not. The most direct solution is to teach tun/tap to support netpoll, which is implemented by the patch below. I've not tested this yet, but its pretty straightforward. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Rik van Riel <riel@redhat.com> CC: Rik van Riel <riel@redhat.com> CC: Maxim Krasnyansky <maxk@qualcomm.com> CC: Cong Wang <amwang@redhat.com> CC: "David S. Miller" <davem@davemloft.net> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>	2011-06-16 23:53:10 -04:00
Jason Wang	10a8d94a95	virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID There's no need for the guest to validate the checksum if it have been validated by host nics. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examing in guest. The backend (tap/macvtap) may set this flag when met skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old driver just ignore this flag. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, when gro is on no difference as it produces skb with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-11 15:57:47 -07:00
Amos Kong	61a5ff15eb	tun: do not put self in waitq if doing a nonblock read Perf shows a relatively high rate (about 8%) race in spin_lock_irqsave() when doing netperf between external host and guest. It's mainly becuase the lock contention between the tun_do_read() and tun_xmit_skb(), so this patch do not put self into waitqueue to reduce this kind of race. After this patch, it drops to 4%. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:27:10 -07:00
stephen hemminger	6f7c156c08	tun: dont force inline of functions Current standard practice is to not mark most functions as inline and let compiler decide instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:08:39 -07:00
stephen hemminger	a504b86e71	tun: reserves space for network in skb The tun driver allocates skb's to hold data from user and then passes the data into the network stack as received data. Most network devices allocate the receive skb with routines like dev_alloc_skb() that reserves additional space for use by network protocol stack but tun does not. Because of the lack of padding, when the packet is passed through bridge netfilter a new skb has to be allocated. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:08:38 -07:00
Joe Perches	6403eab143	drivers/net: Remove unnecessary semicolons Semicolons are not necessary after switch/while/for/if braces so remove them. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-05 14:33:40 -07:00
Jiri Pirko	1c5cae815d	net: call dev_alloc_name from register_netdevice Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by `84c49d8c3e` Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-05 10:57:45 -07:00
David Decotigny	7073949720	ethtool: cosmetic: Use ethtool ethtool_cmd_speed API This updates the network drivers so that they don't access the ethtool_cmd::speed field directly, but use ethtool_cmd_speed() instead. For most of the drivers, these changes are purely cosmetic and don't fix any problem, such as for those 1GbE/10GbE drivers that indirectly call their own ethtool get_settings()/mii_ethtool_gset(). The changes are meant to enforce code consistency and provide robustness with future larger throughputs, at the expense of a few CPU cycles for each ethtool operation. All drivers compiled with make allyesconfig ion x86_64 have been updated. Tested: make allyesconfig on x86_64 + e1000e/bnx2x work Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-29 14:03:01 -07:00
Michał Mirosław	8825537521	net: tun: convert to hw_features This changes offload setting behaviour to what I think is correct: - offloads set via ethtool mean what admin wants to use (by default he wants 'em all) - offloads set via ioctl() mean what userspace is expecting to get (this limits which admin wishes are granted) - TUN_NOCHECKSUM is ignored, as it might cause broken packets when forwarded (ip_summed == CHECKSUM_UNNECESSARY means that checksum was verified, not that it can be ignored) If TUN_NOCHECKSUM is implemented, it should set skb->csum_* and skb->ip_summed (= CHECKSUM_PARTIAL) for known protocols and let others be verified by kernel when necessary. TUN_NOCHECKSUM handling was introduced by commit f43798c27684ab925adde7d8acc34c78c6e50df8: tun: Allow GSO using virtio_net_hdr Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-20 01:30:45 -07:00
Joe Perches	6b8a66ee91	tun: Convert logging messages to pr_<level> and tun_debug Use the current logging forms with pr_fmt. Convert DBG macro to tun_debug, use netdev_printk as well. Add printf verification when TUN_DEBUG not defined. Miscellaneous comment typo fix. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-03 12:21:14 -08:00
Michał Mirosław	04ed3e741d	net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 15:32:47 -08:00
Linus Torvalds	008d23e485	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.	2011-01-13 10:05:56 -08:00
Michał Mirosław	55508d601d	net: Use skb_checksum_start_offset() Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:43:14 -08:00
Uwe Kleine-König	b595076a18	tree-wide: fix comment/printk typos "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-01 15:38:34 -04:00
Nolan Leake	bee31369ce	tun: keep link (carrier) state up to date Currently, only ethtool can get accurate link state of a tap device. With this patch, IFF_RUNNING and IF_OPER_UP/DOWN are kept up to date as well. Signed-off-by: Nolan Leake <nolan@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-30 22:06:41 -07:00
Michael S. Tsirkin	ef3db4a595	tun: avoid BUG, dump packet on GSO errors There are still some LRO cards that cause GSO errors in tun, and BUG on this is an unfriendly way to tell the admin to disable LRO. Further, experience shows we might have more GSO bugs lurking. See https://bugzilla.kernel.org/show_bug.cgi?id=16413 as a recent example. dumping a packet will make it easier to figure it out. Replace BUG with warning+dump+drop the packet to make GSO errors in tun less critical and easier to debug. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Alex Unigovsky <unik@compot.ru> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-24 20:47:20 -07:00
Linus Torvalds	b1cdc4670b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits) drivers/net/usb/asix.c: Fix pointer cast. be2net: Bug fix to avoid disabling bottom half during firmware upgrade. proc_dointvec: write a single value hso: add support for new products Phonet: fix potential use-after-free in pep_sock_close() ath9k: remove VEOL support for ad-hoc ath9k: change beacon allocation to prefer the first beacon slot sock.h: fix kernel-doc warning cls_cgroup: Fix build error when built-in macvlan: do proper cleanup in macvlan_common_newlink() V2 be2net: Bug fix in init code in probe net/dccp: expansion of error code size ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep wireless: fix sta_info.h kernel-doc warnings wireless: fix mac80211.h kernel-doc warnings iwlwifi: testing the wrong variable in iwl_add_bssid_station() ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs() ath9k_htc: dereferencing before check in hif_usb_tx_cb() rt2x00: Fix rt2800usb TX descriptor writing. rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions. ...	2010-05-25 16:59:51 -07:00
Kay Sievers	578454ff7e	driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the common and singleinstance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-05-25 15:08:26 -07:00
Herbert Xu	8286274284	tun: Update classid on packet injection This patch makes tun update its socket classid every time we inject a packet into the network stack. This is so that any updates made by the admin to the process writing packets to tun is effected. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-24 00:14:10 -07:00
Joe Perches	ee289b6440	drivers/net: remove useless semicolons switch and while statements don't need semicolons at end of statement [ Fixup minor conflicts with recent wimax merge... -DaveM ] Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 22:47:34 -07:00
Joe Perches	a4b770972b	drivers/net: Remove unnecessary returns from void function()s This patch removes from drivers/net/ all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. It also does not remove null void functions with return. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ \| \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' with some cleanups by hand. Compile tested x86 allmodconfig only. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-14 00:19:28 -07:00
Eric Dumazet	1ae5dc342a	net: trans_start cleanups Now that core network takes care of trans_start updates, dont do it in drivers themselves, if possible. Drivers can avoid one cache miss (on dev->trans_start) in their start_xmit() handler. Exceptions are NETIF_F_LLTX drivers Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-10 05:01:31 -07:00
Michael S. Tsirkin	d9d52b5178	tun: add ioctl to modify vnet header size virtio added mergeable buffers mode where 2 bytes of extra info is put after vnet header but before actual data (tun does not need this data). In hindsight, it would have been better to add the new info before the packet: as it is, users need a lot of tricky code to skip the extra 2 bytes in the middle of the iovec, and in fact applications seem to get it wrong, and only work with specific iovec layout. The fact we might need to split iovec also means we might in theory overflow iovec max size. This patch adds a simpler way for applications to handle this, and future proofs the interface against further extensions, by making the size of the virtio net header configurable from userspace. As a result, tun driver will simply skip the extra 2 bytes on both input and output. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David S. Miller <davem@davemloft.net>	2010-05-03 12:33:13 +03:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Eric Dumazet	aa39514516	net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t sk_sleep(struct sock sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 16:37:13 -07:00
Michael S. Tsirkin	0110d6f22f	tun: orphan an skb on tx The following situation was observed in the field: tap1 sends packets, tap2 does not consume them, as a result tap1 can not be closed. This happens because tun/tap devices can hang on to skbs undefinitely. As noted by Herbert, possible solutions include a timeout followed by a copy/change of ownership of the skb, or always copying/changing ownership if we're going into a hostile device. This patch implements the second approach. Note: one issue still remaining is that since skbs keep reference to tun socket and tun socket has a reference to tun device, we won't flush backlog, instead simply waiting for all skbs to get transmitted. At least this is not user-triggerable, and this was not reported in practice, my assumption is other devices besides tap complete an skb within finite time after it has been queued. A possible solution for the second issue would not to have socket reference the device, instead, implement dev->destructor for tun, and wait for all skbs to complete there, but this needs some thought, probably too risky for 2.6.34. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Yan Vugenfirer <yvugenfi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-14 04:52:03 -07:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
Michael S. Tsirkin	9940516259	tun: socket filter support This patch adds Linux Socket Filter support to tun driver. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-17 16:35:17 -08:00
Daniel Mack	3ad2f3fbb9	tree-wide: Assorted spelling fixes In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-09 11:13:56 +01:00
Michael S. Tsirkin	05c2828c72	tun: export underlying socket Tun device looks similar to a packet socket in that both pass complete frames from/to userspace. This patch fills in enough fields in the socket underlying tun driver to support sendmsg/recvmsg operations, and message flags MSG_TRUNC and MSG_DONTWAIT, and exports access to this socket to modules. Regular read/write behaviour is unchanged. This way, code using raw sockets to inject packets into a physical device, can support injecting packets into host network stack almost without modification. First user of this interface will be vhost virtualization accelerator. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-15 01:43:28 -08:00
Vitaliy Gusev	80924e5f7d	tun: use tun_sk instead container_of Using macro tun_sk is more clear and shorter. However tun.c has tun_sk, but doesn't use it. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-26 20:24:44 -08:00
Arnd Bergmann	50857e2a59	net/tun: handle compat_ioctl directly The tun driver is the only code in the kernel that operates on a character device with struct ifreq. Change the driver to handle the conversion itself so we can contain the remaining ifreq handling in the socket layer. This also fixes a bug in the handling of invalid ioctl numbers on an unbound tun device. The driver treats this as a TUNSETIFF in native mode, but there is no way for the generic compat_ioctl() function to emulate this behaviour. Possibly the driver was only doing this accidentally anyway, but if any code relies on this misfeature, it now also works in compat mode. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-06 22:52:32 -08:00
Thomas Gleixner	deed49fbb6	net: Remove BKL from tun The lock_kernel/unlock_kernel() in cycle_kernel_lock() which is called in tun_chr_open() is not serializing against anything and safe to remove. tun_chr_fasync() is serialized by get/put_tun() and fasync_helper() has no dependency on BKL. The modification of tun->flags is racy with and without the BKL so removing it does not make it worse. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-14 01:19:46 -07:00
David S. Miller	8b3f6af863	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ Conflicts: drivers/staging/Kconfig drivers/staging/Makefile drivers/staging/cpc-usb/TODO drivers/staging/cpc-usb/cpc-usb_drv.c drivers/staging/cpc-usb/cpc.h drivers/staging/cpc-usb/cpc_int.h drivers/staging/cpc-usb/cpcusb.h	2009-09-24 15:13:11 -07:00
Kusanagi Kouichi	36989b9087	tun: Return -EINVAL if neither IFF_TUN nor IFF_TAP is set. After commit `2b980dbd77` ("lsm: Add hooks to the TUN driver") tun_set_iff doesn't return -EINVAL though neither IFF_TUN nor IFF_TAP is set. Signed-off-by: Kusanagi Kouichi <slash@ma.neweb.ne.jp> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-22 14:00:16 -07:00
Kay Sievers	e454cea20b	Driver-Core: extend devnode callbacks to provide permissions This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-19 12:50:38 -07:00
Linus Torvalds	d7e9660ad9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between `89f56d1e9` ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and `2b980dbd` ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.	2009-09-14 10:37:28 -07:00
Michael S. Tsirkin	89f56d1e91	tun: reuse struct sock fields As tun always has an embeedded struct sock, use sk and sk_receive_queue fields instead of duplicating them in tun_struct. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 17:40:33 -07:00
Stephen Hemminger	424efe9caf	netdev: convert pseudo drivers to netdev_tx_t These are all drivers that don't touch real hardware. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 01:13:40 -07:00
Paul Moore	2b980dbd77	lsm: Add hooks to the TUN driver The TUN driver lacks any LSM hooks which makes it difficult for LSM modules, such as SELinux, to enforce access controls on network traffic generated by TUN users; this is particularly problematic for virtualization apps such as QEMU and KVM. This patch adds three new LSM hooks designed to control the creation and attachment of TUN devices, the hooks are: * security_tun_dev_create() Provides access control for the creation of new TUN devices * security_tun_dev_post_create() Provides the ability to create the necessary socket LSM state for newly created TUN devices * security_tun_dev_attach() Provides access control for attaching to existing, persistent TUN devices and the ability to update the TUN device's socket LSM state as necessary Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Eric Paris <eparis@parisplace.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Morris <jmorris@namei.org>	2009-09-01 08:29:48 +10:00
David S. Miller	aa11d958d1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: arch/microblaze/include/asm/socket.h	2009-08-12 17:44:53 -07:00
Herbert Xu	876bfd4d0f	tun: Extend RTNL lock coverage over whole ioctl As it is, parts of the ioctl runs under the RTNL and parts of it do not. The unlocked section is still protected by the BKL, but there can be subtle races. For example, Eric Biederman and Paul Moore observed that if two threads tried to create two tun devices on the same file descriptor, then unexpected results may occur. As there isn't anything in the ioctl that is expected to sleep indefinitely, we can prevent this from occurring by extending the RTNL lock coverage. This also allows to get rid of the BKL. Finally, I changed tun_get_iff to take a tun device in order to avoid calling tun_put which would dead-lock as it also tries to take the RTNL lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-09 21:45:35 -07:00
Sridhar Samudrala	e36aa25a53	tun: Allow tap device to send/receive UFO packets. - Allow setting UFO on tap device and handle UFO packets. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> --------------------------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-17 10:11:00 -07:00
David S. Miller	e5a8a896f5	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-07-09 20:18:24 -07:00
Paul Moore	460deefae6	tun: Remove a dead line of code Remove an unnecessary assignment. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-07 19:22:11 -07:00
Mariusz Kozlowski	3c8a9c63d5	tun/tap: Fix crashes if open() /dev/net/tun and then poll() it. Fix NULL pointer dereference in tun_chr_pool() introduced by commit `33dccbb050` ("tun: Limit amount of queued packets per device") and triggered by this code: int fd; struct pollfd pfd; fd = open("/dev/net/tun", O_RDWR); pfd.fd = fd; pfd.events = POLLIN \| POLLOUT; poll(&pfd, 1, 0); Reported-by: Eugene Kapun <abacabadabacaba@gmail.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-06 12:47:07 -07:00
Patrick McHardy	6ed106549d	net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-05 19:16:04 -07:00
Herbert Xu	d23e43658a	tun: Fix device unregister race It is currently possible for an asynchronous device unregister to cause the same tun device to be unregistered twice. This is because the unregister in tun_chr_close only checks whether __tun_get(tfile) != NULL. This however has nothing to do with whether the device has already been unregistered. All it tells you is whether __tun_detach has been called. This patch fixes this by using the most obvious thing to test whether the device has been unregistered. It also moves __tun_detach outside of rtnl_unlock since nothing that it does requires that lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-05 18:03:18 -07:00
Kay Sievers	d405640539	Driver Core: misc: add nodename support for misc devices. This adds support for misc devices to report their requested nodename to userspace. It also updates a number of misc drivers to provide the needed subdirectory and device name to be used for them. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:25 -07:00
Eric W. Biederman	f0a4d0e5b5	tun: Fix unregister race It is possible for tun_chr_close to race with dellink on the a tun device. In which case if __tun_get runs before dellink but dellink runs before tun_chr_close calls unregister_netdevice we will attempt to unregister the netdevice after it is already gone. The two cases are already serialized on the rtnl_lock, so I have gone for the cheap simple fix of moving rtnl_lock to cover __tun_get in tun_chr_close. Eliminating the possibility of the tun device being unregistered between __tun_get and unregister_netdevice in tun_chr_close. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Tested-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-08 00:44:31 -07:00
Sridhar Samudrala	6f536f4039	tun: Fix copy/paste error in tun_get_user Use the right structure while incrementing the offset in tun_get_user. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-08 00:27:28 -07:00
Herbert Xu	4909122fb8	tun: Optimise handling of bogus gso->hdr_len As all current versions of virtio_net generate a value for the header length that's too small, we should optimise this so that we don't copy it twice. This can be done by ensuring that it is at least as large as the place where we'll write the checksum. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-08 00:20:01 -07:00
Herbert Xu	c722c625db	tun: Only wake up writers When I added socket accounting to tun I inadvertently introduced spurious wake-up events that kills qemu performance. The problem occurs when qemu polls on the tun fd for read, and then transmits packets. For each packet transmitted, we will wake up qemu even if it only cares about read events. Now this affects all sockets, but it is only a new problem for tun. So this patch tries to fix it for tun first and we can then look at the problem in general. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 21:45:55 -07:00
David Woodhouse	980c9e8cee	tun: add tun_flags, owner, group attributes in sysfs This patch adds three attribute files in /sys/class/net/$dev/ for tun devices; allowing userspace to obtain the information which TUNGETIFF offers, and more, but without having to attach to the device in question (which may not be possible if it's in use). It also fixes a bug which has been present in the TUNGETIFF ioctl since its inception, where it would never set IFF_TUN or IFF_TAP according to the device type. (Look carefully at the code which I remove from tun_get_iff() and how the new tun_flags() helper is subtly different). Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-09 22:54:21 -07:00
David Woodhouse	f85ba78068	tun: add IFF_TUN_EXCL flag to avoid opening a persistent device. When creating a certain types of VPN, NetworkManager will first attempt to find an available tun device by iterating through 'vpn%d' until it finds one that isn't already busy. Then it'll set that to be persistent and owned by the otherwise unprivileged user that the VPN dæmon itself runs as. There's a race condition here -- during the period where the vpn%d device is created and we're waiting for the VPN dæmon to actually connect and use it, if we try to create _another_ device we could end up re-using the same one -- because trying to open it again doesn't get -EBUSY as it would while it's _actually_ busy. So solve this, we add an IFF_TUN_EXCL flag which causes tun_set_iff() to fail if it would be opening an existing persistent tundevice -- so that we can make sure we're getting an entirely _new_ device. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-27 03:23:54 -07:00
Michael S. Tsirkin	6f26c9a755	tun: fix tun_chr_aio_write so that aio works aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct iovec * and modifies the iovec. As a result, attempts to use io_submit to send packets to a tun device fail with weird errors such as EINVAL. Since tun is the only user of skb_copy_datagram_from_iovec, we can fix this simply by changing the later so that it does not touch the iovec passed to it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-21 05:42:46 -07:00
Michael S. Tsirkin	43b39dcdbd	tun: fix tun_chr_aio_read so that aio works aio_read gets const struct iovec * but tun_chr_aio_read casts this to struct iovec * and modifies the iovec. As a result, attempts to use io_submit to get packets from a tun device fail with weird errors such as EINVAL. Fix by using the new skb_copy_datagram_const_iovec. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-21 05:42:45 -07:00
Herbert Xu	c40af84a67	tun: Fix sk_sleep races when attaching/detaching As the sk_sleep wait queue actually lives in tfile, which may be detached from the tun device, bad things will happen when we use sk_sleep after detaching. Since the tun device is the persistent data structure here (when requested by the user), it makes much more sense to have the wait queue live there. There is no reason to have it in tfile at all since the only time we can wait is if we have a tun attached. In fact we already have a wait queue in tun_struct, so we might as well use it. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-20 03:01:48 -07:00
Herbert Xu	9c3fea6ab0	tun: Only free a netdev when all tun descriptors are closed The commit `c70f182940` ("tun: Fix races between tun_net_close and free_netdev") fixed a race where an asynchronous deletion of a tun device can hose a poll(2) on a tun fd attached to that device. However, this came at the cost of moving the tun wait queue into the tun file data structure. The problem with this is that it imposes restrictions on when and where the tun device can access the wait queue since the tun file may change at any time due to detaching and reattaching. In particular, now that we need to use the wait queue on the receive path it becomes difficult to properly synchronise this with the detachment of the tun device. This patch solves the original race in a different way. Since the race is only because the underlying memory gets freed, we can prevent it simply by ensuring that we don't do that until all tun descriptors ever attached to the device (even if they have since be detached because they may still be sitting in poll) have been closed. This is done by using reference counting the attached tun file descriptors. The refcount in tun->sk has been reappropriated for this purpose since it was already being used for that, albeit from the opposite angle. Note that we no longer zero tfile->tun since tun_get will return NULL anyway after the refcount on tfile hits zero. Instead it represents whether this device has ever been attached to a device. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-20 03:01:47 -07:00
Herbert Xu	0eca93bcf7	tun: Fix crash with non-GSO users When I made the tun driver use non-linear packets as the preferred option, it broke non-GSO users because they would end up allocating a completely non-linear packet, which triggers a crash when we call eth_type_trans. This patch reverts non-GSO users to using linear packets and adds a check to ensure that GSO users can't cause crashes in the same way. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-14 02:09:43 -07:00
Herbert Xu	ab46d77966	tun: Fix merge error When forward-porting the tun accounting patch I managed to break the send path compltely by dropping the tun_get call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-14 20:46:39 -08:00
David S. Miller	0ecc103aec	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/gianfar.c	2009-02-09 23:22:21 -08:00
Alex Williamson	cfbf84fcbc	tun: Fix unicast filter overflow Tap devices can make use of a small MAC filter set via the TUNSETTXFILTER ioctl. The filter has a set of exact matches plus a hash for imperfect filtering of additional multicast addresses. The current code is unbalanced, adding unicast addresses to the multicast hash, but only checking the hash against multicast addresses. This results in the filter dropping unicast addresses that overflow the exact filter. The fix is simply to disable the filter by leaving count set to zero if we find non-multicast addresses after the exact match table is filled. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-08 17:49:17 -08:00
Herbert Xu	33dccbb050	tun: Limit amount of queued packets per device Unlike a normal socket path, the tuntap device send path does not have any accounting. This means that the user-space sender may be able to pin down arbitrary amounts of kernel memory by continuing to send data to an end-point that is congested. Even when this isn't an issue because of limited queueing at most end points, this can also be a problem because its only response to congestion is packet loss. That is, when those local queues at the end-point fills up, the tuntap device will start wasting system time because it will continue to send data there which simply gets dropped straight away. Of course one could argue that everybody should do congestion control end-to-end, unfortunately there are people in this world still hooked on UDP, and they don't appear to be going away anywhere fast. In fact, we've always helped them by performing accounting in our UDP code, the sole purpose of which is to provide congestion feedback other than through packet loss. This patch attempts to apply the same bandaid to the tuntap device. It creates a pseudo-socket object which is used to account our packets just as a normal socket does for UDP. Of course things are a little complex because we're actually reinjecting traffic back into the stack rather than out of the stack. The stack complexities however should have been resolved by preceding patches. So this one can simply start using skb_set_owner_w. For now the accounting is essentially disabled by default for backwards compatibility. In particular, we set the cap to INT_MAX. This is so that existing applications don't get confused by the sudden arrival EAGAIN errors. In future we may wish (or be forced to) do this by default. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-05 21:25:32 -08:00
Michael Tokarev	1bded710a5	tun: Check supplemental groups in TUN/TAP driver. Michael Tokarev wrote: [] > 2, and this is the main one: How about supplementary groups? > > Here I have a valid usage case: a group of testers running various > versions of windows using KVM (kernel virtual machine), 1 at a time, > to test some software. kvm is set up to use bridge with a tap device > (there should be a way to connect to the machine). Anyone on that group > has to be able to start/stop the virtual machines. > > My first attempt - pretty obvious when I saw -g option of tunctl - is > to add group ownership for the tun device and add a supplementary group > to each user (their primary group should be different). But that fails, > since kernel only checks for egid, not any other group ids. > > What's the reasoning to not allow supplementary groups and to only check > for egid? Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-02 23:34:56 -08:00
Harvey Harrison	09640e6365	net: replace uses of __constant_{endian} Base versions handle constant folding now. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 00:45:17 -08:00
Eric W. Biederman	f019a7a594	tun: Implement ip link del tunXXX This greatly simplifies testing to verify I have fixed the problems with a tun device disappearing when the tun file descriptor is still held open. Further it allows removal network namespace operations for the tun driver. Reducing the network namespace handling in the driver to the minimum. i.e. When we are creating a tun device. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:02:16 -08:00
Eric W. Biederman	aec191aa2a	tun: There is no longer any need to deny changing network namespaces With the awkward case between free_netdev and dev_chr_close fixed there is no longer any need to limit tun and tap devices to the network namespace they were created in. So remove the NETIF_F_NETNS_LOCAL flag on the network device. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:47 -08:00
Eric W. Biederman	c70f182940	tun: Fix races between tun_net_close and free_netdev. The tun code does not cope gracefully if the network device goes away before the tun file descriptor is closed. It looks like we can trigger this with rmmod, and moving tun devices between network namespaces will allow this to be triggered when network namespaces exit. To fix this I introduce an intermediate data structure tun_file which holds a count of users and a pointer to the struct tun_struct. tun_get increments that reference count if it is greater than 0. tun_put decrements that reference count and detaches from the network device if the count is 0. While we have a file attached to the network device I hold a reference to the network device keeping it from going away completely. When a network device is unregistered I decrement the count of the attached tun_file and if that was the last user I detach the tun_file, and all processes on read_wait are woken up to ensure they do not sleep indefinitely. As some of those sleeps happen with the count on the tun device elevated waking up the read waiters ensures that tun_file will be detached in a timely manner. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:46 -08:00
Eric W. Biederman	b2430de37e	tun: Move read_wait into tun_file The poll interface requires that the waitqueue exist while the struct file is open. In the rare case when a tun device disappears before the tun file closes we fail to provide this property, so move read_wait. This is safe now that tun_net_xmit is atomic with tun_detach. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:46 -08:00
Eric W. Biederman	38231b7a8d	tun: Make tun_net_xmit atomic wrt tun_attach && tun_detach Currently this small race allows for a packet to be received when we detach from an tun device and still be enqueued. Not especially important but not what the code is trying to do. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:45 -08:00
Eric W. Biederman	36b50bab53	tun: Grab the netns in open. Grabbing namespaces in open, and putting them in close always seems to be the cleanest approach with the fewest surprises. So now that we have tun_file so we have somepleace to put the network namespace, let's grab the network namespace on file open and put on file close. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:45 -08:00
Eric W. Biederman	631ab46b79	tun: Introduce tun_file Currently the tun code suffers from only having a single word of data that exists for the entire life of the tun file descriptor. This results in peculiar holding of references to the network namespace as well as races between free_netdevice and tun_chr_close. Fix this by introducing tun_file which will hold the per file state. For the moment it still holds just a single word so the differences are all logic changes with no changes in semantics. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:44 -08:00
Eric W. Biederman	eac9e90265	tun: Use POLLERR not EBADF in tun_chr_poll EBADF is meaningless in the context of a poll mask so use POLLERR instead. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:44 -08:00
Eric W. Biederman	a7385ba211	tun: Fix races in tun_set_iff It is possible for two different tasks with access to the same file descriptor to call tun_set_iff on it at the same time and race to attach to a tap device. Prevent this by placing all of the logic to attach to a file descriptor in one function and testing the file descriptor to be certain it is not already attached to another tun device. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:43 -08:00
Eric W. Biederman	74a3e5a71c	tun: Remove unnecessary tun_get_by_name Currently the tun driver keeps a private list of tun devices for what appears to be a small gain in performance when reconnecting a file descriptor to an existing tun or tap device. So simplify the code by removing it. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-21 16:00:43 -08:00
Gerrit Renker	745417e206	tun: Eliminate sparse signedness warning register_pernet_gen_device() expects 'int', found via sparse. CHECK drivers/net/tun.c drivers/net/tun.c:1245:36: warning: incorrect type in argument 1 (different signedness) drivers/net/tun.c:1245:36: expected int id drivers/net/tun.c:1245:36: got unsigned int static [toplevel] *<noident> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-04 17:14:46 -08:00
Kusanagi Kouichi	7a0a9608e4	tun: Fix SIOCSIFHWADDR error. Set proper operations. Signed-off-by: Kusanagi Kouichi <slash@ma.neweb.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-29 18:23:28 -08:00
Linus Torvalds	0191b625ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits) net: Allow dependancies of FDDI & Tokenring to be modular. igb: Fix build warning when DCA is disabled. net: Fix warning fallout from recent NAPI interface changes. gro: Fix potential use after free sfc: If AN is enabled, always read speed/duplex from the AN advertising bits sfc: When disabling the NIC, close the device rather than unregistering it sfc: SFT9001: Add cable diagnostics sfc: Add support for multiple PHY self-tests sfc: Merge top-level functions for self-tests sfc: Clean up PHY mode management in loopback self-test sfc: Fix unreliable link detection in some loopback modes sfc: Generate unique names for per-NIC workqueues 802.3ad: use standard ethhdr instead of ad_header 802.3ad: generalize out mac address initializer 802.3ad: initialize ports LACPDU from const initializer 802.3ad: remove typedef around ad_system 802.3ad: turn ports is_individual into a bool 802.3ad: turn ports is_enabled into a bool 802.3ad: make ntt bool ixgbe: Fix set_ringparam in ixgbe to use the same memory pools. ... Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).	2008-12-28 12:49:40 -08:00
Stephen Hemminger	008298231a	netdev: add more functions to netdevice ops This patch moves neigh_setup and hard_start_xmit into the network device ops structure. For bisection, fix all the previously converted drivers as well. Bonding driver took the biggest hit on this. Added a prefetch of the hard_start_xmit in the fast path to try and reduce any impact this would have. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 20:14:53 -08:00
Stephen Hemminger	758e43b74c	tun: convert to net_device_ops Convert the TUN/TAP tunnel driver to net_device_ops. Split the ops in two, and retain compatability. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 22:42:47 -08:00
David Howells	86a264abe5	CRED: Wrap current->cred and a few other accessors Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:18 +11:00
David Howells	ee9785ada3	CRED: Wrap task credential accesses in the network device drivers Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: netdev@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:43 +11:00
David S. Miller	9eeda9abd1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath5k/base.c net/8021q/vlan_core.c	2008-11-06 22:43:03 -08:00
David S. Miller	babcda74e9	drivers/net: Kill now superfluous ->last_rx stores. The generic packet receive code takes care of setting netdev->last_rx when necessary, for the sake of the bonding ARP monitor. Drivers need not do it any more. Some cases had to be skipped over because the drivers were making use of the ->last_rx value themselves. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-03 21:11:17 -08:00
Al Viro	233e70f422	saner FASYNC handling on file close As it is, all instances of ->release() for files that have ->fasync() need to remember to evict file from fasync lists; forgetting that creates a hole and we actually have a bunch that does forget. So let's keep our lives simple - let __fput() check FASYNC in file->f_flags and call ->fasync() there if it's been set. And lose that crap in ->release() instances - leaving it there is still valid, but we don't have to bother anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-01 09:49:46 -07:00
Johannes Berg	e174961ca1	net: convert print_mac to %pM This converts pretty much everything to print_mac. There were a few things that had conflicts which I have just dropped for now, no harm done. I've built an allyesconfig with this and looked at the files that weren't built very carefully, but it's a huge patch. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-27 17:06:18 -07:00
Rusty Russell	f42157cb56	tun: fallback if skb_alloc() fails on big packets skb_alloc produces linear packets (using kmalloc()). That can fail, so should we fall back to making paged skbs. My original version of this patch always allocate paged skbs for big packets. But that made performance drop from 8.4 seconds to 8.8 seconds on 1G lguest->Host TCP xmit. So now we only do that as a fallback. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:52:31 -07:00
Mark McLoughlin	e3b9955697	tun: TUNGETIFF interface to query name and flags Add a TUNGETIFF interface so that userspace can query a tun/tap descriptor for its name and flags. This is needed because it is common for one app to create a tap interface, exec another app and pass it the file descriptor for the interface. Without TUNGETIFF the spawned app has no way of detecting wheter the interface has e.g. IFF_VNET_HDR set. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:52:19 -07:00
Harvey Harrison	c0e5a8c21b	net: tun.c fix cast Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-22 17:54:17 -04:00
David S. Miller	49997d7515	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/powerpc/booting-without-of.txt drivers/atm/Makefile drivers/net/fs_enet/fs_enet-main.c drivers/pci/pci-acpi.c net/8021q/vlan.c net/iucv/iucv.c	2008-07-18 02:39:39 -07:00
Max Krasnyansky	f271b2cc78	tun: Fix/rewrite packet filtering logic Please see the following thread to get some context on this http://marc.info/?l=linux-netdev&m=121564433018903&w=2 Basically the issue is that current multi-cast filtering stuff in the TUN/TAP driver is seriously broken. Original patch went in without proper review and ACK. It was broken and confusing to start with and subsequent patches broke it completely. To give you an idea of what's broken here are some of the issues: - Very confusing comments throughout the code that imply that the character device is a network interface in its own right, and that packets are passed between the two nics. Which is completely wrong. - Wrong set of ioctls is used for setting up filters. They look like shortcuts for manipulating state of the tun/tap network interface but in reality manipulate the state of the TX filter. - ioctls that were originally used for setting address of the the TX filter got "fixed" and now set the address of the network interface itself. Which made filter totaly useless. - Filtering is done too late. Instead of filtering early on, to avoid unnecessary wakeups, filtering is done in the read() call. The list goes on and on :) So the patch cleans all that up. It introduces simple and clean interface for setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering before enqueuing the packets. TX filtering is useful in the scenarios where TAP is part of a bridge, in which case it gets all broadcast, multicast and potentially other packets when the bridge is learning. So for example Ethernet tunnelling app may want to setup TX filters to avoid tunnelling multicast traffic. QEMU and other hypervisors can push RX filtering that is currently done in the guest into the host context therefore saving wakeups and unnecessary data transfer. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:18:19 -07:00
David S. Miller	2aec609fb4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/netfilter/nf_conntrack_proto_tcp.c	2008-07-14 20:23:54 -07:00
Jonathan Corbet	2fceef397f	Merge commit 'v2.6.26' into bkl-removal	2008-07-14 15:29:34 -06:00
Max Krasnyansky	e35259a953	tun: Persistent devices can get stuck in xoff state The scenario goes like this. App stops reading from tun/tap. TX queue gets full and driver does netif_stop_queue(). App closes fd and TX queue gets flushed as part of the cleanup. Next time the app opens tun/tap and starts reading from it but the xoff state is not cleared. We're stuck. Normally xoff state is cleared when netdev is brought up. But in the case of persistent devices this happens only during initial setup. The fix is trivial. If device is already up when an app opens it we clear xoff state and that gets things moving again. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-10 16:59:11 -07:00
Rusty Russell	f43798c276	tun: Allow GSO using virtio_net_hdr Add a IFF_VNET_HDR flag. This uses the same ABI as virtio_net (ie. prepending struct virtio_net_hdr to packets) to indicate GSO and checksum information. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-03 03:48:02 -07:00
Rusty Russell	5228ddc98f	tun: TUNSETFEATURES to set gso features. ethtool is useful for setting (some) device fields, but it's root-only. Finer feature control is available through a tun-specific ioctl. (Includes Mark McLoughlin <markmc@redhat.com>'s fix to hold rtnl sem). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-03 03:46:16 -07:00
Rusty Russell	07240fd090	tun: Interface to query tun/tap features. The problem with introducing checksum offload and gso to tun is they need to set dev->features to enable GSO and/or checksumming, which is supposed to be done before register_netdevice(), ie. as part of TUNSETIFF. Unfortunately, TUNSETIFF has always just ignored flags it doesn't understand, so there's no good way of detecting whether the kernel supports new IFF_ flags. This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF flags. It could be extended later to include other features. Here's an example program which uses it: #include <linux/if_tun.h> #include <sys/types.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <fcntl.h> #include <err.h> #include <stdio.h> static struct { unsigned int flag; const char *name; } known_flags[] = { { IFF_TUN, "TUN" }, { IFF_TAP, "TAP" }, { IFF_NO_PI, "NO_PI" }, { IFF_ONE_QUEUE, "ONE_QUEUE" }, }; int main() { unsigned int features, i; int netfd = open("/dev/net/tun", O_RDWR); if (netfd < 0) err(1, "Opening /dev/net/tun"); if (ioctl(netfd, TUNGETFEATURES, &features) != 0) { printf("Kernel does not support TUNGETFEATURES, guessing\n"); features = (IFF_TUN\|IFF_TAP\|IFF_NO_PI\|IFF_ONE_QUEUE); } printf("Available features are: "); for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) { if (features & known_flags[i].flag) { features &= ~known_flags[i].flag; printf("%s ", known_flags[i].name); } } if (features) printf("(UNKNOWN %#x)", features); printf("\n"); return 0; } Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-03 03:45:32 -07:00
Jonathan Corbet	9d31952257	tun: fasync BKL pushdown Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-07-02 15:06:27 -06:00
Arnd Bergmann	fd3e05b6c8	net-tun: BKL pushdown Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2008-07-02 15:06:23 -06:00
Ang Way Chuang	f09f7ee20c	tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set By default, tun.c running in TUN_TUN_DEV mode will set the protocol of packet to IPv4 if TUN_NO_PI is set. My program failed to work when I assumed that the driver will check the first nibble of packet, determine IP version and set the appropriate protocol. Signed-off-by: Ang Way Chuang <wcang@nav6.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-17 21:10:33 -07:00
David S. Miller	9edb74cc6c	tun: Multicast handling in tun_chr_ioctl() needs proper locking. Since these operations don't go through the normal device calls, we have to ensure we synchronize with those paths. Noticed by Alan Cox. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-24 03:44:43 -07:00
David S. Miller	48abfe05cd	tun: Fix minor race in TUNSETLINK ioctl handling. Noticed by Alan Cox. The IFF_UP test is a bit racey, because other entities outside of this driver's ioctl handler can modify that state, even though this ioctl handler runs under lock_kernel(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-23 19:37:58 -07:00
Pavel Emelyanov	fc54c65853	[TUN]: Allow to register tun devices in namespace. This is basically means that a net is set for a new device, but actually also involves two more steps: 1. mark the tun device as "local", i.e. do not allow for it to move across namespaces. This is done so, since tun device is most often associated to some file (and thus to some process) and moving the device alone is not valid while keeping the file and the process outside. The need in ability to move a detached persistent device is to be investigated later. 2. get the tun device's net when tun becomes attached and put one when it becomes detached. This is needed to handle the case when a task owning the tun dies, but a files lives for some more time - in this case we must not allow for net to be freed, since its exit hook will spoil that file's private data by unregistering the tun from under tun_chr_close. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 00:41:53 -07:00
Pavel Emelyanov	d647a591da	[TUN]: Make the tun_dev_list per-net. Remove the static tun_dev_list and replace its occurrences in driver with per-net one. It is used in two places - in tun_set_iff and tun_cleanup. In the first case it's legal to use current net_ns. In the cleanup call - move the loop, that unregisters all devices in net exit hook. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 00:41:16 -07:00
Pavel Emelyanov	79d1760491	[TUN]: Introduce the tun_net structure and init/exit net ops. This is the first step in making tuntap devices work in net namespaces. The structure mentioned is pointed by generic net pointer with tun_net_id id, and tun driver fills one on its load. It will contain only the tun devices list. So declare this structure and introduce net init and exit hooks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 00:40:46 -07:00
Rusty Russell	e01bf1c833	net: check for underlength tap writes If the user gives a packet under 14 bytes, we'll end up reading off the end of the skb (not oopsing, just reading off the end). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyanskiy <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-12 18:49:30 -07:00
Rusty Russell	14daa02139	net: make struct tun_struct private to tun.c There's no reason for this to be in the header, and it just hurts recompile time. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Max Krasnyanskiy <maxk@qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-12 18:48:58 -07:00
Kim B. Heino	401023710d	[TUN]: Fix RTNL-locking in tun/tap driver Current tun/tap driver sets also net device's hw address when asked to change character device's hw address. This is a good idea, but it misses RTLN-locking, resulting following error message in 2.6.25-rc3's inetdev_event() function: RTNL: assertion failed at net/ipv4/devinet.c (1050) Attached patch fixes this problem. Signed-off-by: Kim B. Heino <Kim.Heino@bluegiga.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 12:26:21 -08:00
Nathaniel Filardo	a26af1e08a	tun: impossible to deassert IFF_ONE_QUEUE or IFF_NO_PI From: "Nathaniel Filardo" <nwfilardo@gmail.com> Taken from http://bugzilla.kernel.org/show_bug.cgi?id=9806 The TUN/TAP driver only permits one-way transitions of IFF_NO_PI or IFF_ONE_QUEUE during the lifetime of a tap/tun interface. Note that tun_set_iff contains 541 if (ifr->ifr_flags & IFF_NO_PI) 542 tun->flags \|= TUN_NO_PI; 543 544 if (ifr->ifr_flags & IFF_ONE_QUEUE) 545 tun->flags \|= TUN_ONE_QUEUE; This is easily fixed by adding else branches which clear these bits. Steps to reproduce: This is easily reproduced by setting an interface persistant using tunctl then attempting to open it as IFF_TAP or IFF_TUN, without asserting the IFF_NO_PI flag. The ioctl() will succeed and the ifr.flags word is not modified, but the interface remains in IFF_NO_PI mode (as it was set by tunctl). Acked-by: Maxim Krasnyansky <maxk@qualcomm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:05:07 -08:00
Al Viro	a3edb08311	annotate tun Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-28 15:07:57 -08:00
Akinobu Mita	52427c9d11	[TUN]: Use iov_length() Use iov_length() instead of tun's homemade iov_total(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:34 -08:00

1 2 3 4 5 ...

273 Commits