Fixes two bugs:
- ToS/DiffServ inheritance was unintentionally activated when using impair fixed ToS values
- ECN bit was lost during ToS/DiffServ inheritance
Signed-off-by: Andreas Jaggi <aj@open.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.
Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit()
to no release it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions time_before is more robust for comparing
jiffies against other values.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of keeping candidate tunnel device from all categories,
keep only one candidate with best score. This optimizes stack
usage and speeds up exit code.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the device on receive path and allow otherwise identical devices
as long as the physical device differs.
This is useful for NBMA tunnels, where you want to use different gre IP
for each public IP available via different physical devices.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brown paper bag error of calling memset with sizeof(p) instead
of sizeof(*p).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
- use typeful helpers for IFLA_GRE_LOCAL/IFLA_GRE_REMOTE
- replace magic value by FIELD_SIZEOF
- use MODULE_ALIAS_RTNL_LINK macro
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
consistent. Previously we included the external IP headers on the
way out but not when the packet is inbound.
The new scheme is to count payload only in both directions. For
IPIP and SIT this simply means the exclusion of the external IP
header. For GRE this means that we exclude the GRE header as
well.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for Ethernet over GRE encapsulation.
This is exposed to user-space with a new link type of "gretap"
instead of "gre". It will create an ARPHRD_ETHER device in
lieu of the usual ARPHRD_IPGRE.
Note that to preserver backwards compatibility all Transparent
Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel
if its key matches and there is no ARPHRD_ETHER device whose
key matches more closely.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a netlink interface that will eventually displace
the existing ioctl interface. It utilises the elegant rtnl_link_ops
mechanism.
This also means that user-space no longer needs to rely on the
tunnel interface being of type GRE to identify GRE tunnels. The
identification can now occur using rtnl_link_ops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev.
This is in prepartion of using rtnl_link where we'll need to make
the MTU setting conditional on whether the user has supplied an
MTU. This also requires the move of the ipgre_tunnel_bind_dev
call out of the dev->init function so that we can access the user
parameters later.
This patch also adds a check to prevent setting the MTU below
the minimum of 68.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have dev->needed_headroom, we can use it instead of
having a bogus dev->hard_header_len. This also allows us to
include dev->hard_header_len in the MTU computation so that when
we do have a meaningful hard_harder_len in future it is included
automatically in figuring out the MTU.
Incidentally, this fixes a bug where we ignored the needed_headroom
field of the underlying device in calculating our own hard_header_len.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unless there will be any objection here, I suggest consider the
following patch which simply removes the code for the
-DI_WISH_WORLD_WERE_PERFECT in the three methods which use it.
The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT
show that this code was not built and not used for really a long time.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just switch from tunnel->stat to tunnel->dev->stats. The ip_tunnel->stat
member itself will be removed after I fix its other users (very soon).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one was also disabled by default for sanity.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. set the proper net and mark as NETNS_LOCAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for the IPIP tunnel, there are some ip_route_output_key()
calls in there that require a proper net so give one to them.
And a proper net for the __get_dev_by_index hanging around.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Very similar to what was done for the IPIP code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Everything is prepared for this change now. Create on in
init callback, use it over the code and destroy on net exit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the part#2 of the patch #2 - get the proper net for
these functions. This change in a separate patch in order not
to get lost in a large previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fallback device and hashes are to become per-net, but many
code doesn't have anything to get the struct net pointer from.
So pass the proper net there with an extra argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
(Anonymous) unions can help us to avoid ugly casts.
A common cast it the (struct rtable *)skb->dst one.
Defining an union like :
union {
struct dst_entry *dst;
struct rtable *rtable;
};
permits to use skb->rtable in place.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from the userspace. Since these drivers
call the register_netdevice() (rtnl_lock, is held), which does _not_
generate the device's name, this name may contain a '%' character.
Not sure how bad is this to have a device with a '%' in its name, but
all the other places either use the register_netdev(), which call the
dev_alloc_name(), or explicitly call the dev_alloc_name() before
registering, i.e. do not allow for such names.
This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.
Thanks Patrick for noticing this.
[ The way this works is, when the device is actually registered,
the generic code noticed the '%' in the name and invokes
dev_alloc_name() to fully resolve the name. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is similar to the change already done for IPIP tunnels.
Once created, a GRE tunnel can't be bound to another device.
To reproduce:
# create a tunnel:
ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0
tunneltest0: gre/ip remote 10.0.0.1 local any dev eth0 ttl inherit
Notice the bound device has not changed from eth0 to eth1.
This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mac_header update in ipgre_recv() was incorrectly changed to
skb_reset_mac_header() when it was introduced.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer. Make
the comparisons consistent and correct.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When GRE tunnel is in NBMA mode, this patch allows an application to use
a PF_PACKET socket to:
- send a packet to specific NBMA address with sendto()
- use recvfrom() to receive packet and check which NBMA address it came from
This is required to implement properly NHRP over GRE tunnel.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.
[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It returns skb->data, so we can just use skb_reset_network_header after it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_push updates and returns skb->data, so we can just call
skb_reset_network_header after the call to skb_push.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was no real useful information from the unregister_netdevice() return
code, the only error occurred in a situation that was a driver bug. So
change it to a void function.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling of ipip and ip_gre ICMP error relaying is b0rken; it accesses
8bit field + 3 reserved octets as host-endian 32bit, does comparison,
subtraction and stuffs the result back. That breaks on big-endian.
Fixed, made endian-clean.
[ Note that this effected code is permanently commented out with
and ifdef, so this error couldn't actually cause problems for
anyone. -DaveM ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we always zero the IPCB->opts in ip_rcv, it is no longer
necessary to do so before calling netif_rx for tunneled packets.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes GRE and SIT to generate port unreachable instead of
protocol unreachable errors when we can't find a matching tunnel for a
packet.
This removes the ambiguity as to whether the error is caused by no
tunnel being found or by the lack of support for the given tunnel
type.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.
This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
uses ip_route_input for non-local addresses which doesn't do a xfrm
lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.
Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.
Makeing sure the lookup only happens once needs a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
increased to 48b. Apparently the IPv6 mobile extensions need some
more room anyway.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit
function before the packet reenters IP. This is neccessary so the
encapsulated packets are checked not to be oversized in xfrm4_output.c
again. Reset all flags in sit when a packet changes its address family.
Also remove some obsolete IPSKB flags.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
These patches add the header linux/if_ether.h and change 1500 to
ETH_DATA_LEN in some files.
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb_postpull_rcsum introduced a bug to the checksum modification.
Although the length pulled is offset bytes, the origin of the pulling
is the GRE header, not the IP header.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the patch that introduces the generic skb_checksum_complete
which also checks for hardware RX checksum faults. If that happens,
it'll call netdev_rx_csum_fault which currently prints out a stack
trace with the device name. In future it can turn off RX checksum.
I've converted every spot under net/ that does RX checksum checks to
use skb_checksum_complete or __skb_checksum_complete with the
exceptions of:
* Those places where checksums are done bit by bit. These will call
netdev_rx_csum_fault directly.
* The following have not been completely checked/converted:
ipmr
ip_vs
netfilter
dccp
This patch is based on patches and suggestions from Stephen Hemminger
and David S. Miller.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.
1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().
There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition. I've marked it as such so that we remember to fix it.
This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tunnel modules used to obtain module refcount each time when
some tunnel was created, which meaned that tunnel could be unloaded
only after all the tunnels are deleted.
Since killing old MOD_*_USE_COUNT macros this protection has gone.
It is possible to return it back as module_get/put, but it looks
more natural and practically useful to force destruction of all
the child tunnels on module unload.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!