Commit Graph

129 Commits

Author SHA1 Message Date
David S. Miller df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Gerrit Renker 7de6c03336 [SKB]: __skb_append = __skb_queue_after
This expresses __skb_append in terms of __skb_queue_after, exploiting that

  __skb_append(old, new, list) = __skb_queue_after(list, old, new).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:09 -07:00
Ben Hutchings 4c821d753d [NET]: Fix kernel-doc for skb_segment
The kernel-doc comment for skb_segment is clearly wrong.  This states
what it actually does.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 21:52:48 -07:00
David S. Miller a0f55e0e83 [NET]: Fix dev_alloc_skb() typo.
Noticed by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 18:22:32 -07:00
Denys Vlasenko 1483b8744e [NET]: Add inline intent commentary to dev_alloc_skb().
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-28 15:57:39 -07:00
Ilpo Järvinen 419ae74ecc [NET]: uninline skb_trim, de-bloats
Allyesconfig (v2.6.24-mm1):
-10976  209 funcs, 123 +, 11099 -, diff: -10976 --- skb_trim

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
-7360  192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim
skb_trim                      |  +42

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:54:01 -07:00
Ilpo Järvinen c2aa270ad7 [NET]: uninline skb_push, de-bloats a lot
Allyesconfig (v2.6.24-mm1):

-21593  356 funcs, 2418 +, 24011 -, diff: -21593 --- skb_push

Without many debug related CONFIGs (v2.6.25-rc2-mm1):

-13890  341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push
skb_push                      |  +46

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:52:40 -07:00
Ilpo Järvinen f58518e678 [NET]: uninline dev_alloc_skb, de-bloats a lot
Allyesconfig (v2.6.24-mm1):

-23668  392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb

Without many debug CONFIGs (v2.6.25-rc2-mm1):

-12178  382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb
dev_alloc_skb                 |  +37

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:51:31 -07:00
Ilpo Järvinen 6be8ac2fdc [NET]: uninline skb_pull, de-bloats a lot
Allyesconfig (v2.6.24-mm1):

-28162  354 funcs, 3005 +, 31167 -, diff: -28162 --- skb_pull

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):

-9697  338 funcs, 221 +, 9918 -, diff: -9697 --- skb_pull
skb_pull                      |  +44

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:47:24 -07:00
Ilpo Järvinen 0dde3e1648 [NET]: uninline skb_put, de-bloats a lot
Allyesconfig (v2.6.24-mm1):

~500 files changed
...
 869 funcs, 198 +, 111003 -, diff: -110805 --- skb_put
  skb_put                       | +104

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):

-60744  855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put
  skb_put                       |  +57

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-27 17:43:41 -07:00
Linus Torvalds f6866fecd6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
  [NET]: Make sure sockets implement splice_read
  netconsole: avoid null pointer dereference at show_local_mac()
  [IPV6]: Fix reversed local_df test in ip6_fragment
  [XFRM]: Avoid bogus BUG() when throwing new policy away.
  [AF_KEY]: Fix bug in spdadd
  [NETFILTER] nf_conntrack_proto_tcp.c: Mistyped state corrected.
  net: xfrm statistics depend on INET
  [NETFILTER]: make secmark_tg_destroy() static
  [INET]: Unexport inet_listen_wlock
  [INET]: Unexport __inet_hash_connect
  [NET]: Improve cache line coherency of ingress qdisc
  [NET]: Fix race in dev_close(). (Bug 9750)
  [IPSEC]: Fix bogus usage of u64 on input sequence number
  [RTNETLINK]: Send a single notification on device state changes.
  [NETLABLE]: Hide netlbl_unlabel_audit_addr6 under ifdef CONFIG_IPV6.
  [NETLABEL]: Don't produce unused variables when IPv6 is off.
  [NETLABEL]: Compilation for CONFIG_AUDIT=n case.
  [GENETLINK]: Relax dances with genl_lock.
  [NETLABEL]: Fix lookup logic of netlbl_domhsh_search_def.
  [IPV6]: remove unused method declaration (net/ndisc.h).
  ...
2008-02-15 07:33:07 -08:00
Randy Dunlap bc2cda1ebd docbook: make a networking book and fix a few errors
Move networking (core and drivers) docbook to its own networking book.
Fix a few kernel-doc errors in header and source files.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13 16:21:19 -08:00
Urs Thuermann fee54fa517 [NET]: Fix comment for skb_pull_rcsum
Fix comment for skb_pull_rcsum

Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-12 22:03:25 -08:00
Rusty Russell f35d9d8aae virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets.
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
2008-02-04 23:49:56 +11:00
Jens Axboe 9c55e01c0c [TCP]: Splice receive support.
Support for network splice receive.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:31 -08:00
Paul Moore 02f1c89d6e [NET]: Clone the sk_buff 'iif' field in __skb_clone()
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08 23:30:17 -08:00
Herbert Xu 2d4baff8da [SKBUFF]: Free old skb properly in skb_morph
The skb_morph function only freed the data part of the dst skb, but leaked
the auxiliary data such as the netfilter fields.  This patch fixes this by
moving the relevant parts from __kfree_skb to skb_release_all and calling
it in skb_morph.

It also makes kfree_skbmem static since it's no longer called anywhere else
and it now no longer does skb_release_data.

Thanks to Yasuyuki KOZAKAI for finding this problem and posting a patch for
it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-26 23:11:19 +08:00
Jens Axboe c46f2334c8 [SG] Get rid of __sg_mark_end()
sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.

So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
David S. Miller 51c739d1f4 [NET]: Fix incorrect sg_mark_end() calls.
This fixes scatterlist corruptions added by

	commit 68e3f5dd4d
	[CRYPTO] users: Fix up scatterlist conversion errors

The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.

The first part fo the fix makes skb_to_sgvec() do __sg_mark_end().

After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.

I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().

Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.

Arguably sg_mark_end() is a poorly named function because it doesn't
just "mark", it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.

The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:29:29 -07:00
Jamal Hadi Salim a057ae3c10 [NET_CLS_ACT]: Use skb_act_clone
clean skb_clone of any signs of CONFIG_NET_CLS_ACT and
have mirred us skb_act_clone()

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-26 02:47:54 -07:00
Jens Axboe 642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Jens Axboe fa05f1286b Update net/ to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:56 +02:00
Herbert Xu 172a863f2b [NET]: Fix csum_start update in pskb_expand_head
I got confused by the dual nature of the off variable in the
function pskb_expand_head.  The csum_start offset should use
nhead instead of off which can change depending on whether we
are using offsets or pointers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:33 -07:00
Herbert Xu e0053ec07e [SKBUFF]: Add skb_morph
This patch creates a new function skb_morph that's just like skb_clone
except that it lets user provide the spare skb that will be overwritten
by the one that's to be cloned.

This will be used by IP fragment reassembly so that we get back the same
skb that went in last (rather than the head skb that we get now which
requires us to carry around double pointers all over the place).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:24 -07:00
Herbert Xu dec18810c5 [SKBUFF]: Merge common code between copy_skb_header and skb_clone
This patch creates a new function __copy_skb_header to merge the common
code between copy_skb_header and skb_clone.  Having two functions which
are largely the same is a source of wasted labour as well as confusion.

In fact the tc_verd stuff is almost certainly a bug since it's treated
differently in skb_clone compared to the callers of copy_skb_header
(skb_copy/pskb_copy/skb_copy_expand).

I've kept that difference in tact with a comment added asking for
clarification.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:26:24 -07:00
Herbert Xu 52886051ff [SKBUFF]: Fix up csum_start when head room changes
Thanks for noticing the bug where csum_start is not updated
when the head room changes.

This patch fixes that.  It also moves the csum/ip_summed
copying into copy_skb_header so that skb_copy_expand gets
it too.  I've checked its callers and no one should be upset
by this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:24 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Jozsef Kadlecsik ba9dda3ab5 [NETFILTER]: x_tables: add TRACE target
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:14 -07:00
Peter P Waskiewicz Jr f25f4e4480 [CORE] Stack changes to add multiqueue hardware support API
Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:21 -07:00
Patrick McHardy 334a8132d9 [SKBUFF]: Keep track of writable header len of headerless clones
Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases its not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP release the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT:	sys     0m0.388s
- sendfile eth0, NAT:		sys     0m1.835s
- sendfile eth0: NAT + path:	sys     0m0.370s	(~ -80%)

- sendfile lo, no NAT:		sys     0m0.258s
- sendfile lo, NAT:		sys     0m2.609s
- sendfile lo, NAT + patch:	sys     0m0.260s	(~ -90%)

- sendmsg eth0, no NAT:		sys     0m2.508s
- sendmsg eth0, NAT:		sys     0m2.539s
- sendmsg eth0, NAT + patch:	sys     0m2.445s	(no change)

- sendmsg lo, no NAT:		sys	0m2.151s
- sendmsg lo, NAT:		sys     0m3.557s
- sendmsg lo, NAT + patch:	sys     0m2.159s	(~ -40%)

I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:37 -07:00
Johannes Berg 2cd052e443 [NET] skbuff: remove export of static symbol
skb_clone_fraglist is static so it shouldn't be exported.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-05 17:40:19 -07:00
Olaf Kirch 5b5a60da28 [NET]: Make skb_seq_read unmap the last fragment
Having walked through the entire skbuff, skb_seq_read would leave the
last fragment mapped.  As a consequence, the unwary caller would leak
kmaps, and proceed with preempt_count off by one. The only (kind of
non-intuitive) workaround is to use skb_seq_read_abort.

This patch makes sure skb_seq_read always unmaps frag_data after
having cycled through the skb's paged part.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-23 23:11:52 -07:00
Patrick McHardy dbbeb2f991 [SKBUFF]: Fix incorrect config #ifdef around skb_copy_secmark
secmark doesn't depend on CONFIG_NET_SCHED.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-23 22:58:34 -07:00
Mikael Pettersson b6ccc67d8e [NET]: Fix net/core/skbuff.c gcc-3.2.3 compilation error
Compiling 2.6.22-rc1 with gcc-3.2.3 for i486 fails with:

  gcc -m32 -Wp,-MD,net/core/.skbuff.o.d  -nostdinc -isystem /home/mikpe/pkgs/linux-x86/gnu/lib/gcc-lib/i486-pc-linux-gnu/3.2.3/include -D__KERNEL__ -Iinclude  -include include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -O2 -pipe -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=4  -march=i486 -ffreestanding -maccumulate-outgoing-args -DCONFIG_AS_CFI=1  -Iinclude/asm-i386/mach-default -fomit-frame-pointer       -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(skbuff)"  -D"KBUILD_MODNAME=KBUILD_STR(skbuff)" -c -o net/core/skbuff.o net/core/skbuff.c
net/core/skbuff.c:648:1: directives may not be used inside a macro argument
net/core/skbuff.c:647:39: unterminated argument list invoking macro "memcpy"
net/core/skbuff.c: In function `pskb_expand_head':
net/core/skbuff.c:651: `memcpy' undeclared (first use in this function)
net/core/skbuff.c:651: (Each undeclared identifier is reported only once
net/core/skbuff.c:651: for each function it appears in.)
net/core/skbuff.c:651: syntax error before "skb"
make[2]: *** [net/core/skbuff.o] Error 1
make[1]: *** [net/core] Error 2
make: *** [net] Error 2

The patch below implements a simple workaround which is to
clone the offending memcpy() call and specialise it for the
two different scenarios.

Other workarounds are of course possible: e.g. bind the varying
parameter in a local variable, or use a macro or inline function
to perform the varying computation.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19 13:55:25 -07:00
David S. Miller 1a028e5072 [NET]: Revert sk_buff walker cleanups.
This reverts eefa390628

The simplification made in that change works with the assumption that
the 'offset' parameter to these functions is always positive or zero,
which is not true.  It can be and often is negative in order to access
SKB header values in front of skb->data.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27 15:21:23 -07:00
Jean Delvare eefa390628 [NET]: Clean up sk_buff walkers.
I noticed recently that, in skb_checksum(), "offset" and "start" are
essentially the same thing and have the same value throughout the
function, despite being computed differently. Using a single variable
allows some cleanups and makes the skb_checksum() function smaller,
more readable, and presumably marginally faster.

We appear to have many other "sk_buff walker" functions built on the
exact same model, so the cleanup applies to them, too. Here is a list
of the functions I found to be affected:

net/appletalk/ddp.c:atalk_sum_skb()
net/core/datagram.c:skb_copy_datagram_iovec()
net/core/datagram.c:skb_copy_and_csum_datagram()
net/core/skbuff.c:skb_copy_bits()
net/core/skbuff.c:skb_store_bits()
net/core/skbuff.c:skb_checksum()
net/core/skbuff.c:skb_copy_and_csum_bit()
net/core/user_dma.c:dma_skb_copy_datagram_iovec()
net/xfrm/xfrm_algo.c:skb_icv_walk()
net/xfrm/xfrm_algo.c:skb_to_sgvec()

OTOH, I admit I'm a bit surprised, the cleanup is rather obvious so I'm
really wondering if I am missing something. Can anyone please comment
on this?

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 00:44:22 -07:00
Stephen Hemminger 3ff50b7997 [NET]: cleanup extra semicolons
Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals.  Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:24 -07:00
Stephen Hemminger 0c6fcc8a8c [NET] skbuff: skb_store_bits const is backwards
Getting warnings becuase skb_store_bits has skb as constant,
but the function overwrites it. Looks like const was on the
wrong side.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:17 -07:00
Patrick McHardy efd1e8d569 [SK_BUFF]: Fix missing offset adjustment in skb_copy_expand
skb_copy_expand changes the headroom, so it needs to adjust the header
offsets by the difference between the old and the new value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:53 -07:00
Herbert Xu 663ead3bb8 [NET]: Use csum_start offset instead of skb_transport_header
The skb transport pointer is currently used to specify the start
of the checksum region for transmit checksum offload.  Unfortunately,
the same pointer is also used during receive side processing.

This creates a problem when we want to retransmit a received
packet with partial checksums since the skb transport pointer
would be overwritten.

This patch solves this problem by creating a new 16-bit csum_start
offset value to replace the skb transport header for the purpose
of checksums.  This offset is calculated from skb->head so that
it does not have to change when skb->data changes.

No extra space is required since csum_offset itself fits within
a 16-bit word so we can use the other 16 bits for csum_start.

For backwards compatibility, just before we push a packet with
partial checksums off into the device driver, we set the skb
transport header to what it would have been under the old scheme.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:40 -07:00
Patrick McHardy 56eb88828b [SK_BUFF]: Fix missing offset adjustment in pskb_expand_head
Since we're increasing the headroom, the header offsets need to be
increased by the same amount as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:36 -07:00
David Howells 716ea3a7aa [NET]: Move generic skbuff stuff from XFRM code to generic code
Move generic skbuff stuff from XFRM code to generic code so that
AF_RXRPC can use it too.

The kdoc comments I've attached to the functions needs to be checked
by whoever wrote them as I had to make some guesses about the workings
of these functions.

Signed-off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:33 -07:00
Arnaldo Carvalho de Melo 27d7ff46a3 [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset}
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-04-25 22:28:29 -07:00
Arnaldo Carvalho de Melo d626f62b11 [SK_BUFF]: Introduce skb_copy_from_linear_data{_offset}
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:28:23 -07:00
Yasuyuki Kozakai 5f79e0f916 [NETFILTER]: nf_conntrack: don't use nfct in skb if conntrack is disabled
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:44 -07:00
Arnaldo Carvalho de Melo 897933bcdf [SK_BUFF]: Remove skb_add_mtu() leftovers
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:26:35 -07:00
Arnaldo Carvalho de Melo ca0605a7c8 [SK_BUFF]: Adjust the zeroing up to tail in __alloc_skb too
I did it just in alloc_skb_from_cache, forgot __alloc_skb, fixed now.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:31 -07:00
Arnaldo Carvalho de Melo 4305b54135 [SK_BUFF]: Convert skb->end to sk_buff_data_t
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:29 -07:00
Arnaldo Carvalho de Melo 27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo 2e07fa9cd3 [SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures
With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used
in further shrinking work, likely with the offsetization of other pointers,
such as ->{data,tail,end}, at the cost of adds, that were minimized by the
usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then
accessed multiple times in each function, it also is not more expensive than
before with regards to most of the handling of such headers, like setting one
of these headers to another (transport to network, etc), or subtracting, adding
to/from it, comparing them, etc.

Now we have this layout for sk_buff on a x86_64 machine:

[acme@mica net-2.6.22]$ pahole vmlinux sk_buff
struct sk_buff {
	struct sk_buff *       next;             /*   0   8 */
	struct sk_buff *       prev;             /*   8   8 */
	struct rb_node         rb;               /*  16  24 */
	struct sock *          sk;               /*  40   8 */
	ktime_t                tstamp;           /*  48   8 */
	struct net_device *    dev;              /*  56   8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct net_device *    input_dev;        /*  64   8 */
	sk_buff_data_t         transport_header; /*  72   4 */
	sk_buff_data_t         network_header;   /*  76   4 */
	sk_buff_data_t         mac_header;       /*  80   4 */

	/* XXX 4 bytes hole, try to pack */

	struct dst_entry *     dst;              /*  88   8 */
	struct sec_path *      sp;               /*  96   8 */
	char                   cb[48];           /* 104  48 */
	/* cacheline 2 boundary (128 bytes) was 24 bytes ago*/
	unsigned int           len;              /* 152   4 */
	unsigned int           data_len;         /* 156   4 */
	unsigned int           mac_len;          /* 160   4 */
	union {
		__wsum         csum;             /*       4 */
		__u32          csum_offset;      /*       4 */
	};                                       /* 164   4 */
	__u32                  priority;         /* 168   4 */
	__u8                   local_df:1;       /* 172   1 */
	__u8                   cloned:1;         /* 172   1 */
	__u8                   ip_summed:2;      /* 172   1 */
	__u8                   nohdr:1;          /* 172   1 */
	__u8                   nfctinfo:3;       /* 172   1 */
	__u8                   pkt_type:3;       /* 173   1 */
	__u8                   fclone:2;         /* 173   1 */
	__u8                   ipvs_property:1;  /* 173   1 */

	/* XXX 2 bits hole, try to pack */

	__be16                 protocol;         /* 174   2 */
	void    (*destructor)(struct sk_buff *); /* 176   8 */
	struct nf_conntrack *  nfct;             /* 184   8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct sk_buff *       nfct_reasm;       /* 192   8 */
	struct nf_bridge_info *nf_bridge;        /* 200   8 */
	__u16                  tc_index;         /* 208   2 */
	__u16                  tc_verd;          /* 210   2 */
	dma_cookie_t           dma_cookie;       /* 212   4 */
	__u32                  secmark;          /* 216   4 */
	__u32                  mark;             /* 220   4 */
	unsigned int           truesize;         /* 224   4 */
	atomic_t               users;            /* 228   4 */
	unsigned char *        head;             /* 232   8 */
	unsigned char *        data;             /* 240   8 */
	unsigned char *        tail;             /* 248   8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	unsigned char *        end;              /* 256   8 */
}; /* size: 264, cachelines: 5 */
   /* sum members: 260, holes: 1, sum holes: 4 */
   /* bit holes: 1, sum bit holes: 2 bits */
   /* last cacheline: 8 bytes */

On 32 bits nothing changes, and pointers continue to be used with the compiler
turning all this abstraction layer into dust. But there are some sk_buff
validation tricks that are now possible, humm... :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:21 -07:00