linux/net
Eric Dumazet 6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from:
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receiving an ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU; however, ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment messages originating from the IPv4 part of
the path will be translated to ICMPv6 PTBs, which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes; instead, it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).
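The sizing arithmetic above can be sketched in C. This is a simplified model, not the kernel's code: `struct route_mtu`, `apply_ptb()` and `tcp_segment_size()` are hypothetical names, the `allfrag` flag merely stands in for RTAX_FEATURE_ALLFRAG, and "segment size" follows the message's usage (TCP header plus data, i.e. 1280 - 40 = 1240).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sizes taken from the commit message. */
#define IPV6_HDR_LEN       40u  /* fixed IPv6 header */
#define IPV6_FRAG_HDR_LEN   8u  /* IPv6 Fragment extension header */
#define IPV6_MIN_MTU     1280u  /* minimum IPv6 MTU */

/* Hypothetical route state; not the kernel's routing structures. */
struct route_mtu {
    uint32_t mtu;   /* effective path MTU after clamping */
    bool allfrag;   /* models RTAX_FEATURE_ALLFRAG */
};

/* Apply an ICMPv6 Packet Too Big MTU as the message describes:
 * never go below 1280, but mark the route ALLFRAG when the reported
 * MTU is smaller than that. */
static struct route_mtu apply_ptb(uint32_t reported_mtu)
{
    struct route_mtu r;

    if (reported_mtu < IPV6_MIN_MTU) {
        r.mtu = IPV6_MIN_MTU;
        r.allfrag = true;
    } else {
        r.mtu = reported_mtu;
        r.allfrag = false;
    }
    return r;
}

/* Bytes left for the TCP segment (header + data) in one IPv6 packet. */
static uint32_t tcp_segment_size(const struct route_mtu *r)
{
    uint32_t size;

    assert(r->mtu >= IPV6_MIN_MTU);
    size = r->mtu - IPV6_HDR_LEN;
    if (r->allfrag)
        size -= IPV6_FRAG_HDR_LEN;  /* the 8 bytes the old sizing ignored */
    return size;
}
```

With a reported MTU of 1000, `apply_ptb()` clamps to 1280 with `allfrag` set, and `tcp_segment_size()` yields 1232 rather than 1240; subtracting the TCP header from that would give the MSS.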

This in turn results in rather inefficient transmission, as every
transmitted TCP segment is now split into two fragments containing
1232+8 bytes of payload.
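Why a 1240-byte segment ends up as 1232+8 can be checked with a small fragmentation sketch. It assumes no extension headers other than the Fragment header and uses hypothetical helper names (`max_frag_payload()`, `split_payload()`); it models the rule that every fragment except the last carries a multiple of 8 bytes, because the Fragment Offset field counts 8-octet units.

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_HDR_LEN       40u
#define IPV6_FRAG_HDR_LEN   8u

/* Largest payload per fragment: what fits after the IPv6 and Fragment
 * headers, rounded down to a multiple of 8. */
static uint32_t max_frag_payload(uint32_t mtu)
{
    assert(mtu > IPV6_HDR_LEN + IPV6_FRAG_HDR_LEN);
    return (mtu - IPV6_HDR_LEN - IPV6_FRAG_HDR_LEN) & ~7u;
}

/* Fill sizes[] with per-fragment payload lengths and return how many
 * fragments were produced (stops early if max_frags is reached). */
static unsigned split_payload(uint32_t payload, uint32_t mtu,
                              uint32_t *sizes, unsigned max_frags)
{
    uint32_t chunk = max_frag_payload(mtu);
    unsigned n = 0;

    while (payload > 0 && n < max_frags) {
        uint32_t len = payload < chunk ? payload : chunk;

        sizes[n++] = len;
        payload -= len;
    }
    return n;
}
```

Under these assumptions, `split_payload(1240, 1280, sizes, 4)` returns 2 with sizes 1232 and 8, while a 1232-byte segment fits in a single fragment.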

After this patch, all outgoing packets that include a
Fragmentation header are "atomic" or "non-fragmented" fragments,
i.e., they have both Offset=0 and More Fragments=0.
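An "atomic" fragment is simply a Fragment header whose offset and M flag are both zero. The sketch below models the 8-byte header layout from RFC 8200 in host byte order; `struct frag_hdr_sketch` and its helpers are hypothetical and deliberately not the kernel's own Fragment header structure, whose multi-byte fields are big-endian on the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv6 Fragment header layout, kept in host byte order for simplicity. */
struct frag_hdr_sketch {
    uint8_t  nexthdr;         /* e.g. 6 for TCP */
    uint8_t  reserved;
    uint16_t frag_off;        /* offset (13 bits, 8-octet units) | res (2) | M (1) */
    uint32_t identification;
};

static_assert(sizeof(struct frag_hdr_sketch) == 8,
              "Fragment header is 8 bytes");

#define FRAG_OFFSET_MASK 0xFFF8u  /* upper 13 bits: fragment offset */
#define FRAG_MF          0x0001u  /* More Fragments flag */

/* Build a header for a single, unfragmented ("atomic") fragment. */
static struct frag_hdr_sketch make_atomic_frag_hdr(uint8_t nexthdr, uint32_t id)
{
    struct frag_hdr_sketch h = { nexthdr, 0, 0, id };
    return h;
}

/* True when Offset == 0 and M == 0. */
static bool is_atomic_fragment(const struct frag_hdr_sketch *h)
{
    return (h->frag_off & FRAG_OFFSET_MASK) == 0 &&
           (h->frag_off & FRAG_MF) == 0;
}
```

Because the offset occupies the upper 13 bits in 8-octet units, a `frag_off` value of 1232 encodes a byte offset of 1232; either that or a set M flag makes the header non-atomic.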

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
9p net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
802 net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
8021q vlan: Stop using NLA_PUT*(). 2012-04-02 04:33:44 -04:00
appletalk net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
atm net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ax25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-23 23:15:17 -04:00
batman-adv batman-adv: skip the window protection test when the originator has no neighbours 2012-04-18 09:54:02 +02:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-09 15:47:49 -04:00
bridge net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
caif net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
can can: fix sparse warning for cgw_list 2012-04-16 21:08:18 +02:00
ceph net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
core net: sock_diag_handler structs can be const 2012-04-25 20:46:59 -04:00
dcb net: dcb: add CEE notify calls 2012-04-25 19:47:17 -04:00
dccp net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
decnet net decnet: Convert to use register_net_sysctl 2012-04-20 21:22:29 -04:00
dns_resolver net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dsa dsa: Move switch drivers to new directory drivers/net/dsa 2011-11-29 00:21:36 -05:00
econet sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
ethernet net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ieee802154 6lowpan: duplicate definition of IEEE802154_ALEN 2012-04-26 06:01:09 -04:00
ipv4 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing 2012-04-27 00:03:34 -04:00
ipv6 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing 2012-04-27 00:03:34 -04:00
ipx net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
irda net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
iucv Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2012-03-22 18:15:32 -07:00
key net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
l2tp net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
lapb Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
llc net: add a limit parameter to sk_add_backlog() 2012-04-23 22:28:28 -04:00
mac80211 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-04-26 15:03:48 -04:00
netfilter sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
netlabel netlabel: use GFP flags from caller instead of GFP_ATOMIC 2012-03-22 19:29:57 -04:00
netlink af_netlink: drop_monitor/dropwatch friendly 2012-04-24 00:35:14 -04:00
netrom net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
nfc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-04-18 14:27:48 -04:00
openvswitch net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
packet af_packet: packet_getsockopt() cleanup 2012-04-21 16:36:42 -04:00
phonet net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
rds sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
rfkill device.h: cleanup users outside of linux/include (C files) 2012-03-11 14:27:37 -04:00
rose net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
rxrpc net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-23 23:15:17 -04:00
sctp net: add a limit parameter to sk_add_backlog() 2012-04-23 22:28:28 -04:00
sunrpc sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
tipc tipc: remove inline instances from C source files. 2012-04-24 00:41:03 -04:00
unix net: sock_diag_handler structs can be const 2012-04-25 20:46:59 -04:00
wanrouter wanrouter: Remove kernel_lock annotations 2011-11-07 13:27:30 -05:00
wimax net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wireless Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-04-26 15:03:48 -04:00
x25 net: add a limit parameter to sk_add_backlog() 2012-04-23 22:28:28 -04:00
xfrm net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
Kconfig net: Add Open vSwitch kernel components. 2011-12-03 09:35:17 -08:00
Makefile net: Add Open vSwitch kernel components. 2011-12-03 09:35:17 -08:00
compat.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
nonet.c
socket.c net: change big iov allocations 2012-04-21 16:24:20 -04:00
sysctl_net.c net: Remove register_net_sysctl_table 2012-04-20 21:22:30 -04:00