linux/net/ipv4
Eric Dumazet 6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
..
netfilter net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
Kconfig net: Fix build regression when INET_UDP_DIAG=y and IPV6=m 2012-02-07 13:35:28 -05:00
Makefile tcp memory pressure controls 2011-12-12 19:04:10 -05:00
af_inet.c sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
ah4.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
arp.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
cipso_ipv4.c ipv4: Convert call_rcu() to kfree_rcu(), drop opt_kfree_rcu() 2012-02-21 09:03:31 -08:00
datagram.c ipv4: Lock socket and use cork flow in ip4_datagram_connect(). 2011-05-08 13:48:57 -07:00
devinet.c net ipv4: Convert devinet to use register_net_sysctl 2012-04-20 21:22:30 -04:00
esp4.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
fib_frontend.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c ipv4: Stop using NLA_PUT*(). 2012-04-02 04:33:43 -04:00
fib_semantics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00
fib_trie.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
gre.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
icmp.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
igmp.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
inet_connection_sock.c sock: Introduce named constants for sk_reuse 2012-04-21 15:52:25 -04:00
inet_diag.c net: sock_diag_handler structs can be const 2012-04-25 20:46:59 -04:00
inet_fragment.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
inet_hashtables.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
inet_lro.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
inet_timewait_sock.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
inetpeer.c route: Remove redirect_genid 2012-03-08 00:30:32 -08:00
ip_forward.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
ip_fragment.c net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
ip_gre.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
ip_input.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ip_options.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ip_output.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ip_sockglue.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ipcomp.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
ipconfig.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ipip.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00
netfilter.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
ping.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
proc.c tcp: reduce out_of_order memory use 2012-03-19 16:53:08 -04:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
route.c net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
syncookies.c tcp: fix syncookie regression 2012-03-11 15:52:12 -07:00
sysctl_net_ipv4.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
tcp.c tcp repair: Fix unaligned access when repairing options (v2) 2012-04-26 06:13:51 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
tcp_cubic.c tcp: fix undo after RTO for CUBIC 2012-01-20 14:17:26 -05:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_illinois.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_input.c tcp: tcp_try_coalesce returns a boolean 2012-04-23 23:36:58 -04:00
tcp_ipv4.c tcp: sk_add_backlog() is too agressive for TCP 2012-04-23 22:28:28 -04:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_memcontrol.c Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-03-20 18:11:21 -07:00
tcp_minisocks.c tcp: md5: rcu conversion 2012-01-31 12:14:00 -05:00
tcp_output.c ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing 2012-04-27 00:03:34 -04:00
tcp_probe.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp.c net: add a limit parameter to sk_add_backlog() 2012-04-23 22:28:28 -04:00
udp_diag.c net: kill duplicate included header 2012-01-17 10:31:12 -05:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_output.c xfrm4: Don't call icmp_send on local error 2011-07-01 17:33:19 -07:00
xfrm4_policy.c net: Convert all sysctl registrations to register_net_sysctl 2012-04-20 21:22:30 -04:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00