The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():
/*
* hdrlen includes the 2-byte PPP protocol field, but the
* MTU counts only the payload excluding the protocol field.
* (RFC1661 Section 2)
*/
mtu = pch->chan->mtu - (hdrlen - 2);
However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.
In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)
Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:
2948 (echo payload)
+ 8 (ICMPv4 header)
+ 20 (IPv4 header)
---------------------
2976 (PPP payload)
These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:
1489 (PPP payload)
+ 4 (MP header)
+ 2 (PPP protocol field for the MP payload (0x3d))
+ 6 (PPPoE header)
--------------------------
1501 (Ethernet payload)
This packet exceeds the link MTU and is discarded.
If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A
ping -s 2948 -c 1 192.168.15.254
leads to the smaller fragment that is correctly received on the server side:
(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
Flags [end], length 1492
and to the bigger fragment that is not received on the server side:
(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
Flags [begin], length 1493
With the patch below, we correctly obtain three fragments:
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
Flags [end], length 5
And the ICMPv4 echo request is successfully received at the server side:
IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
length 2976)
192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
length 2956
The bug was introduced in commit c9aa689537
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.
Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use possibly more efficient ether_addr_equal
to instead of memcmp.
Cc: Michal Ostrowski <mostrows@earthlink.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.
this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.
It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface in case
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering net_device for the reference count is not decreased to
zero which should have been done with dev_put().
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call consume_skb() in place of kfree_skb() were appropriate.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch removes unused return value from __delete_item() and
delete_item() methods in drivers/net/ppp/pppoe.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all that include/linux/if_ppp.h does is #include <linux/ppp-ioctl.h>,
this replaces the occurrences of #include <linux/if_ppp.h> with
#include <linux/ppp-ioctl.h>.
It also corrects an error in Documentation/networking/l2tp.txt, where
it referenced include/linux/if_ppp.h as the source of some definitions
that are actually now defined in include/linux/if_pppol2tp.h.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the PPP drivers into drivers/net/ppp/ and make the
necessary Kconfig and Makefile changes.
CC: Paul Mackerras <paulus@samba.org>
CC: Frank Cusack <fcusack@fcusack.com>
CC: Michal Ostrowski <mostrows@speakeasy.net>
CC: Michal Ostrowski <mostrows@earthlink.net>
CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>