The problem is that when new policies are inserted, sockets do not see
the update (but all new route lookups do).
This bug is related to the SA insertion stale route issue solved
recently, and this policy visibility problem can be fixed in a similar
way.
The fix is to flush out the bundles of all policies deeper than the
policy being inserted. Consider beginning state of "outgoing"
direction policy list:
policy A --> policy B --> policy C --> policy D
First, realize that inserting a policy into a list only potentially
changes IPSEC routes for that direction. Therefore we need not bother
considering the policies for other directions. We need only consider
the existing policies in the list we are doing the inserting.
Consider new policy "B'", inserted after B.
policy A --> policy B --> policy B' --> policy C --> policy D
Two rules:
1) If policy A or policy B matched before the insertion, they
appear before B' and thus would still match after inserting
B'
2) Policy C and D, now "shadowed" and after policy B', potentially
contain stale routes because policy B' might be selected
instead of them.
Therefore we only need flush routes assosciated with policies
appearing after a newly inserted policy, if any.
Signed-off-by: David S. Miller <davem@davemloft.net>
I hope to actually change this behaviour shortly but this will help
anybody grepping code at present.
Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If you add more than one IPv6 address belonging to the same prefix and
delete the address that was last added, routing table entry for that
prefix is also deleted.
Tested on 2.6.14.4
To reproduce:
ip addr add 3ffe::1/64 dev eth0
ip addr add 3ffe::2/64 dev eth0
/* wait DAD */
sleep 1
ip addr del 3ffe::2/64 dev eth0
ip -6 route
(route to 3ffe::/64 should be gone)
In ipv6_del_addr(), if ifa == ifp, we set ifa->if_next to NULL, and later
assign ifap = &ifa->if_next, effectively terminating the for-loop.
This prevents us from checking if there are other addresses using the same
prefix that are valid, and thus resulting in deletion of the prefix.
This applies only if the first entry in idev->addr_list is the address to
be deleted.
Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In vlan_ioctl_handler() the code misses couple checks for
error return values.
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found these while compiling with extra gcc warnings;
considering the indenting surely they are not intentional?
Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gss_create_upcall() should not error just because rpc.gssd closed the
pipe on its end. Instead, it should requeue the pending requests and then
retry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make sctp_writeable() use sk_wmem_alloc rather than sk_wmem_queued to
determine the sndbuf space available. It also removes all the modifications
to sk_wmem_queued as it is not currently used in SCTP.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
The route expiration time is stored in rt6i_expires in jiffies.
The argument of rt6_route_add() for adding a route is not the
expiration time in jiffies nor in clock_t, but the lifetime
(or time left before expiration) in clock_t.
Because of the confusion, we sometimes saw several strange errors
(FAILs) in TAHI IPv6 Ready Logo Phase-2 Self Test.
The symptoms were analyzed by Mitsuru Chinen <CHINEN@jp.ibm.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typo caused some bridged IPv6 packets to get dropped randomly,
as reported by Sebastien Chaumontet. The patch below fixes this
(using skb->nh.raw instead of raw) and also makes the jumbo packet
length checking up-to-date with the code in
net/ipv6/exthdrs.c::ipv6_hop_jumbo.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP6_NF_TARGET_NFQUEUE depends on IP6_NF_IPTABLES, not IP_NF_IPTABLES.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Phil Oester, the GRE NAT protocol helper is initialized
before the NAT core, which makes registration fail.
Change the linking order to make NAT be initialized first.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Receiving VLAN packets over a device (without VLAN assist) that is
doing hardware checksumming (CHECKSUM_HW), causes errors because the
VLAN code forgets to adjust the hardware checksum.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb_postpull_rcsum introduced a bug to the checksum modification.
Although the length pulled is offset bytes, the origin of the pulling
is the GRE header, not the IP header.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed by Andi Kleen, it is pointless to emit the device
structure pointer in the kernel logs like this.
Signed-off-by: David S. Miller <davem@davemloft.net>
When a TFTP client is SNATed so that the port is also changed, the
port is never changed back for the expected connection.
Signed-off-by: Marcus Sundberg <marcus@ingate.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a simple bug which uses the wrong member.
This bug does not seriously affect ordinary use of IPsec.
But it is important to pass IPv6 ready logo phase-2
conformance test of IPsec SGW.
Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem I was seeing turned out to be that skb->dev is NULL when
the checksum is being completed in user context. This happens because
the reference to the device is dropped (to allow it to be released
when packets are in the queue).
Because skb->dev was NULL, the netdev_rx_csum_fault was panicing on
deref of dev->name. How about this?
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So we can properly use __GFP_COMP and avoid the use of
PG_reserved pages.
With extremely helpful review from Hugh Dickins.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to store the congestion control timestamp on the SKB before we
clone it, not after. Else we get no timestamping information at all.
tcp_transmit_skb() has been reworked so that we can do the timestamp
still in one spot, instead of at all the call sites.
Problem discovered, and initial fix, from Tom Young
<tyo@ee.unimelb.edu.au>.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unneeded call to tcp_vegas_rtt_calc. The more accurate
microsecond value has already been registered prior to calling
tcp_vegas_cong_avoid.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the resetting of rtt measurements to inside the once per RTT
block of code.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch (originally from Steve) simply adds memory buffer settings to
DECnet similar to those in TCP.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a function takes a function pointer as argument it should use the 'return
(*pointer)(params...)' syntax used everywhere else in the kernel as this is
recognized by kernel-doc.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFA_NEST calls NFA_PUT which jumps to nfattr_failure if the skb has no
room left. We call read_unlock_bh at nfattr_failure for the NFA_PUT inside
the locked section, so move NFA_NEST inside the locked section too.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should have been marked EXPERIMENTAL from the beginning, as the current
bunch of fixes show.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_conntrack_flush() used to be part of ip_conntrack_cleanup(), which needs
to drop _all_ references on module unload. Table flushed using ctnetlink
just needs to clean the table and doesn't need to flush the event cache or
wait for any references attached to skbs. Move everything but pure table
flushing back to ip_conntrack_cleanup().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least, valid nfnetlink message should have nlmsghdr and nfgenmsg.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes nf_conntrack_icmpv6 check that ICMPv6 type isn't < 128
to avoid accessing out of array valid_new[] and invmap[].
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_nat_initialized() takes enum ip_nat_manip_type as it's second argument,
not a hook number.
Noticed and initial patch by Marcus Sundberg <marcus@ingate.com>.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The elements on rpci->in_upcall are tracked by the filp->private_data,
which will ensure that they get released when the file is closed.
The exception is if rpc_close_pipes() gets called first, since that
sets rpci->ops to NULL.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Modified to match inet_create() bug fix by Herbert Xu -DaveM ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a coding error in inet_create that causes it to always return
ESOCKTNOSUPPORT. It should return EPROTONOSUPPORT when there are
protocols registered for a given socket type but none of them match
the requested protocol.
This is based on a patch by Jayachandran C.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: David Stevens <dlstevens@us.ibm.com>
As explained at:
http://www.cs.ucsb.edu/~krishna/igmp_dos/
With IGMP version 1 and 2 it is possible to inject a unicast
report to a client which will make it ignore multicast
reports sent later by the router.
The fix is to only accept the report if is was sent to a
multicast or unicast address.
Signed-off-by: David S. Miller <davem@davemloft.net>
an ipv4 socket.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an issue where it is possible to get valid data after
a ENOTCONN error. It returns socket errors only after data queued on
socket receive queue is consumed.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The receive path for fib_lookup netlink messages is lacking sanity
checks for header and payload and is thus vulnerable to malformed
netlink messages causing illegal memory references.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Around jiffies wrap time (i.e. within first 5 mins after boot), recent
match rules which contain both --seconds and --hitcount arguments
experience false matches.
This is because the last_pkts array is filled with zeros on creation, and
when comparing 'now' to 0 (+ --seconds argument), time_before_eq thinks it
has found a hit.
Below patch adds a break if the packet value is zero. This has the
unfortunate side effect of causing mismatches if a packet was received
when jiffies really was equal to zero. The odds of that happening are
slim compared to the problems caused by not adding the break however.
Plus, the author used this same method just below, so it is "good enough".
This fixes netfilter bugs #383 and #395.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mounting NFS file systems after a (warm) reboot could take a long time if
firewalling and connection tracking was enabled.
The reason is that the NFS clients tends to use the same ports (800 and
counting down). Now on reboot, the server would still have a TCB for an
existing TCP connection client:800 -> server:2049. The client sends a
SYN from port 800 to server:2049, which elicits an ACK from the server.
The firewall on the client drops the ACK because (from its point of
view) the connection is still in half-open state, and it expects to see
a SYNACK.
The client will eventually time out after several minutes.
The following patch corrects this, by accepting ACKs on half open
connections as well.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>