The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN buisness we had before.
Right now we'll block until the key manager resolves the route. That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.
We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.
With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.
This lays the framework to either:
1) Make this default at some point or...
2) Move this logic into xfrm{4,6}_policy.c and implement the
ARP-like resolution queue we've all been dreaming of.
The idea would be to queue packets to the policy, then
once the larval state is resolved by the key manager we
re-resolve the route and push the packets out. The
packets would timeout if the rule didn't get resolved
in a certain amount of time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do this even for non-blocking sockets. This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).
With help from Venkat Tekkirala.
Signed-off-by: David S. Miller <davem@davemloft.net>
That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.
In a few instances, this also allowed to simplify pre-conditions; where
care has been taken to retain logical equivalence.
[DCCP]: Introduce a consistent BUG/WARN message scheme
This refines the existing set of DCCP messages so that
* BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
* DCCP_CRIT (for severe warnings) is not rate-limited
* DCCP_WARN() is introduced as rate-limited wrapper
Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Spotted by Ian McDonald, tentatively fixed by Gerrit Renker:
http://www.mail-archive.com/dccp%40vger.kernel.org/msg00599.html
Rewritten not to unroll sk_receive_skb, in the common case, i.e. no lock
debugging, its optimized away.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch does not change code; it performs some trivial clean/tidy-ups:
* removal of a `debug_prefix' string in favour of the
already existing dccp_role(sk)
* add documentation of structures and constants
* separated out the cases for invalid packets (step 1
of the packet validation)
* removing duplicate statements
* combining declaration & initialisation
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Throughout the TCP/DCCP (and tunnelling) code, it often happens that the
return code of a transmit function needs to be tested against NET_XMIT_CN
which is a value that does not indicate a strict error condition.
This patch uses a macro for these recurring situations which is consistent
with the already existing macro net_xmit_errno, saving on duplicated code.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This
* resolves a FIXME - DCCPv6 connections started all with
an initial sequence number of 1;
* provides a redirection `secure_dccpv6_sequence_number'
in case the init_sequence_v6 code should be updated later;
* concentrates the update of S.GAR into dccp_connect_init();
* removes a duplicate dccp_update_gss() in ipv4.c;
* uses inet->dport instead of usin->sin_port, due to the
following assignment in dccp_v4_connect():
inet->dport = usin->sin_port;
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch removes the following redundancies:
1) The test skb->protocol == htons(ETH_P_IPV6) in dccp_v6_init_sequence
is always true since
* dccp_v6_conn_request() is the only calling function
* dccp_v6_conn_request() redirects all skb's with ETH_P_IP to
dccp_v4_conn_request()
2) The first argument, `struct sock *sk', of dccp_v{4,6}_init_sequence()
is never used.
(This is similar for tcp_v{4,6}_init_sequence, an analogous patch has been
submitted to netdev and merged.)
By the way - are the `sport' / `dport' arguments in the right order?
I have made them consistent among calls but they seem to be in the
reverse order.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch does the following:
a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
b) provides necessary socket options and documentation as to how to use them
c) basic support and infrastructure for the Minimum Checksum Coverage feature
[RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
interface
In addition, it
(1) fixes two bugs in the DCCPv4 checksum computation:
* pseudo-header used checksum_len instead of skb->len
* incorrect checksum coverage calculation based on dccph_x
(2) removes dccp_v4_verify_checksum() since it reduplicates code of the
checksum computation; code calling this function is updated accordingly.
(3) now uses skb_checksum(), which is safer than checksum_partial() if the
sk_buff has is a non-linear buffer (has pages attached to it).
(4) fixes an outstanding TODO item:
* If P.CsCov is too large for the packet size, drop packet and return.
The code has been tested with applications, the latest version of tcpdump now
comes with support for partial DCCP checksums.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340.
All comments have been updated against this document, and the reference to step
2 has been made consistent throughout the files.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This does the same for ipv6.c as the preceding one does for ipv4.c: Only the
inet_connection_sock_af_ops forward declarations remain, since at least
dccp_ipv6_mapped has a circular dependency to dccp_v6_request_recv_sock.
No code change, merely re-ordering.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This patch removes two functions, the send_ack functions of request_sock,
which are not called/used by the DCCP code. It is correct that these
functions are not called, below is a justification why calling these
functions (on a passive socket in the LISTEN/RESPOND state) would mean
a DCCP protocol violation.
A) Background: using request_sock in TCP:
This is a code simplification and was singled out from the
DCCPv6 Oops patch on
http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html
It mainly makes the code consistent between ipv{4,6}.c for the functions
dccp_v4_rcv
dccp_v6_rcv
and removes the do_time_wait label to simplify code somewhat.
Commiter note: fixed up a compile problem, trivial.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This is a code simplification:
it combines three often recurring operations into one inline function,
* allocate `len' bytes header space in skb
* fill these `len' bytes with zeroes
* cast the start of this header space as dccp_hdr
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
TCP and RAW do not have this issue. Closes Bug #7432.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updates the references to spec documents throughout the code, taking into
account that
* the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year
* RFC 1063 was obsoleted by RFC 1191
* draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational
RFC, RFC 2923 on 2000-09-22.
All references verified.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a patch from Jesper Juhl. Try to match the
TCP IPv6 code this was copied from as much as possible,
so that it's easy to see where to add the ipv6 pktoptions
support code.
Signed-off-by: David S. Miller <davem@davemloft.net>
I think I got the cause for the Oops observed in
http://www.mail-archive.com/dccp@vger.kernel.org/msg00578.html
The problem is always with applications listening on PF_INET6 sockets. Apart
from the mentioned oops, I observed another one one, triggered at irregular
intervals via timer interrupt:
run_timer_softirq -> dccp_keepalive_timer
-> inet_csk_reqsk_queue_prune
-> reqsk_free
-> dccp_v6_reqsk_destructor
The latter function is the problem and is also the last function to be called
in said kernel panic.
In any case, there is a real problem with allocating the right request_sock
which is what this patch tackles.
It fixes the following problem:
- application listens on PF_INET6
- DCCPv4 packet comes in, is handed over to dccp_v4_do_rcv, from there
to dccp_v4_conn_request
Now: socket is PF_INET6, packet is IPv4. The following code then furnishes the
connection with IPv6 - request_sock operations:
req = reqsk_alloc(sk->sk_prot->rsk_prot);
The first problem is that all further incoming packets will get a Reset since
the connection can not be looked up.
The second problem is worse:
--> reqsk_alloc is called instead of inet6_reqsk_alloc
--> consequently inet6_rsk_offset is never set (dangling pointer)
--> the request_sock_ops are nevertheless still dccp6_request_ops
--> destructor is called via reqsk_free
--> dccp_v6_reqsk_destructor tries to free random memory location (inet6_rsk_offset not set)
--> panic
I have tested this for a while, DCCP sockets are now handled correctly in all
three scenarios (v4/v6 only/v4-mapped).
Commiter note: I've added the dccp_request_sock_ops forward declaration to keep
the tree building and to reduce the size of the patch for 2.6.19,
later I'll move the functions to the top of the affected source
code to match what we have in the TCP counterpart, where this
problem hasn't existed in the first place, dumb me not to have
done the same thing on DCCP land 8)
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg
needlock = 0, while socket is not locked at that moment. In order to avoid
this and similar issues in the future, use rcu for sk->sk_filter field read
protection.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This automatically labels the TCP, Unix stream, and dccp child sockets
as well as openreqs to be at the same MLS level as the peer. This will
result in the selection of appropriately labeled IPSec Security
Associations.
This also uses the sock's sid (as opposed to the isec sid) in SELinux
enforcement of secmark in rcv_skb and postroute_last hooks.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This labels the flows that could utilize IPSec xfrms at the points the
flows are defined so that IPSec policy and SAs at the right label can
be used.
The following protos are currently not handled, but they should
continue to be able to use single-labeled IPSec like they currently
do.
ipmr
ip_gre
ipip
igmp
sit
sctp
ip6_tunnel (IPv6 over IPv6 tunnel device)
decnet
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current users of ip6_dst_lookup can be divided into two classes:
1) The caller holds no locks and is in user-context (UDP).
2) The caller does not want to lookup the dst cache at all.
The second class covers everyone except UDP because most people do
the cache lookup directly before calling ip6_dst_lookup. This patch
adds ip6_sk_dst_lookup for the first class.
Similarly ip6_dst_store users can be divded into those that need to
take the socket dst lock and those that don't. This patch adds
__ip6_dst_store for those (everyone except UDP/datagram) that don't
need an extra lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the default sequence window size (100) I got the following in
my logs:
Jun 22 14:24:09 localhost kernel: [ 1492.114775] DCCP: Step 6 failed for
DATA packet, (LSWL(6279674225) <= P.seqno(6279674749) <=
S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <=
P.ackno(1125899906842620) <= S.AWH(18798206548), sending SYNC...
Jun 22 14:24:09 localhost kernel: [ 1492.115147] DCCP: Step 6 failed for
DATA packet, (LSWL(6279674225) <= P.seqno(6279674750) <=
S.SWH(6279674324)) and (P.ackno doesn't exist or LAWL(18798206530) <=
P.ackno(1125899906842620) <= S.AWH(18798206549), sending SYNC...
I went to alter the default sysctl and it didn't take for new sockets.
Below patch fixes this.
I think the default is too low but it is what the DCCP spec specifies.
As a side effect of this my rx speed using iperf goes from about 2.8 Mbits/sec
to 3.5. This is still far too slow but it is a step in the right direction.
Compile tested only for IPv6 but not particularly complex change.
Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
to just after the function exported, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merging it with its only user: dccp_v[46]_reqsk_send_ack.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using this also provides opportunities for introducing
inet_csk_alloc_skb that would call alloc_skb, account it to the sock
and skb_reserve(max_header), but I'll leave this for later, for now
using sk_prot->max_header consistently is enough.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No changes in the logic were made, just removing trailing whitespaces,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidating open coded sequences in tcp and dccp, v4 and v6.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I guess I forgot to add it, nah, now it just works:
18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)
Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As this is used by both ipv4 and ipv6 and is not ipv4 specific.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removing one more ipv6 uses ipv4 stuff case in dccp land.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was copy&pasted from tcp_v6_send_synack() which has
a DST leak recently fixed by Eric W. Biederman.
So dccp_v6_send_response() needs the same fix too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip already
parsed headers. As a nice side effect this gets rid of the manual
hopopts skipping in ip6_input_finish.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its common enough to to justify that, TCP still can't use it as it has the
prequeueing stuff, still to be made generic in the not so distant future :-)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>