linux

Commit Graph

Author	SHA1	Message	Date
Gerrit Renker	797eba424d	[TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:03 -08:00
Gerrit Renker	78282d2af5	[TFRC]: Move comment. Moved up the comment "Receiver routines" above the first occurrence of RX history routines. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:03 -08:00
Arnaldo Carvalho de Melo	b84a2189c4	[TFRC]: New rx history code Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: ----------- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:43 -08:00
Gerrit Renker	30a0eacd47	[CCID3]: The receiver of a half-connection does not set window counter values Only the sender sets window counters [RFC 4342, sections 5 and 8.1]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:43 -08:00
Arnaldo Carvalho de Melo	d58d1af03a	[TFRC]: Rename dccp_rx_ to tfrc_rx_ This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:42 -08:00
Arnaldo Carvalho de Melo	34a9e7ea91	[TFRC]: Make the rx history slab be global This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:41 -08:00
Arnaldo Carvalho de Melo	e9c8b24a6a	[TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:40 -08:00
Gerrit Renker	2180c41ca5	[DCCP]: Introduce generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:40 -08:00
Gerrit Renker	c40616c597	[TFRC]: Provide central source file and debug facility This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc source file to call the packet history & loss interval init/exit functions. (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3 select IP_DCCP_TFRC_LIB. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:39 -08:00
Arnaldo Carvalho de Melo	9108d5f4b2	[TFRC]: Hide tx history details from the CCIDs Based on a previous patch by Gerrit Renker. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:19 -08:00
Gerrit Renker	3159afe0d2	[DCCP]: Remove duplicate test for CloseReq This removes a redundant test for unexpected packet types. In dccp_rcv_state_process it is tested twice whether a DCCP-server has received a CloseReq (Step 7): * first in the combined if-statement, * then in the call to dccp_rcv_closereq(). The latter is necesssary since dccp_rcv_closereq() is also called from __dccp_rcv_established(). This patch removes the duplicate test. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:14 -08:00
Gerrit Renker	0c86962076	[DCCP]: Integrate state transitions for passive-close This adds the necessary state transitions for the two forms of passive-close * PASSIVE_CLOSE - which is entered when a host receives a Close; * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq. Here is a detailed account of what the patch does in each state. 1) Receiving CloseReq The pseudo-code in 8.5 says: Step 13: Process CloseReq If P.type == CloseReq and S.state < CLOSEREQ, Generate Close S.state := CLOSING Set CLOSING timer. This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN. * CLOSED: silently ignore - it may be a late or duplicate CloseReq; * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client); * REQUEST: perform Step 13 directly (no need to enqueue packet); * OPEN/PARTOPEN: enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data. When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored. I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it gets the Reset, so ignoring the CloseReq while in state CLOSING is sane. 2) Receiving Close The code below from 8.5 is unconditional. Step 14: Process Close If P.type == Close, Generate Reset(Closed) Tear down connection Drop packet and return Thus we need to consider all states: * CLOSED: silently ignore, since this can happen when a retransmitted or late Close arrives; * LISTEN: dccp_rcv_state_process() will generate a Reset ("No Connection"); * REQUEST: perform Step 14 directly (no need to enqueue packet); * RESPOND: dccp_check_req() will generate a Reset ("Packet Error") -- left it at that; * OPEN/PARTOPEN: enter PASSIVE_CLOSE so that application has a chance to process unread data; * CLOSEREQ: server performed active-close -- perform Step 14; * CLOSING: simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment); * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close); * TIMEWAIT: packet is ignored. Note that the condition of receiving a packet in state CLOSED here is different from the condition "there is no socket for such a connection": the socket still exists, but its state indicates it is unusable. Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:13 -08:00
Gerrit Renker	f11135a344	[DCCP]: Dedicated auxiliary states to support passive-close This adds two auxiliary states to deal with passive closes: * PASSIVE_CLOSE (reached from OPEN via reception of Close) and * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq) as internal intermediate states. These states are used to allow a receiver to process unread data before acknowledging the received connection-termination-request (the Close/CloseReq). Without such support, it will happen that passively-closed sockets enter CLOSED state while there is still unprocessed data in the queue; leading to unexpected and erratic API behaviour. PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will seamlessly work with inet_accept() (which tests for this state). The state names are thanks to Arnaldo, who suggested this naming scheme following an earlier revision of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:12 -08:00
Gerrit Renker	f53dc67c5e	[DCCP]: Use AF-independent rebuild_header routine This fixes a nasty bug: dccp_send_reset() is called by both DCCPv4 and DCCPv6, but uses inet_sk_rebuild_header() in each case. This leads to unpredictable and weird behaviour: under some conditions, DCCPv6 Resets were sent, in other not. The fix is to use the AF-independent rebuild_header routine. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:12 -08:00
Arnaldo Carvalho de Melo	276f2edc52	[TFRC]: Migrate TX history to singly-linked lis This patch was based on another made by Gerrit Renker, his changelog was: ------------------------------------------------------ The patch set migrates TFRC TX history to a singly-linked list. The details are: * use of a consistent naming scheme (all TFRC functions now begin with `tfrc_'); * allocation and cleanup are taken care of internally; * provision of a lookup function, which is used by the CCID TX infrastructure to determine the time a packet was sent (in turn used for RTT sampling); * integration of the new interface with the present use in CCID3. ------------------------------------------------------ Simplifications I did: . removing the tfrc_tx_hist_head that had a pointer to the list head and another for the slabcache. . No need for creating a slabcache for each CCID that wants to use the TFRC tx history routines, create a single slabcache when the dccp_tfrc_lib module init routine is called. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:11 -08:00
Pavel Emelyanov	8d8ad9d7c4	[NET]: Name magic constants in sock_wake_async() The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument ony has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:03 -08:00
Gerrit Renker	ce865a61c8	[DCCP]: Add support for abortive release This continues from the previous patch and adds support for actively aborting a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an abortive release. I have tried this in various client/server settings and it works as expected. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:02 -08:00
Gerrit Renker	d83bd95bf1	[DCCP]: Check for unread data on close This removes one FIXME with regard to close when there is still unread data. The mechanism is implemented similar to TCP: with regard to DCCP-specifics, a Reset with Code 2, "Aborted" is sent to the peer. This corresponds in part to RFC 4340, 8.1.1 and 8.1.5. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:01 -08:00
Gerrit Renker	dcfbc7e97a	[CCID2]: Remove misleading comment This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none. The comment assumes it is possible that a packet is sent between the calls to ccid_hc_tx_send_packet(), dccp_transmit_skb(), ccid_hc_tx_packet_sent() (in the above order) in dccp_write_xmit(). I think that this is impossible, since dccp_write_xmit() is always called under lock: * when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked (see code comment above dccp_send_close()); * when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called; * when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called and the if/else statement has made sure that sk_lock.owner is not set; * there are no other places where dccp_write_xmit() is called. Furthermore, the debug statement for printing the sequence number of the packet just sent has been removed, since the entire list is being printed anyway and so the entry of that number appears last. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:01 -08:00
Gerrit Renker	a302002516	[CCID2]: Remove redundant ack-counting variable The code used two different variables to count Acks, one of them redundant. This patch reduces the number of Ack counters to one. The type of the Ack counter has also been changed to u32 (twice the range of int); and the variable has been renamed into `packets_acked' - for consistency with RFC 3465 (and similarly named variables are used by TCP and SCTP). Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios, maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:00 -08:00
Gerrit Renker	83399361c3	[CCID2]: Remove redundant synchronisation variable This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1 when the CCID2 sender may send a new packet, and which is set to 0 otherwise The variable is redundant, since it is only used in combination with the hc_tx_send_packet/ hc_tx_packet_sent function pair. Both functions are called under socket lock, so the following happens when the CCID2 may send a new packet: * it sets sendwait = 1 in tx_send_packet and returns 0; * the subsequent call to tx_packet_sent clears the sendwait flag; * since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition in tx_packet_sent is never satisfied, since that function is never called when tx_send_packet returns a value different from 0 (cf. dccp_write_xmit); * the call to tx_packet_sent clears the flag so that the condition "!sendwait" is true the next time tx_packet_sent is called. In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet and tx_packet_sent -- which is what the patch does. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:59 -08:00
Gerrit Renker	da98e0b5d4	[CCID2]: Redundant debugging output This reduces the amount of redundant debugging messages: * pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent(). Both functions are called immediately after one another, so one occurrence is sufficient. * Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant. * In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:59 -08:00
Gerrit Renker	95b21d7e9d	[CCID2]: Replace pipe assignment-function with assignment The function ccid2_change_pipe only does an assignment. This patch simplifies the code by replacing the function with the assignment it performs. Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range). As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed, but replaced with a less annoying `DCCP_BUG'). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:58 -08:00
Gerrit Renker	3deeadd74b	[CCID2]: Replace cwnd assignment-function with assignment The current function ccid2_change_cwnd in effect makes only an assignment, as the test whether cwnd has reached 0 is only required when cwnd is halved. This patch simplifies the code by replacing the function with the assignment it performs. Furthermore, since ssthresh derives from cwnd and appears in many assignments and comparisons, the type of ssthresh has also been changed to match that of cwnd. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:57 -08:00
Gerrit Renker	63df18ad7f	[CCID2]: Replace read-only variable with constant This replaces the field member `numdupack', which was used as a read-only constant in the code, with a #define. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:57 -08:00
Gerrit Renker	7792cd8885	[CCID2]: Remove unused variable This removes a variable `ccid2hctx_sent' which is incremented but never referenced/read (i.e., dead code). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:56 -08:00
Gerrit Renker	900bfed471	[CCID2]: Disable broken Ack Ratio adaptation algorithm This comments out a problematic section comprising a half-finished algorithm: - The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and hence in fact is a read-only constant. - The `arsent' variable counts packets other than Acks (it is incremented for every packet), and there is no test for Ack Loss. - The concept of counting Acks as such leads to a complex calculation, and the calculation at the moment is inconsistent with this concept. The problem is that the number of Acks - rather than the number of windows - is counted, which leads to a complex (cubic/quadratic) expression - this is not even implemented. In its current state, the commented-out algorithm interfers with normal processing by changing Ack Ratio incorrectly, and at the wrong times. A new algorithm is necessary, which will not necessarily use the same variables as used by the unfinished one; hence the old variables have been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:55 -08:00
Gerrit Renker	b00d2bbc45	[CCID2]: Larger initial windows also for CCID2 RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most four packets for new connections, following the rules from [RFC3390]", which is implemented by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:55 -08:00
Arnaldo Carvalho de Melo	e18d7a9857	[DCCP]: Initialize dccp_sock before calling the ccid constructors This is because in the next patch CCID2 will assume that dccps_mss_cache is non-zero. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:54 -08:00
Gerrit Renker	d50ad163e6	[CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd This patch removes a bug in the current code. I agree with Andrea's comment that there is a problem here but the way it is treated does not fix it. The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs: * the receiver will not send an Ack until (Ack Ratio - cwnd) data packets have arrived; * the sender will not send any data packet before the receipt of an Ack advances the send window. The only way that the connection then progresses was via RTO timeout. In one extreme case (bulk transfer), it was observed that this happened for every single packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart: a transfer which normally would take a fraction of a second thus grew to several minutes. The solution taken by this approach is to observe the relation "Ack Ratio <= cwnd" by using the constraint (1) from RFC 4341, 6.1.2; i.e. set Ack Ratio = ceil(cwnd / 2) and update it whenever either Ack Ratio or cwnd change. This ensures that the deadlock problem can not arise. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:53 -08:00
Gerrit Renker	df054e1d00	[CCID2]: Don't assign negative values to Ack Ratio Since it makes not sense to assign negative values to Ack Ratio, this patch disallows this possibility. As a consequence, a Bug test for negative Ack Ratio values becomes obsolete. Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes, due to RFC 4340, 11.3) has been added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:53 -08:00
Gerrit Renker	cfbbeabc88	[CCID2]: Fix sequence number arithmetic/comparisons This replaces use of normal subtraction with modulo-48 subtraction. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:52 -08:00
Gerrit Renker	3de5489f47	[CCID2]: Bug in reading Ack Vectors In CCID2 the receiver-history is sorted in ascending order of sequence number, but the processing of received Ack Vectors requires the list traversal in the opposite direction. The current code has a bug in this regard: the list traversal is upwards. As a consequence, only Ack Vectors with a run length of 1 will pass, in all other Ack Vectors the remaining (acked) sequence numbers are missed, and may later falsely be identified as lost. Note: This bug is only visible when Ack Ratio > 1, since otherwise the run lengths of Ack Vectors are 0. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:51 -08:00
Gerrit Renker	a47c51044a	[ACKVEC]: Reduce length of identifiers This is reduces the length of the struct ackvec/ackvec_record fields. It is a purely text-based replacement: s#dccpavr_#avr_#g; s#dccpav_#av_#g; and increases readability somewhat. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:51 -08:00
Gerrit Renker	c86ab2b6a5	[DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also Small update with regard to RFC 4340 (references added as documentation): on Requests, Ack Vectors / Elapsed Time should be ignored. Length handling of Elapsed Time also simplified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:47 -08:00
Gerrit Renker	6d57b43bf8	[DCCP]: Remove redundant dependency on IP_DCCP This cleans up the consequences of an earlier patch which introduced the `if IP_DCCP' clause into net/dccp/Kconfig. The CCID Kconfig menu is sourced within this clause; as a consequence, all tests of type `depends on IP_DCCP' are now redundant. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:46 -08:00
Gerrit Renker	e333b3edc4	[DCCP]: Promote CCID2 as default CCID This patch addresses the following problems: 1. DCCP relies for its proper functioning on having at least one CCID module enabled (as in TCP plugable congestion control). Currently it is possible to disable both CCIDs and thus leave the DCCP module in a compiled, but entirely non-functional state: no sockets can be created when no CCID is available. Furthermore, the protocol is (again like TCP) not intended to be used without CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation. 2. Internally the default CCID that is advertised by the Linux host is set to CCID2 (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig menu without changing the defaults leads to a failure `module not found' when trying to load the dccp module (which internally tries to load the default CCID). 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a `minimum common denominator'; the specification says that: * "New connections start with CCID 2 for both endpoints" * "A DCCP implementation intended for general use, such as an implementation in a general-purpose operating system kernel, SHOULD implement at least CCID 2. The intent is to make CCID 2 broadly available for interoperability [...]" Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable. Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added. Discussions with Ian McDonald on this subject are gratefully acknowledged. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:46 -08:00
Gerrit Renker	8e8c71f1ab	[DCCP]: Honour and make use of shutdown option set by user This extends the DCCP socket API by honouring any shutdown(2) option set by the user. The behaviour is, as much as possible, made consistent with the API for TCP's shutdown. This patch exploits the information provided by the user via the socket API to reduce processing costs: * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID; * if the write end is closed (SHUT_WR), the same idea applies, but with a difference - as long as the TX queue has not been drained, we need to receive feedback to keep congestion-control rates up to date. Hence SHUT_WR is honoured only after the last packet (under congestion control) has been sent; * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c). Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added to this by the present patch. There will also no longer be any delivery when the socket is in the final stages, i.e. when one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at that stage the connection is its final stages. Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7). Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make sure the socket is a TCP socket". This should probably be extended to mean `TCP or DCCP socket' (the code is also used by UDP and raw sockets). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:44 -08:00
Gerrit Renker	c3ada46a00	[CCID3]: Inline for moving average The moving average computation occurs so frequently in the CCID 3 code that it merits an inline function of its own. This is uses a suggestion by Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:43 -08:00
Gerrit Renker	a5358fdc9c	[CCID3]: Accurately determine idle & application-limited periods This fixes/updates the handling of idle and application-limited periods in CCID3, which currently is broken: there is no detection as to how long a sender has been idle - there is only one flag which is toggled in between function calls. Being obsolete now, the `idle' flag is removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:42 -08:00
Gerrit Renker	eb279b79c4	[CCID3]: Ignore trivial amounts of elapsed time This patch fixes a previously undiscovered bug; the problem is in computing the elapsed time as the time between `receiving' the packet (i.e. skb enters CCID module) and sending feedback: - there is no layer-processing, queueing, or delay involved, - hence the elapsed time is in the order of 1 function call - this is in the dimension of maximally 50..100usec - which renders the use of elapsed time almost entirely useless. The fix is simply to ignore such trivial amounts of elapsed time. As a further advantage, the now useless elapsed_time field can be removed from the socket, which reduces the socket structure by another four bytes. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:42 -08:00
Gerrit Renker	6c08b2cf48	[CCID3]: Revert use of MSS instead of s This updates the CCID3 code with regard to two instances of using `MSS' in place of `s': 1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart draft now consistently use `s' instead of MSS. 2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when it does not yet have a round trip sample, the value of X is set to s bytes per second, for segment size s [...]" Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:41 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Joe Perches	5e8e034cc5	[DCCP]: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 13:59:39 -08:00
Joe Perches	4756daa3b6	[DCCP]: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-19 23:46:02 -08:00
Eric Dumazet	230140cffa	[INET]: Remove per bucket rwlock in tcp/dccp ehash table. As done two years ago on IP route cache table (commit `22c047ccbc`) , we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line have a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:11 -08:00
David S. Miller	c62cf5cb17	[DCCP]: Use DEFINE_PROTO_INUSE infrastructure. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:01 -08:00
Gerrit Renker	24c667db59	[CCID2/3]: Initialisation assignments of 0 are redundant Assigning initial values of `0' is redundant when loading a new CCID structure, since in net/dccp/ccid.c the entire CCID structure is zeroed out prior to initialisation in ccid_new(): struct ccid { struct ccid_operations ccid_ops; char ccid_priv[0]; }; // ... if (rx) { memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size); if (ccid->ccid_ops->ccid_hc_rx_init != NULL && ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0) goto out_free_ccid; } else { memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size); / analogous to the rx case */ } This patch therefore removes the redundant assignments. Thanks to Arnaldo for the inspiration. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-10-24 10:53:01 -02:00
Gerrit Renker	76fd1e87d9	[DCCP]: Unaligned pointer access This fixes `unaligned (read) access' errors of the type Kernel unaligned access at TPC[100f970c] dccp_parse_options+0x4f4/0x7e0 [dccp] Kernel unaligned access at TPC[1011f2e4] ccid3_hc_tx_parse_options+0x1ac/0x380 [dccp_ccid3] Kernel unaligned access at TPC[100f9898] dccp_parse_options+0x680/0x880 [dccp] by using the get_unaligned macro for parsing options. Commiter note: Preserved the sparse __be{16,32} annotations. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-10-24 10:46:58 -02:00
Gerrit Renker	d8ef2c29a0	[DCCP]: Convert Reset code into socket error number This adds support for converting the 11 currently defined Reset codes into system error numbers, which are stored in sk_err for further interpretation. This makes the externally visible API behaviour similar to TCP, since a client connecting to a non-existing port will experience ECONNREFUSED. * Code 0, Unspecified, is interpreted as non-error (0); * Code 1, Closed (normal termination), also maps into 0; * Code 2, Aborted, maps into "Connection reset by peer" (ECONNRESET); * Code 3, No Connection and Code 7, Connection Refused, map into "Connection refused" (ECONNREFUSED); * Code 4, Packet Error, maps into "No message of desired type" (ENOMSG); * Code 5, Option Error, maps into "Illegal byte sequence" (EILSEQ); * Code 6, Mandatory Error, maps into "Operation not supported on transport endpoint" (EOPNOTSUPP); * Code 8, Bad Service Code, maps into "Invalid request code" (EBADRQC); * Code 9, Too Busy, maps into "Too many users" (EUSERS); * Code 10, Bad Init Cookie, maps into "Invalid request descriptor" (EBADR); * Code 11, Aggression Penalty, maps into "Quota exceeded" (EDQUOT) which makes sense in terms of using more than the `fair share' of bandwidth. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-10-24 10:27:48 -02:00
Gerrit Renker	1238d0873b	[DCCP]: One more exemption from full sequence number checks This fixes the following problem: client connects to peer which has no DCCP enabled or loaded; ICMP error messages ("Protocol Unavailable") can be seen on the wire, but the application hangs. Reason: ICMP packets don't get through to dccp_v4_err. When reporting errors, a sequence number check is made for the DCCP packet that had caused an ICMP error to arrive. Such checks can not be made if the socket is in state LISTEN, RESPOND (which in the implementation is the same as LISTEN), or REQUEST, since update_gsr() has not been called in these states, hence the sequence window is 0..0. This patch fixes the problem by adding the REQUEST state as another exemption to the window check. The error reporting now works as expected on connecting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-24 10:18:06 -02:00
Gerrit Renker	fde20105f3	[DCCP]: Retrieve packet sequence number for error reporting This fixes a problem when analysing erroneous packets in dccp_v{4,6}_err: * dccp_hdr_seq currently takes an skb * however, the transport headers in the skb are shifted, due to the preceding IPv4/v6 header. Fixed for v4 and v6 by changing dccp_hdr_seq to take a struct dccp_hdr as argument. Verified that the correct sequence number is now reported in the error handler. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-10-24 10:12:09 -02:00
Arnaldo Carvalho de Melo	6273172e17	[DCCP]: Implement SIOCINQ/FIONREAD Just like UDP. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Leandro Melo de Sales <leandroal@gmail.com> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:50 -07:00
Jean Delvare	7131c6c736	[INET]: Use MODULE_ALIAS_NET_PF_PROTO_TYPE where possible. Now that we have this new macro, use it where possible. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:54 -07:00
Jean Delvare	305e1e9691	[INET]: Let inet_diag and friends autoload By adding module aliases to inet_diag, tcp_diag and dccp_diag, we let them load automatically as needed. This makes tools like "ss" run faster. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:54 -07:00
Ingo Molnar	bd5435e76a	[DCCP]: fix link error with !CONFIG_SYSCTL Do not define the sysctl_dccp_sync_ratelimit sysctl variable in the CONFIG_SYSCTL dependent sysctl.c module - move it to input.c instead. This fixes the following build bug: net/built-in.o: In function `dccp_check_seqno': input.c:(.text+0xbd859): undefined reference to `sysctl_dccp_sync_ratelimit' distcc[29953] ERROR: compile (null) on localhost failed make: *** [vmlinux] Error 1 Found via 'make randconfig' build testing. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 19:33:06 -07:00
Herbert Xu	e5bbef20e0	[IPV6]: Replace sk_buff ** with sk_buff * in input handlers With all the users of the double pointers removed from the IPv6 input path, this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input handlers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:50:28 -07:00
Gerrit Renker	4a5409a5a8	[DCCP]: Twice the wrong reset code in receiving connection-Requests This fixes two bugs in processing of connection-Requests in v{4,6}_conn_request: 1. Due to using the variable `reset_code', the Reset code generated internally by dccp_parse_options() is overwritten with the initialised value ("Too Busy") of reset_code, which is not what is intended. 2. When receiving a connection-Request on a multicast or broadcast address, no Reset should be generated, to avoid storms of such packets. Instead of jumping to the `drop' label, the v{4,6}_conn_request functions now return 0. Below is why in my understanding this is correct: When the conn_request function returns < 0, then the caller, dccp_rcv_state_process(), returns 1. In all instances where dccp_rcv_state_process is called (dccp_v4_do_rcv, dccp_v6_do_rcv, and dccp_child_process), a return value of != 0 from dccp_rcv_state_process() means that a Reset is generated. If on the other hand the conn_request function returns 0, the packet is discarded and no Reset is generated. Note: There may be a related problem when sending the Response, due to the following. if (dccp_v6_send_response(sk, req, NULL)) goto drop_and_free; /* ... */ drop_and_free: return -1; In this case, if send_response fails due to transmission errors, the next thing that is generated is a Reset with a code "Too Busy". I haven't been able to conjure up such a condition, but it might be good to change the behaviour here also (not done by this patch). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:38 -07:00
Gerrit Renker	dcad856fe8	[DCCP]: Wrong format in printk The elapsed time uses u32, but printk was using %d, not %u. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-10-10 16:54:36 -07:00
Gerrit Renker	451bc0473f	[DCCP]: Tidy-up -- minisock initialisation This * removes a declaration of a non-existent function __dccp_minisock_init; * shifts the initialisation function dccp_minisock_init() from options.c to minisocks.c, where it is more naturally expected to be. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:36 -07:00
Gerrit Renker	5e28599a6e	[CCID2]: Sequence number wraparound issues This replaces several uses of standard arithmetic with the DCCP sequence number arithmetic functions. The problem here is that the sequence number wrap-around was not taken into consideration. * Condition "seqp->ccid2s_seq <= prev->ccid2s_seq" has been replaced by dccp_delta_seqno(seqp->ccid2s_seq, prev->ccid2s_seq) >= 0 since if seqp is `before' prev, then the delta_seqno() is positive. * The test whether sequence numbers `a' and `b' are consecutive has the form dccp_delta_seqno(a, b) == 1 * Increment of ccid2hctx_rpseq could be done using dccp_inc_seqno(), but since here the incremented ccid2hctx_rpseq == seqno, used assignment instead. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:35 -07:00
Gerrit Renker	6c58324808	[CCID2]: Remove redundant case block skb's passed to ccid2_hc_tx_send_packet() are headerless, the packet type is decided later, in dccp_write_xmit(). Therefore the first test of the switch/case block is always true, the others are never reached. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:35 -07:00
Gerrit Renker	ee196c2186	[CCID2]: Remove redundant BUG_ON This removes a test for `val < 1' which would only have been triggered when val < 0, due to a preceding test for 0. Fixed by using an unsigned type for cwnd (as in TCP) instead. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:34 -07:00
Gerrit Renker	7d9e8931f9	[CCID2]: Remove ugly BUG_ON This removes an ugly BUG_ON which has been pointed out by Arnaldo. Instead of freezing up the machine, a `critical' message is now issued to the system log. There is potential of doing this more gracefully (eg. there are a few internal variables which could be updated despite the lack of memory), but that requires more complicated changes to the algorithm; thus a `FIXME' has been added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:34 -07:00
Gerrit Renker	cd1f7d347c	[CCID2]: Simplify interface This patch simplifies the interface of ccid2_hc_tx_alloc_seq(): * ccid2_hc_tx_alloc_seq() is always called with an argument of CCID2_SEQBUF_LEN; * other code - ccid2_hc_tx_check_sanity() - even depends on the assumption that ccid2_hc_tx_alloc_seq() has been called with this particular size; * passing the `gfp_t' argument to ccid2_hc_tx_alloc_seq() is redundant with gfp_any(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:33 -07:00
Gerrit Renker	042d18f9f3	[DCCP]: Make all `debug' parameters bool This just sets the parameter to bool, since debugging messages are either on or off. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:32 -07:00
Gerrit Renker	7c559a9e44	[DCCP]: Add socket option to query the current MPS This enables applications to query the current value of the Maximum Packet Size via a socket option, suggested as a SHOULD in (RFC 4340, p. 102). This socket option is useful to avoid the annoying bail-out via `-EMSGSIZE'. In particular, as fragmentation is not currently supported (and its use is partly discouraged in RFC 4340). With this option, it is possible to size buffers accordingly, e.g. int buflen = dccp_get_cur_mps(sockfd); /* or */ if (msgsize > dccp_get_cur_mps(sockfd)) die("message is too large for this path"); Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:31 -07:00
Gerrit Renker	bc8498721d	[DCCP]: Wait for CCID This performs a minor optimisation: when ccid_hc_tx_send_packet returns a value greater zero, then the same call previously was done again at the begin of the while loop in dccp_wait_for_ccid. This patch exploits the available information and schedule-timeouts directly instead. Documentation also added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:31 -07:00
Arnaldo Carvalho de Melo	bb293e6a24	[CCID3]: Remove ifdef surrounding BUG_ON As suggested by DaveM. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:45 -07:00
Gerrit Renker	cecd8d0ec4	[DCCP]: Reduce the number of writable states Since DCCP requires to close both ends of a connection simultaneously, permission to write in state DCCP_CLOSING is removed in dccp_sendmsg(): * if the sending end closed, it would encounter a write error anyhow; * if the other end has closed the connection, it accepts no more data. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:45 -07:00
Gerrit Renker	e356d37a09	[DCCP]: Factor out common code for generating Resets This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function, and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6. It is useful to have this separate, since the following Reset codes will always be generated from the control socket rather than via dccp_send_reset: * Code 3, "No Connection", cf. 8.3.1; * Code 4, "Packet Error" (identification for Data 1 added); * Code 5, "Option Error" (identification for Data 1..3 added, will be used later); * Code 6, "Mandatory Error" (same as Option Error); * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?); * Code 8, "Bad Service Code"; * Code 9, "Too Busy"; * Code 10, "Bad Init Cookie" (not used). Code 0 is not recommended by the RFC, the following codes would be used in dccp_send_reset() instead, since they all relate to an established DCCP connection: * Code 1, "Closed"; * Code 2, "Aborted"; * Code 11, "Aggression Penalty" (12.3). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:44 -07:00
Gerrit Renker	9bf55cda9b	[DCCP]: Sequence number wrap-around when sending reset This replaces normal addition with mod-48 addition so that sequence number wraparound is respected. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:43 -07:00
Gerrit Renker	a94f0f9705	[DCCP]: Rate-limit DCCP-Syncs This implements a SHOULD from RFC 4340, 7.5.4: "To protect against denial-of-service attacks, DCCP implementations SHOULD impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets, such as not more than eight DCCP-Syncs per second." The rate-limit is maintained on a per-socket basis. This is a more stringent policy than enforcing the rate-limit on a per-source-address basis and protects against attacks with forged source addresses. Moreover, the mechanism is deliberately kept simple. In contrast to xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets are not supported. This foils such attacks where the receipt of a Sync triggers further sequence-invalid packets. (I have tested this mechanism against xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.) In order to keep flexibility, the timeout parameter can be set via sysctl; and the whole mechanism can even be disabled (which is however not recommended). The algorithm in this patch has been improved with regard to wrapping issues thanks to a suggestion by Arnaldo. Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're sending a sync. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:43 -07:00
Gerrit Renker	ee1a15922d	[DCCP]: Remove duplicate code for Reset from connected socket In this patch, duplicated code is removed for the case when a Reset packet is sent from a connected socket. This code duplication is between dccp_make_reset and dccp_transmit_skb, which already contained an (up to now entirely unused) switch statement to fill in the reset code from the DCCP_SKB_CB. The only thing that has been removed is the call to dst_clone(dst), since the queue_xmit functions use sk_dst_cache anyway. I wasn't sure which purpose inet_sk_rebuild_header served, so I left it in. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:42 -07:00
Gerrit Renker	0430ee3451	[DCCP]: Add Support for Data 1 .. 3 fields of Reset packets This adds fields to support the informational Data 1..3 fields of the DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes to documentation. Code which fills in these fields follows in subsequent patches, it is primarily used for reporting option-processing and feature-negotiation errors. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:42 -07:00
Gerrit Renker	727ecc5faa	[DCCP]: Add FIXME for send_delayed_ack This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere used in the entire DCCP/CCID code. Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also rather subtle implications for the Ack-Ratio-accounting. CCID2 does not use this (maybe it should). I think leaving the function in is good, in case someone wants to implement this. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:41 -07:00
Gerrit Renker	2e86908f7d	[CCID3]: Move NULL-protection into function This moves several instances of testing against NULL into the function which is used to de-reference the CCID-private data. Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it is too much to have this on production code. Also made sure that the macro is used only after checking if sk_state is not LISTEN, to make it equivalent to what we had before. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:41 -07:00
Gerrit Renker	08831700cc	[DCCP]: Send Reset upon Sync in state REQUEST This fixes the code to correspond to RFC 4340, 7.5.4, which states the exception that a Sync received in state REQUEST generates a Reset (not a SyncAck). To achieve this, only a small change is required. Since dccp_rcv_request_sent_state_process() already uses the correct Reset Code number 4 ("Packet Error"), we only need to shift the if-statement a few lines further down. (To test this case: replace DCCP_PKT_RESPONSE with DCCP_PKT_SYNC in dccp_make_response.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-10-10 16:52:40 -07:00
Gerrit Renker	b0d045ca45	[DCCP]: Parameter renaming The parameter `seq' of dccp_send_sync() is in fact an acknowledgement number and not a sequence number - thus renamed by this patch into `ackno'. Secondly, a `critical' warning is added when a Sync/SyncAck could not be sent. Sanity: I have checked all other functions that are called in dccp_transmit_skb, there are no clashes with the use of dccpd_ack_seq; no other function is using this slot at the same time. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:37 -07:00
Gerrit Renker	e155d76922	[DCCP]: Fix Reset/Sync-Flood Bug This updates sequence number checking with regard to RFC 4340, 7.5.4. Missing in the code was an exception for sequence-invalid Reset packets, which get a Sync acknowledging GSR, instead of (as usual) P.seqno. This can lead to an oscillating ping-pong flood of Reset packets. In fact, it has been observed on the wire as follows: 1. client establishes connection to server; 2. before server can write to client, client crashes without notifying the server (NB: now no longer possible due to ABORT function); 3. server sends DCCP-Data packet (has no ackno); 4. client generates Reset "No Connection", seqno=0, increments seqno; 5. server replies with Sync, using ackno = P.seqno; 6. client generates Reset "No Connection" with seqno = ackno + 1; 7. goto (5). The difference is that now in (5) the server uses GSR. This causes the Reset sent by the client in (6) to become sequence-valid, so that in (7) the vicious circle is broken; the Reset is then enqueued and causes the socket to enter TIMEWAIT state. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:37 -07:00
Gerrit Renker	cbe1f5f88a	[DCCP]: Shorten variable names in dccp_check_seqno This patch is in part required by the next patch; it * replaces 6 instances of `DCCP_SKB_CB(skb)->dccpd_seq' with `seqno'; * replaces 7 instances of `DCCP_SKB_CB(skb)->dccpd_ack_seq' with `ackno'; * replaces 1 use of dccp_inc_seqno() by unfolding `ADD48' macro in place. No changes in algorithm, all changes are text replacement/substitution. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:36 -07:00
Gerrit Renker	3393da8241	[DCCP]: Simplify interface of dccp_sample_rtt The third parameter of dccp_sample_rtt now becomes useless and is removed. Also combined the subtraction of the timestamp echo and the elapsed time. This is safe, since (a) presence of timestamp echo is tested first and (b) elapsed time is either present and non-zero or it is not set and equals 0 due to the memset in dccp_parse_options. To avoid measuring option-processing time, the timestamp for measuring the initial Request/Response RTT sample is taken directly when the function is called (the Linux implementation always adds a timestamp on the Request, so there is no loss in doing this). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:35 -07:00
Gerrit Renker	4c70f383e0	[DCCP]: Provide 10s of microsecond timesource This provides a timesource, conveniently used for DCCP timestamps, which returns the elapsed time in 10s of microseconds since initialisation. This makes for a wrap-around time of about 11.9 hours, which should be sufficient for most applications. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:35 -07:00
Gerrit Renker	aa97efd97a	[DCCP]: Reuse ktime_get_real() calls again This patch reduces the number of timestamps taken in the receive path for each packet. The ccid3_hc_tx_update_x() routine is called in * the receive path for each CCID3-controlled packet * for the nofeedback timer (if no feedback arrives during 4 RTT) Currently, when there is no loss, each packet gets timestamped twice. The patch resolves this by recycling the first timestamp taken on packet reception for RTT sampling. When the no_feedback_timer() is called, then the timestamp argument is simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:34 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Micah Gruber	c726187225	[DCCP]: Remove unneeded pointer newdp from dccp_v4_request_recv_sock() This trivial patch removes the unneeded pointer newdp, which is never used. Signed-off-by: Micah Gruber <micah.gruber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:59 -07:00
Ilpo Järvinen	172589ccdd	[NET]: DIV_ROUND_UP cleanup (part two) Hopefully captured all single statement cases under net/. I'm not too sure if there is some policy about #includes that are "guaranteed" (ie., in the current tree) to be available through some other #included header, so I just added linux/kernel.h to each changed file that didn't #include it previously. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:37 -07:00
Arnaldo Carvalho de Melo	6168b96c07	[DCCP]: Nuke the timeval helpers now that we fully converted to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo	8fb8354af9	[DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo	234748954a	[DCCP] options: convert dccp_insert_option_timestamp to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:16 -07:00
Arnaldo Carvalho de Melo	19ac21465e	[DCCP]: Convert dccps_timestamp_time to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:16 -07:00
Arnaldo Carvalho de Melo	0740d49c24	[DCCP] packet_history: Convert dccphtx_tstamp to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:15 -07:00
Arnaldo Carvalho de Melo	e7c2335794	[DCCP] packet_history: convert dccphrx_tstamp to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:14 -07:00
Arnaldo Carvalho de Melo	b8bda9d708	[DCCP] ackvec: Convert to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:14 -07:00
Arnaldo Carvalho de Melo	668348a423	[DCCP] CCID3: Stop using dccp_timestamp Now to convert the ackvec code to ktime_t so that we can get rid of dccp_timestamp and the epoch thing in dccp_sock. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:13 -07:00
Arnaldo Carvalho de Melo	9823b7b554	[DCCP]: Convert dccp_sample_rtt to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:13 -07:00
Arnaldo Carvalho de Melo	e7a81c6d62	[DCCP]: Convert ccid3hcrx_tstamp_last_feedback to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:12 -07:00
Arnaldo Carvalho de Melo	1faf0a1f5d	[DCCP]: Convert ccid3hcrx_tstamp_last_ack to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:11 -07:00
Arnaldo Carvalho de Melo	23f062af6e	[DCCP]: Convert ccid3hctx_t_ld to ktime_t Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:11 -07:00
Arnaldo Carvalho de Melo	ac198ea8d9	[DCCP]: Make ccid3_hc_tx_update_x get a timestamp if needed The code was too complicated, if p > 0 in ccid3_hc_tx_no_feedback_timer the timestamp was being obtained to be passed to ccid3_hc_tx_update_x, where only if p > 0 the timestamp was needed, so just leave it to ccid3_hc_tx_update_x to obtain the timestamp if needed. This will help in the upcoming changesets where we'll convert t_ld to ktime_t. We'll eventually try to reuse ktime_get_real() calls again. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:10 -07:00
Gerrit Renker	39dad26c37	[DCCP]: Allocation in atomic context This fixes the following bug reported in syslog: [ 4039.051658] BUG: sleeping function called from invalid context at /usr/src/davem-2.6/mm/slab.c:3032 [ 4039.051668] in_atomic():1, irqs_disabled():0 [ 4039.051670] INFO: lockdep is turned off. [ 4039.051674] [<c0104c0f>] show_trace_log_lvl+0x1a/0x30 [ 4039.051687] [<c0104d4d>] show_trace+0x12/0x14 [ 4039.051691] [<c0104d65>] dump_stack+0x16/0x18 [ 4039.051695] [<c011371e>] __might_sleep+0xaf/0xbe [ 4039.051700] [<c0157b66>] __kmalloc+0xb1/0xd0 [ 4039.051706] [<f090416f>] ccid2_hc_tx_alloc_seq+0x35/0xc3 [dccp_ccid2] [ 4039.051717] [<f09048d6>] ccid2_hc_tx_packet_sent+0x27f/0x2d9 [dccp_ccid2] [ 4039.051723] [<f085486b>] dccp_write_xmit+0x1eb/0x338 [dccp] [ 4039.051741] [<f085603d>] dccp_sendmsg+0x113/0x18f [dccp] [ 4039.051750] [<c03907fc>] inet_sendmsg+0x2e/0x4c [ 4039.051758] [<c033a47d>] sock_aio_write+0xd5/0x107 [ 4039.051766] [<c015abc1>] do_sync_write+0xcd/0x11c [ 4039.051772] [<c015b296>] vfs_write+0x118/0x11f [ 4039.051840] [<c015b932>] sys_write+0x3d/0x64 [ 4039.051845] [<c0103e7c>] syscall_call+0x7/0xb [ 4039.051848] ======================= The problem was that GFP_KERNEL was used; fixed by using gfp_any(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-21 20:58:06 -07:00
Jesper Juhl	e576de82ee	[DCCP]: fix memory leak and clean up style - dccp_feat_empty_confirm() There's a memory leak in net/dccp/feat.c::dccp_feat_empty_confirm(). If we hit the 'default:' case of the 'switch' statement, then we return without freeing 'opt', thus leaking 'struct dccp_opt_pend' bytes. The leak is fixed easily enough by adding a kfree(opt); before the return statement. The patch also changes the layout of the 'switch' to be more in line with CodingStyle. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:10 -07:00
Oleg Nesterov	d725fdc802	[DCCP]: fix theoretical ccids_{read,write}_lock() race Make sure that spin_unlock_wait() is properly ordered wrt atomic_inc(). (akpm: can't we convert this code to use rwlocks?) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:09 -07:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Linus Torvalds	ce8c2293be	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits) [TG3]: Fix msi issue with kexec/kdump. [NET] XFRM: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. [NET] ROSE: Fix whitespace errors. [NET] RFKILL: Fix whitespace errors. [NET] PACKET: Fix whitespace errors. [NET] NETROM: Fix whitespace errors. [NET] NETFILTER: Fix whitespace errors. [NET] IPV4: Fix whitespace errors. [NET] DCCP: Fix whitespace errors. [NET] CORE: Fix whitespace errors. [NET] BLUETOOTH: Fix whitespace errors. [NET] AX25: Fix whitespace errors. [PATCH] mac80211: remove rtnl locking in ieee80211_sta.c [PATCH] mac80211: fix GCC warning on 64bit platforms [GENETLINK]: Dynamic multicast groups. [NETLIKN]: Allow removing multicast groups. ...	2007-07-19 10:23:21 -07:00
Michael Ellerman	9e367d8592	jprobes: remove JPROBE_ENTRY() AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:44 -07:00
YOSHIFUJI Hideaki	23248005fb	[NET] DCCP: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-07-19 10:43:28 +09:00
YOSHIFUJI Hideaki	bb4dbf9e61	[IPV6]: Do not send RH0 anymore. Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:55:49 -07:00
Adrian Bunk	4fda25a2cd	[DCCP]: Make struct dccp_li_cachep static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:18:52 -07:00
Gerrit Renker	49d66a70cf	[CCID3]: Fix a bug in the send time processing ccid3_hc_tx_send_packet currently returns 0 when the time difference between current time and t_nom is less than 1000 microseconds. In this case the packet is sent immediately; but, unlike other packets that can be emitted on first attempt, it will not have its window counter updated and its options set as required. This is a bug. Fix: Require the time difference to be at least 1000 microseconds. The algorithm then converges: time differences > 1000 microseconds trigger the timer in dccp_write_xmit; after timer expiry this function is tried again; when the time difference is less than 1000, the packet will have its options added and window counter updated as required. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-07-10 22:15:34 -07:00
Gerrit Renker	8132da4d41	[CCID3]: Sending time: update to ktime_t This updates the computation of t_nom and t_last_win_count to use the newer gettimeofday interface. Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in ccid3hctx_no_feedback_timer Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-07-10 22:15:27 -07:00
Arnaldo Carvalho de Melo	dd36a9aba4	loss_interval: make struct dccp_li_hist_entry private net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-07-10 22:15:24 -07:00
Arnaldo Carvalho de Melo	cc4d6a3a34	loss_interval: Nuke dccp_li_hist It had just a slab cache, so, for the sake of simplicity just make dccp_trfc_lib module init routine create the slab cache, no need for users of the lib to create a private loss_interval object. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-07-10 22:15:23 -07:00
Arnaldo Carvalho de Melo	c70b729e66	loss_interval: Make dccp_li_hist_entry_{new,delete} private Not used outside the loss_interval code anymore. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-07-10 22:15:22 -07:00
Arnaldo Carvalho de Melo	8c281780c6	loss_interval: unexport dccp_li_hist_interval_new Now its only used inside the loss_interval code. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-07-10 22:15:21 -07:00
Arnaldo Carvalho de Melo	cc0a910b94	[DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval Renaming it to dccp_li_update_li. Also based on previous work by Ian McDonald. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-07-10 22:15:20 -07:00
Arnaldo Carvalho de Melo	878ac60023	[CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li Now ccid3_hc_rx_update_li is ready to be moved to net/dccp/ccids/lib/loss_interval, it uses the same interface as the other functions there. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-07-10 22:15:19 -07:00
Arnaldo Carvalho de Melo	d83258a3da	Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li This is a preparatory patch for moving these loss interval functions from net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c. Based on a patch by Ian McDonald. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-07-10 22:15:18 -07:00
Ian McDonald	6bc7efe8ef	loss_interval: Fix timeval initialisation When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised correctly in dccp_li_calc_first_li. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2007-07-10 22:15:06 -07:00
Ian McDonald	e961811fcd	Fix dccp_sum_coverage When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned issue in dccp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2007-07-10 22:15:05 -07:00
Ian McDonald	b2f41ff413	ccid3: Update copyrights Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-07-10 22:15:04 -07:00
Bill Nottingham	75202e7689	[NET]: Fix comparisons of unsigned < 0. Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0. Signed-off-by: Bill Nottingham <notting@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:47 -07:00
David S. Miller	14e50e57ae	[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 18:17:54 -07:00
David S. Miller	1b07a95a5b	[DCCP]: Fix build warning when debugging is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:55 -07:00
Jan Engelhardt	3df25df354	[DCCP]: Use menuconfig objects. Use menuconfigs instead of menus, so the whole menu can be disabled at once instead of going through all options. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:46 -07:00
Milind Arun Choudhary	4ef8d0aeaf	[NET]: SPIN_LOCK_UNLOCKED cleanup in drivers/atm, net SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 01:37:44 -07:00
Gerrit Renker	f73f7097c9	[DCCP]: Debug statements for Elapsed Time option This prints the value of the parsed Elapsed Time when received via a Timestamp Echo option [RFC 4342, 13.3]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:55 -07:00
Gerrit Renker	b2449fdc30	[DCCP]: Fix bug in the calculation of very low sending rates This fixes an error in the calculation of t_ipi when X converges towards very low sending rates (between 1 and 64 bytes per second). Although this case may not sound likely, it can be reproduced by connecting, hitting enter (1 byte sent) and waiting for some time, during which the nofeedback timer halves the sending rate until finally it reaches the region 1..64 bytes/sec. Computing X is handled correctly (tested separately); but by dividing X _before_ entering the calculation of t_ipi, X becomes zero as a result. This in turn triggers a BUG condition caught in scaled_div(). Fixed by replacing with equivalent statement and explicit typecast for good measure. Calculation verified and effect of patch tested - reduced never below 1 byte per 64 seconds afterwards, i.e. not allowing divide-by-zero. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:54 -07:00
Gerrit Renker	30833ffead	[CCID3]: Use initial RTT sample from SYN exchange The patch follows the following recommendation made in an erratum to RFC 4342: "Senders MAY additionally make use of other available RTT measurements, including those from the initial Request-Response packet exchange." It implements larger initial windows with regard to this inital RTT measurement, using the mechanism suggested in draft-ietf-dccp-rfc3448bis, section 4.2. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:04 -07:00
Gerrit Renker	89560b53b9	[DCCP]: Sample RTT from SYN exchange Function:	2007-04-25 22:27:02 -07:00
Gerrit Renker	7dfee1a9c0	[CCID3]: Use function for RTT sampling This replaces the existing occurrences of RTT sampling with the use of the new function dccp_sample_rtt. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:01 -07:00
Gerrit Renker	4712a792ee	[DCCP]: Provide function for RTT sampling A recurring problem, in particular in the CCID code, is that RTT samples from packets with timestamp echo and elapsed time options need to be taken. This service is provided via a new function dccp_sample_rtt in this patch. Furthermore, to protect against `insane' RTT samples, the sampled value is bounded between 100 microseconds and 4 seconds - for which u32 is sufficient. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:00 -07:00
Gerrit Renker	0c150efb28	[CCID3]: Handle Idle and Application-Limited periods This updates the code with regard to handling idle and application-limited periods as specified in [RFC 4342, 5.1]. Background:	2007-04-25 22:26:59 -07:00
Gerrit Renker	a21f9f96cd	[CCID3]: Wrap computation of RFC3390-initial rate into separate function The CCID 3 and TFRC specs (RFC 4342, RFC 3448, draft-3448bis) make frequent reference to the computation of the RFC-3390 initial sending rate: 1. Initial sending rate when RTT is known (RFC 4342, p. 6) 2. Response to Idle/Application-Limited periods (RFC 4342, 5.1) This warrants putting the code into its own function, for later code reuse. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:58 -07:00
Gerrit Renker	1761f7d7fe	[CCID3]: Remove build warnings for 64bit This clears the following sparc64 build warnings: 1) warning: format "%ld" expects type "long int", but argument 3 has type "suseconds_t" 2) warning: format "%llu" expects type "long long unsigned int", but argument 3 has type "__u64" Fixed by using typecast to unsigned. This is argued to be safe, since the quantities, after de-scaling (factor 2^6) fit all in u32. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:57 -07:00
Gerrit Renker	fddc2feb94	[CCID3]: More to see in dccp_probe This adds a few more fields of interest to /proc/net/dccpprobe, the following output ensues: 1 2 3 4 5 6 7 8 9 10 11 sec.usec src:sport dst:dport size s rtt p X_calc X_recv X t_ipi Also made the formatting consistent. Scripts that go with this can be downloaded from http://139.133.210.30/users/gerrit/dccp/dccp_probe/ Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:56 -07:00
Gerrit Renker	6626e3628f	[DCCP]: More debug information for dccp_wait_for_ccid This adds more detail in the wait_for_ccid packet scheduling loop. In particular, it informs about (i) when delay is used and (ii) why a packet is discarded. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:54 -07:00
Gerrit Renker	ac12b0c495	[DCCP]: Always use debug-toggle parameters Currently debugging output (when configured) is automatically enabled when DCCP modules are compiled into the kernel rather than built as loadable modules. This is not necessary, since the module parameters in this case become kernel commandline parameters, e.g. DCCP or CCID3 debug output can be enabled for a static build by appending the following at the boot prompt: dccp.dccp_debug=1 dccp_ccid3.ccid3_debug=1 This patch therefore does away with the more complicated way of always enabling debug output for static builds Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:53 -07:00
Gerrit Renker	1266adee12	[CCID3]: Remove race condition and update t_ipi when `s' changes This: 1. removes a race condition in the access to the scheduled send time t_nom which results from allowing asynchronous r/w access to t_nom without locks; 2. updates the inter-packet interval t_ipi = s/X when `s' changes, following a suggestion by Ian McDonald. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:52 -07:00
Ian McDonald	8699be7d24	[CCID3]: More verbose debugging This adds a few debugging statements to ccid3.c Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:51 -07:00
Ian McDonald	551dc5f7a1	[CCID3]: Fix use of invalid loss intervals This fixes a bug which uses an invalid comparison. The bug resulted in the use of invalid loss intervals. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:50 -07:00
Gerrit Renker	371fe7779c	[CCID3]: Use MSS for larger initial windows This improves the slow-start phase by using the MSS (as suggested in RFC 4342, sec. 5) instead of the packet size s. Also figured out that __u32 is ample resource enough. After applying, I got the following in the logs: ccid3_hc_tx_packet_recv: client(f7421700), s=6, MSS=1424, w_init=4380, R_sample=176us, X=24886363 Had the previous variant been used, w_init would have been as low as 24. Committer note: removed unneeded cast to unsigned long long that was causing a compiler warning on 64bit architectures. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:49 -07:00
Gerrit Renker	9bf17475eb	[CCID3]: Re-order CCID 3 source file No code change at all. This splits ccid3.c into a RX and a TX section, so that the file has an organisation similar to the other ones (e.g. packet_history.{h,c}). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:48 -07:00
Gerrit Renker	353b13e10a	[CCID3]: Remove redundant `len' test Since CCID3 avoids sending 0-byte data packets (cf. ccid3_hc_tx_send_packet), testing for zero-payload length, as performed by ccid3_hc_tx_update_s, is redundant - hence removed by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:47 -07:00
Gerrit Renker	8d13bf9a0b	[DCCP]: Remove ambiguity in the way before48 is used This removes two ambiguities in employing the new definition of before48, following the analysis on http://www.mail-archive.com/dccp@vger.kernel.org/msg01295.html (1) Updating GSR when P.seqno >= S.SWL With the old definition we did not update when P.seqno and S.SWL are 2^47 apart. To ensure the same behaviour as with the old definition, this is replaced with the equivalent condition dccp_delta_seqno(S.SWL, P.seqno) >= 0 (2) Sending SYNC when P.seqno >= S.OSR Here it is debatable whether the new definition causes an ambiguity: the case is similar to (1); and to have consistency with the case (1), we use the equivalent condition dccp_delta_seqno(S.OSR, P.seqno) >= 0 Detailed Justification	2007-04-25 22:26:46 -07:00
Gerrit Renker	b16be51b5e	[DCCP]: Fix for follows48 The follows48 relation identifies whether 48-bit sequence number x is the direct successor of y. Currently, it does not handle cases of the following type correctly: follows48(0x(prefix)10000LL, 0x(prefix)0FFFFLL) where prefix is an arbitrary hex sequence of up to 7 digits. This is fixed by reusing the new dccp_delta_seqno function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:45 -07:00
Gerrit Renker	d52de17b8c	[DCCP]: Make `before' relation unambiguous Problem:	2007-04-25 22:26:44 -07:00
Gerrit Renker	0aec51c869	[DCCP]: Make dccp_delta_seqno return signed numbers Problem:	2007-04-25 22:26:43 -07:00
Gerrit Renker	6b811d43f6	[DCCP]: 48-bit sequence number arithmetic This patch * organizes the sequence arithmetic functions into one corner of dccp.h * performs a small modification of dccp_set_seqno to make it more widely reusable (now it is safe to use any number, since it performs modulo-2^48 assignment) * adds functions and generic macros for 48-bit sequence arithmetic: --48 bit complement --modulo-48 addition and modulo-48 subtraction --dccp_inc_seqno now a special case of add48 Constants renamed following a suggestion by Arnaldo. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:42 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Adrian Bunk	cb69cc5236	[TCP/DCCP/RANDOM]: Remove unused exports. This patch removes the following not or no longer used exports: - drivers/char/random.c: secure_tcp_sequence_number - net/dccp/options.c: sysctl_dccp_feat_sequence_window - net/netlink/af_netlink.c: netlink_set_err Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:03 -07:00
Arnaldo Carvalho de Melo	39ebc0276b	[DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV We were only checking if there was enough space to put the int, but left len as specified by the (malicious) user, sigh, fix it by setting len to sizeof(val) and transfering just one int worth of data, the one asked for. Also check for negative len values. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-28 11:54:32 -07:00
Adrian Bunk	c93a882ebe	[DCCP]: make dccp_write_xmit_timer() static again dccp_write_xmit_timer() needlessly became global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:10 -07:00
Gerrit Renker	aabb601b0f	[DCCP]: Initialise write_xmit_timer also on passive sockets The TX CCID needs the write_xmit_timer for delaying packet sends. Previously this timer was only activated on active (connecting) sockets. This patch initialises the write_xmit_timer in sync with the other timers, i.e. the timer will be ready on any socket. This is used by applications with a listening socket which start to stream after receiving an initiation by the client. The write_xmit_timer is stopped when the application closes, as before. Was tested to work and to remove the timer bug reported on dccp@vger. Also moved timer initialisation into timer.c (static). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-09 13:47:58 -08:00
Gerrit Renker	151a99317e	[DCCP]: Revert patch which disables bidirectional mode This reverts an earlier patch which disabled bidirectional mode, meaning that a listening (passive) socket was not allowed to write to the other (active) end of the connection. This mode had been disabled when there were problems with CCID3, but it imposes a constraint on socket programming and thus hinders deployment. A change is included to ignore RX feedback received by the TX CCID3 module. Many thanks to Andre Noll for pointing out this issue. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:07 -08:00
Gerrit Renker	99c72ce091	[DCCP]: Set RTO for newly created child socket This mirrors a recent change in tcp_open_req_child, whereby the icsk_rto of the newly created child socket was not set (but rather on the parent socket). Same fix for DCCP. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-06 14:24:44 -08:00
Gerrit Renker	4d46861be6	[DCCP]: Correctly split CCID half connections This fixes a bug caused by a previous patch, which causes DCCP servers in LISTEN state to not receive packets. This patch changes the logic so that * servers in either LISTEN or OPEN state get the RX half connection packets * clients in OPEN state get the TX half connection packets Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-06 14:24:18 -08:00
Patrick McHardy	b08d5840d2	[NET]: Fix kfree(skb) Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:42:14 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Eric W. Biederman	f7d749fa0a	[PATCH] sysctl: dccp: remove unnecessary insert_at_head flag Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:55 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
YOSHIFUJI Hideaki	c9eaf17341	[NET] DCCP: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:27 -08:00
Eric Dumazet	dbca9b2750	[NET]: change layout of ehash table ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 14:16:46 -08:00
Andrew Morton	0f08461ebf	[DCCP]: Warning fixes. net/dccp/ccids/ccid3.c: In function `ccid3_hc_rx_packet_recv': net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 3) net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 4) opaque types must be suitably cast for printing. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:56 -08:00
David S. Miller	8eb9086f21	[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:45 -08:00
David S. Miller	e89862f4c5	[TCP]: Restore SKB socket owner setting in tcp_transmit_skb(). Revert `931731123a` We can't elide the skb_set_owner_w() here because things like certain netfilter targets (such as owner MATCH) need a socket to be set on the SKB for correct operation. Thanks to Jan Engelhardt and other netfilter list members for pointing this out. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:04:55 -08:00
Ian McDonald	832e3ca62d	[DCCP] ccid3: return value in ccid3_hc_rx_calc_first_li In a recent patch we introduced invalid return codes which will result in the opposite of what is intended (i.e. send more packets in face of peculiar network conditions). This fixes it by returning ~0 which means not calculated as per dccp_li_hist_calc_i_mean. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-13 16:48:24 -08:00
Arnaldo Carvalho de Melo	8109b02b53	[DCCP]: Whitespace cleanups That accumulated over the last months hackaton, shame on me for not using git-apply whitespace helping hand, will do that from now on. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:35:00 -08:00
Arnaldo Carvalho de Melo	1fba78b6cb	[DCCP] ccid3: Fixup some type conversions related to rtts Spotted by David Miller when compiling on sparc64, I reproduced it here on parisc64, that are the only platforms to define __kernel_suseconds_t as an 'int', all the others, x86_64 and x86 included typedef it as a 'long', but from the definition of suseconds_t it should just be an 'int' on platforms where it is >= 32bits, it would not require all the castings from suseconds_t to (int) when printking variables of this type, that are not needed on parisc64 and sparc64. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:59 -08:00
Gerrit Renker	9e8efc8240	[DCCP] ccid3: BUG-FIX - conversion errors This fixes conversion errors which arose by not properly type-casting from u32 to __u64. Fixed by explicitly casting each type which is not __u64, or by performing operation after assignment. The patch further adds missing debug information to track the current value of X_recv. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:58 -08:00
Gerrit Renker	7af5af3013	[DCCP] ccid3: Reorder packet history source file No code change at all. This reorders the source file to follow the same order as the corresponding header file. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:57 -08:00
Gerrit Renker	85dcb1f780	[DCCP] ccid3: Reorder packet history header file No code change at all. To make the header file easier to read, the following ordering is established among the declarations: * hist_new * hist_delete * hist_entry_new * hist_head * hist_find_entry * hist_add_entry * hist_entry_delete * hist_purge Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:56 -08:00
Gerrit Renker	a967241129	[DCCP] ccid3: Make debug output consistent This patch does not alter any algorithm, just the debug message format: * s#%s, sk=%p#%s(%p)#g * when a statename is present, it now uses %s(%p, state=%s) * when only function entry is debugged, it adds an `- entry' Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:55 -08:00
Gerrit Renker	c5a1ae9a4c	[DCCP] ccid3: Perform history operations only after packet has been sent This migrates all packet history operations into the routine ccid3_hc_tx_packet_sent, thereby removing synchronization problems that occur when, as before, the operations are spread over multiple routines. The following minor simplifications are also applied: * several simplifications now follow from this change - several tests are now no longer required * removal of one unnecessary variable (dp) Justification: Currently packet history operations span two different routines, one of which is likely to pass through several iterations of sleeping and awakening. The first routine, ccid3_hc_tx_send_packet, allocates an entry and sets a few fields. The remaining fields are filled in when the second routine (which is not within a sleeping context), ccid3_hc_tx_packet_sent, is called. This has several strong drawbacks: * it is not necessary to split history operations - all fields can be filled in by the second routine * the first routine is called multiple times, until a packet can be sent, and sleeps meanwhile - this causes a lot of difficulties with regard to keeping the list consistent * since both routines do not have a producer-consumer like synchronization, it is very difficult to maintain data across calls to these routines * the fact that the routines are called in different contexts (sleeping, not sleeping) adds further problems Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:54 -08:00
Gerrit Renker	e312d100f1	[DCCP] ccid3: TX history - remove unused field This removes the `dccphtx_ccval' field since it is nowhere used in the code and in fact not necessary for the accounting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:53 -08:00
Gerrit Renker	9f8681db96	[DCCP] ccid3: Shift window counter computation This puts the window counter computation [RFC 4342, 8.1] into a separate function which is called whenever a new packet is ready for immediate transmission in ccid3_hc_tx_send_packet. Justification: The window counter update was previously computed after the packet was sent. This has two drawbacks, both fixed by this patch: 1) re-compute another timestamp almost directly after the packet was sent (expensive), 2) the CCVal for the window counter is needed at the instant the packet is sent. Further details: The initialisation of the window counter is left in the state NO_SENT, as before. The algorithm will do nothing if either RTT is initialised to 0 (which is ok) or if the RTT value remains below 4 microseconds (which is almost pathological). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:52 -08:00
Gerrit Renker	de553c189e	[DCCP] ccid3: Sanity-check RTT samples CCID3 performance depends much on the accuracy of RTT samples. If RTT samples grow too large, performance can be catastrophically poor. To limit the amount of possible damage in such cases, the patch * introduces an upper limit which identifies a maximum `sane' RTT value; * uses a macro to enforce this upper limit. Using a macro was given preference, since it is necessary to identify the calling function in the warning message. Since exceeding this threshold identifies a critical condition, DCCP_CRIT is used and not DCCP_WARN. Many thanks to Ian McDonald for collaboration on this issue. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:51 -08:00
Gerrit Renker	fe0499ae95	[DCCP] ccid3: Initialise RTT values In both the sender and the receiver it is possible that the stored RTT value is accessed before an actual RTT estimate has been computed. This patch * initialises the sender RTT to 0 - the sender always accesses the RTT in ccid3_hc_tx_packet_sent - the RTT is further needed for the window counter algorithm * replaces the receiver initialisation of 5msec with 0 - which has the same effect and removes an `XXX' - the RTT value is needed in ccid3_hc_rx_packet_recv as rtt_prev Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:50 -08:00
Gerrit Renker	65d6c2b42e	[DCCP] ccid: Deprecate ccid_hc_tx_insert_options The function ccid3_hc_tx_insert_options only does a redundant no-op, as the operation DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count; is already performed _unconditionally_ in ccid3_hc_tx_send_packet. Since there is further no current need for this function, it is removed entirely. Since furthermore, there is actually no present need for the entire interface function ccid_hc_tx_insert_options, it was decided to remove it also, to clean up the interface. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:49 -08:00
Gerrit Renker	f6282f4da5	[DCCP]: Warn when discarding packet due to internal errors This adds a (debug) warning message which is triggered whenever a packet is discarded due to send failure. It also adds a conditional, so that an interruption during dccp_wait_for_ccid is not treated as a `BUG': the rationale is that interruptions are external, whereas bug warnings are concerned with the internals. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:48 -08:00
Gerrit Renker	bf58a381e8	[DCCP]: Only deliver to the CCID rx side in charge This is an optimisation to reduce CPU load. The received feedback is now only directed to the active CCID component, without requiring processing also by the inactive one. As a consequence, a similar test in ccid3.c is now redundant and is also removed. Justification: Currently DCCP works as a unidirectional service, i.e. a listening server is not at the same time a connecting client. As far as I can see, several modifications are necessary until that becomes possible. At the present time, received feedback is both fed to the rx/tx CCID modules. In unidirectional service, only one of these is active at any one time. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:47 -08:00
Gerrit Renker	d63d8364cf	[DCCP]: Simplify TFRC calculation In migrating towards using the newer functions scaled_div/scaled_div32 for TFRC computations mapped from floating-point onto integer arithmetic, this completes the last stage of modifications. In particular, the overflow case for computing X_calc is circumvented by * breaking the computation into two stages * the first stage, res = (s1E6)/R, cannot overflow due to use of u64 in the second stage, res = (res*1E6)/f, overflow on u32 is avoided due to (i) returning UINT_MAX in this case (which is logically appropriate) and (ii) issuing a warning message into the system log (since very likely there is a problem somewhere else with the parameters) Lastly, all such scaling operations are now exported into tfrc.h, since actually this form of scaled computation is specific to TFRC and not to CCID3. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:46 -08:00
Gerrit Renker	0f9e5b573f	[DCCP]: Debug timeval operations Problem: Most target types in the CCID3 code are u32, so subtle conversion errors can occur if signed time calculations yield negative results: the original values are lost in the conversion to unsigned, calculation errors go undetected. This patch therefore * sets all critical time types from unsigned to suseconds_t * avoids comparison between signed/unsigned via type-casting * provides ample warning messages in case time calculations are negative These warning messages can be removed at a later stage when the code has undergone more testing. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:45 -08:00
Gerrit Renker	bfe24a6cc2	[DCCP] ccid3: Simplify calculation for reverse lookup of p This simplifies the calculation of a value p for a given fval when the first loss interval is computed (RFC 3448, 6.3.1). It makes use of the two new functions scaled_div/scaled_div32 to provide overflow protection. Additionally, protection against divide-by-zero is extended - in this case the function will return the maximally possible value of p=100%. Background: The maximum fval, f(100%), is approximately 244, i.e. the scaled value of fval should never exceed 244E6, which fits easily into u32. The problem is the scaling by 10^6, since additionally R(TT) is in microseconds. This is resolved by breaking the division into two stages: the first stage computes fval=(s10^6)/R, stores that into u64; the second stage computes fval = (fval10^6)/X_recv and complains if overflow is reached for u32. This case is safe since the TFRC reverse-lookup routine then returns p=100%. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:44 -08:00
Gerrit Renker	b9039a2a8d	[DCCP] ccid3: Replace scaled division operations This replaces the remaining uses of usecs_div with scaled_div32, which internally uses 64bit division and produces a warning on overflow. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:43 -08:00
Gerrit Renker	1a21e49a8d	[DCCP] ccid3: Finer-grained resolution of sending rates This patch * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second * simplifies the present overflow problems in calculations Current sending rate X and the cached value X_recv of the receiver-estimated sending rate are both scaled by 64 (2^6) in order to * cope with low sending rates (minimally 1 byte/second) * allow upgrading to use a packets-per-second implementation of CCID 3 * avoid calculation errors due to integer arithmetic cut-off The patch implements a revised strategy from http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html The only difference with regard to that strategy is that t_ipi is already used in the calculation of the nofeedback timeout, which saves one division. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:42 -08:00
Gerrit Renker	179ebc9f92	[DCCP] ccid3: Fix two bugs in sending rate computation This fixes 1) a bug in the recomputation of the sending rate by the nofeedback timer when no feedback at all has so far been sent by the receiver: min_t was used instead of max_t, which is wrong (cf. RFC 3448, p. 10); 2) an error in the computation of larger initial windows: instead of min(... max()) (cf. RFC 4342, 5.), the code had used max(... max()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:41 -08:00
Gerrit Renker	ff58629824	[DCCP] ccid3: Two optimisations for sending rate recomputation This performs two optimisations for the recomputation of the sending rate. 1) Currently the target sending rate X_calc is recalculated whenever a) the nofeedback timer expires, or b) a feedback packet is received. In the (a) case, recomputing X_calc is redundant, since * the parameters p and RTT do not change in between the reception of feedback packets; * the parameter X_recv is either modified from received feedback or via the nofeedback timer; * a test (`p == 0') in the nofeedback timer avoids using a stale/undefined value of X_calc if p was previously 0. 2) The nofeedback timer now only recomputes a timestamp when p == 0. This is according to step (4) of [RFC 3448, 4.3] and avoids unnecessarily determining a timestamp. A debug statement about not updating X is also removed - it helps very little in debugging and just clutters the logs. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:40 -08:00
Gerrit Renker	45393a66a2	[DCCP] ccid3: Check against too large p This patch follows a suggestion by Ian McDonald and ensures that in the current code the value of p can not exceed 100%. Such a value is illegal and would consequently cause a bug condition in tfrc_calc_x(). The receiver case is also tested, and a warning message is added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:39 -08:00
Ian McDonald	5cc3741d6c	[DCCP]: Remove timeo from output.c It simplifies waiting for the CCID module to signal that a packet is ready to be sent. Other simplifications flow on from this such as removing constants. As a result of this EAGAIN is not returned any more by dccp_wait_for_ccid (which would otherwise lead to unnecessarily discarding the packet in dccp_write_xmit). Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-11 14:34:37 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Gerrit Renker	2bbf29acd8	[DCCP] tfrc: Binary search for reverse TFRC lookup This replaces the linear search algorithm for reverse lookup with binary search. It has the advantage of better scalability: O(log2(N)) instead of O(N). This means that the average number of iterations is reduced from 250 (linear search if each value appears equally likely) down to at most 9. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:53:27 -02:00
Gerrit Renker	44158306d7	[DCCP] ccid3: Deprecate TFRC_SMALLEST_P This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P for low-threshold values of p. This avoids masking low-resolution errors. Instead, the code now checks against real boundaries (implemented by preceding patch) and provides warnings whenever a real value falls below the threshold. If such messages are observed, it is a better solution to take this as an indication that the lookup table needs to be re-engineered. Changelog: ---------- This patch * makes handling all TFRC resolution errors local to the TFRC library * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this condition is already caught by tfrc_calc_x() * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv since this is now done by the TFRC library * updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library Justification: -------------- The TFRC code uses a lookup table which has a bounded resolution. The lowest possible value of the loss event rate `p' which can be resolved is currently 0.0001. Substituting this lower threshold for p when p is less than 0.0001 results in a huge, exponentially-growing error. The error can be computed by the following formula: (f(0.0001) - f(p))/f(p) * 100 for p < 0.0001 Currently the solution is to use an (arbitrary) value TFRC_SMALLEST_P = 40 * 1E-6 = 0.00004 and to consider all values below this value as `virtually zero'. Due to the exponentially growing resolution error, this is not a good idea, since it hides the fact that the table can not resolve practically occurring cases. Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%! Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:53:07 -02:00
Gerrit Renker	006042d7e1	[DCCP] tfrc: Identify TFRC table limits and simplify code This * adds documentation about the lowest resolution that is possible within the bounds of the current lookup table * defines a constant TFRC_SMALLEST_P which defines this resolution * issues a warning if a given value of p is below resolution * combines two previously adjacent if-blocks of nearly identical structure into one This patch does not change the algorithm as such. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:41 -02:00
Gerrit Renker	8d0086adac	[DCCP] tfrc: Add protection against invalid parameters to TFRC routines 1) For the forward X_calc lookup, it * protects effectively against RTT=0 (this case is possible), by returning the maximal lookup value instead of just setting it to 1 * reformulates the array-bounds exceeded condition: this only happens if p is greater than 1E6 (due to the scaling) * the case of negative indices can now with certainty be excluded, since documentation shows that the formulas are within bounds * additional protection against p = 0 (would give divide-by-zero) 2) For the reverse lookup, it warns against * protects against exceeding array bounds * now returns 0 if f(p) = 0, due to function definition * warns about minimal resolution error and returns the smallest table value instead of p=0 [this would mask congestion conditions] Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:26 -02:00
Gerrit Renker	90fb0e60dd	[DCCP] tfrc: Fix small error in reverse lookup of p for given f(p) This fixes the following small error in tfrc_calc_x_reverse_lookup. 1) The table is generated by the following equations: lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE); lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE); where g(q) is 1E6 * f(q/1E6) 2) The reverse lookup assigns an entry in lookup[index][small] 3) This index needs to match the above, i.e. * if small=0 then p = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE * if small=1 then p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE These are exactly the changes that the patch makes; previously the code did not conform to the way the lookup table was generated (this difference resulted in a mean error of about 1.12%). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:52:01 -02:00
Gerrit Renker	50ab46c790	[DCCP] tfrc: Document boundaries and limits of the TFRC lookup table This adds documentation for the TCP Reno throughput equation which is at the heart of the TFRC sending rate / loss rate calculations. It spells out precisely how the values were determined and what they mean. The equations were derived through reverse engineering and found to be fully accurate (verified using test programs). This patch does not change any code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:51:29 -02:00
Gerrit Renker	26af3072b0	[DCCP] ccid3: Fix warning message about illegal ACK This avoids a (harmless) warning message being printed at the DCCP server (the receiver of a DCCP half connection). Incoming packets are both directed to * ccid_hc_rx_packet_recv() for the server half * ccid_hc_tx_packet_recv() for the client half The message gets printed since on a server the client half is currently not sending data packets. This is resolved for the moment by checking the DCCP-role first. In future times (bidirectional DCCP connections), this test may have to be more sophisticated. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:51:14 -02:00
Gerrit Renker	5c3fbb6acf	[DCCP] ccid3: Fix bug in calculation of send rate The main object of this patch is the following bug: ==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated _after_ the send rate was calculated. This is clearly an error and is resolved by re-ordering statements. In addition, * r_sample is converted from u32 to long to check whether the time difference was negative (it would otherwise be converted to a large u32 value) * protection against RTT=0 (this is possible) is provided in a further patch * t_elapsed is also converted to long, to match the type of r_sample * adds a a more debugging information regarding current send rates * various trivial comment/documentation updates Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:56 -02:00
Gerrit Renker	76d127779e	[DCCP]: Fix BUG in retransmission delay calculation This bug resulted in ccid3_hc_tx_send_packet returning negative delay values, which in turn triggered silently dequeueing packets in dccp_write_xmit. As a result, only a few out of the submitted packets made it at all onto the network. Occasionally, when dccp_wait_for_ccid was involved, this also triggered a bug warning since ccid3_hc_tx_send_packet returned a negative value (which in reality was a negative delay value). The cause for this bug lies in the comparison if (delay >= hctx->ccid3hctx_delta) return delay / 1000L; The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing negative long values against u32 values, the test returned `true' whenever delay was smaller than 0 (meaning the packet was overdue to send). The fix is by casting, subtracting, and then testing the difference with regard to 0. This has been tested and shown to work. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:42 -02:00
Gerrit Renker	8a508ac26e	[DCCP]: Use higher RTO default for CCID3 The TFRC nofeedback timer normally expires after the maximum of 4 RTTs and twice the current send interval (RFC 3448, 4.3). On LANs with a small RTT this can mean a high processing load and reduced performance, since then the nofeedback timer is triggered very frequently. This patch provides a configuration option to set the bound for the nofeedback timer, using as default 100 milliseconds. By setting the configuration option to 0, strict RFC 3448 behaviour can be enforced for the nofeedback timer. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-03 14:50:23 -02:00
Gerrit Renker	6b57c93dc3	[DCCP]: Use `unsigned' for packet lengths This patch implements a suggestion by Ian McDonald and 1) Avoids tests against negative packet lengths by using unsigned int for packet payload lengths in the CCID send_packet()/packet_sent() routines 2) As a consequence, it removes an now unnecessary test with regard to `len > 0' in ccid3_hc_tx_packet_sent: that condition is always true, since * negative packet lengths are avoided * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0. As a consequence, ccid3_hc_tx_packet_sent is never called as all errors returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter), since it is currently always set to skb->len. The code is updated with regard to this parameter change. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:02 -08:00
Gerrit Renker	a79ef76f4d	[DCCP] ccid3: Larger initial windows This implements the larger-initial-windows feature for CCID 3, as described in section 5 of RFC 4342. When the first feedback packet arrives, the sender can send up to 2..4 packets per RTT, instead of just one. The patch further * reduces the number of timestamping calls by passing the timestamp value (which is computed in one of the calling functions anyway) as argument * renames one constant with a very long name into one which is shorter and resembles the one in RFC 3448 (t_mbi) * simplifies some of the min_t/max_t cases where both `x', `y' have the same type Commiter note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:01 -08:00
Arnaldo Carvalho de Melo	841bac1d60	[DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0 To reflect the fact that this now is of no effect, not making apps stop working, just be warned in the system log. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:31:00 -08:00
Gerrit Renker	5aed324369	[DCCP]: Tidy up unused structures This removes and cleans up unused variables and structures which have become unnecessary following the introduction of the EWMA patch to automatically track the CCID 3 receiver/sender packet sizes `s'. It deprecates the PACKET_SIZE socket option by returning an error code and printing a deprecation warning if an application tries to read or write this socket option. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:59 -08:00
Gerrit Renker	78ad713da6	[DCCP] ccid3: Track RX/TX packet size `s' using moving-average Problem:	2006-12-02 21:30:58 -08:00
Gerrit Renker	2a1fda6f6c	[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448 This corrects the setting of the nofeedback timer with regard to RFC 3448 - previously it was not set to max(4R, 2s/X) as specified. Using the maximum of 1 second as upper bound (as it was done before) can have detrimental effects, especially if R is small. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:57 -08:00
Gerrit Renker	4384260443	[DCCP]: Remove allocation of sysctl numbers This is in response to a request sent earlier by Eric W. Biederman and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED. It has been tested to compile and to work. Commiter note: I've removed the use of CTL_UNNUMBERED, not setting .ctl_name sets it to 0, that is the what CTL_UNNUMBERED is, reason is to avoid unneeded source code cluttering. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:56 -08:00
Gerrit Renker	5d0dbc4a9b	[DCCP] ccid3: Consolidate handling of t_RTO This patch * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is undefined until feedback has been received); * makes some trivial changes (updates of comments); * performs a small optimisation by exploiting that the feedback timeout uses the value of t_ipi. The way it is done is safe, because the timeouts appear after the changes to t_ipi, ensuring that up-to-date values are used; * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation of the next_tmout. This makes the code clearer to read and is also safe, since t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not read by the functions called via ccid_wait_for_ccid(); * removes a `max' statement in sk_reset_timer, this is not needed since the timeout value is always greater than 1E6 microseconds. * adds `XXX'es to highlight that currently the nofeedback timer is set in a non-standard way Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:52 -08:00
Gerrit Renker	17893bc1a6	[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta This patch: * consolidates updating of parameters (t_nom, t_ipi, t_delta) which need to be updated at the same time, since they are inter-dependent * removes two inline functions which are no longer needed as a result of the above consolidation * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback timer, in the state where no feedback has previously been received * ties updating these parameters to updating the sending rate X, exploiting that all three parameters in turn depend on X; and using a small optimisation which can reduce the number of required instructions: only update the three parameters when X really changes Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:51 -08:00
Gerrit Renker	48e03eee71	[DCCP] ccid3: Consolidate timer resets This patch concerns updating the value of the nofeedback timer when no feedback has been received so far. Since in this case the value of R is still undefined according to [RFC 3448, 4.2], we can not perform step (3) of [RFC 3448, 4.3]. A clarification is provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback timer (still) expires "after two seconds". Many thanks to Ian McDonald for pointing this out and providing the clarification. The patch * implements [RFC 4342, sec. 5] with regard to the above case * consolidates handling timer restart by - adding an appropriate jump label and - initialising the timeout value Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:50 -08:00
Gerrit Renker	5e19e3fcd7	[DCCP] ccid3: Resolve small FIXME This considers the case - ACK received while no packet has been sent so far. Resolved by printing a (rate-limited) warning message. Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv, received feedback on a terminating connection is simply ignored. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:41 -08:00
Gerrit Renker	70dbd5b0ef	[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent This patch removes a switch statement which is redundant since, * nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK * it is impossible that the function is called in the state TFRC_SSTATE_TERM, since --the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet --if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns -EINVAL, which means that ccid3_hc_tx_packet_sent will not be called (compare dccp_write_xmit) --> therefore, this case is logically impossible * the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom, and t_delta. This is a no-op, since --t_ipi only changes when feedback is received --however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical code block which performs the same set of operations --performing the same set of operations again in ccid3_hc_tx_packet_sent therefore does not change anything, since between the time of receiving the last feedback (and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not changed --since t_ipi has not changed, the values of t_delta and t_nom also do not change, they depend fully on t_ipi Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:40 -08:00
Gerrit Renker	da335baf9e	[DCCP] ccid3: Avoid congestion control on zero-sized data packets This resolves an `XXX' in ccid3_hc_tx_send_packet(). The function is only called on Data and DataAck packets and returns a negative result on zero-sized messages. This is a reasonable policy since CCID 3 is a congestion-control module and congestion control on zero-sized Data(Ack) packets is in a way pathological. The patch uses a more suitable error code for this case, it returns the Posix.1 code `EBADMSG' ("Not a data message") instead of `ENOTCONN'. As a result of ignoring zero-sized packets, a the condition for a warning "First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this message has been removed since it will always be printed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:39 -08:00
Gerrit Renker	7da7f456d7	[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet This makes some logically equivalent simplifications, by replacing rc - values plus goto's with direct return statements. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:38 -08:00
Gerrit Renker	91cf5a1725	[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission Problem:	2006-12-02 21:30:37 -08:00
Gerrit Renker	f5c2d6367b	[DCCP] ccid3: Simplify control flow in the calculation of t_ipi This patch performs a simplifying (performance) optimisation: In each call of the inline function ccid3_calc_new_t_ipi(), the state is tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function is called very often. A simpler solution, implemented by this patch, is to adapt the control flow. Background:	2006-12-02 21:30:36 -08:00
Gerrit Renker	90feeb951f	[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi Problem:	2006-12-02 21:30:35 -08:00
Andrea Bittau	6472c051fc	[DCCP] ccid2: Allow window to grow larger Now that we can stuff bigger ack vectors into options. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:34 -08:00
Andrea Bittau	522f1d095b	[DCCP] ackvec: Split long ack vectors across multiple options Ack vectors grow proportional to the window size. If an ack vector does not fit into a single option, it must be spread across multiple options. This patch will allow for windows to grow larger. Committer note: Simplified the patch a bit, original algorithm kept. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:33 -08:00
Andrea Bittau	bdf13d208d	[DCCP] ackvec: infrastructure for sending more than one ackvec per packet Commiter note: This was split from Andrea's original patch, in the process I changed the type of the ackvec index fields to u16 instead of to int and haven't folded dccp_ackvec_parse with dccp_ackvec_check_rcv_ackno. Next patch will actually do the insertion of more than one ackvec per packet, using, initially, up to a max of 2 ackvecs as per Andrea's original patch, then I'll work on support for larger ackvecs, be it using a sysctl or using setsockopt. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:32 -08:00
Andrea Bittau	0bd4ff1b15	[DCCP] ackvec: Remove unused dccpav_ack_ptr field from dccp_ackvec Commiter note: original patch was splitted. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:31 -08:00
Ian McDonald	82e3ab9dbe	[DCCP]: Adds the tx buffer sysctls This one got lost on the way from Ian to Gerrit to me, fix it. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:42 -08:00
Ian McDonald	455431739c	[DCCP] CCID3: Remove non-referenced variable This removes a non-referenced variable. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:41 -08:00
Ian McDonald	e1b7441e80	[DCCP]: Make dccp_probe more portable This makes the code of the dccp_probe module more portable. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:39 -08:00
Gerrit Renker	23ea8945f6	[CCID 3]: Add annotations for socket structures This adds documentation to the CCID 3 rx/tx socket fields, plus some minor re-formatting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:38 -08:00
Gerrit Renker	59348b19ef	[DCCP]: Simplified conditions due to use of enum:8 states This reaps the benefit of the earlier patch, which changed the type of CCID 3 states to use enums, in that many conditions are now simplified and the number of possible (unexpected) values is greatly reduced. In a few instances, this also allowed to simplify pre-conditions; where care has been taken to retain logical equivalence. [DCCP]: Introduce a consistent BUG/WARN message scheme This refines the existing set of DCCP messages so that * BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts * DCCP_CRIT (for severe warnings) is not rate-limited * DCCP_WARN() is introduced as rate-limited wrapper Using these allows a faster and cleaner transition to their original counterparts once the code has matured into a full DCCP implementation. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:38 -08:00
Ian McDonald	b1308dc015	[DCCP]: Set TX Queue Length Bounds via Sysctl Previously the transmit queue was unbounded. This patch: * puts a limit on transmit queue length and sends back EAGAIN if the buffer is full * sets the TX queue length to a sensible default * implements tx buffer sysctls for DCCP Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:37 -08:00
Gerrit Renker	56724aa434	[DCCP]: Add CCID3 debug support to Kconfig This adds a CCID3 debug option to the configuration menu which is missing in Kconfig, but already used by the code. CCID 2 already provides such an entry. To enable debugging, set CONFIG_IP_DCCP_CCID3_DEBUG=y NOTE: The use of ccid3_{t,r}x_state_name is safe, since now only enum values can appear. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:36 -08:00
Gerrit Renker	84116716cc	[DCCP]: enable debug messages also for static builds This patch * makes debugging (when configured) work both for static / module build * provides generic debugging macros for use in other DCCP / CCID modules * adds missing information about debug parameters to Kconfig * performs some code tidy-up Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:35 -08:00
Arnaldo Carvalho de Melo	eed73417d5	[DCCP]: Use kmemdup Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/dccp.ko.before /tmp/dccp.ko.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/dccp/feat.c: __dccp_feat_init \| -16 dccp_feat_change_recv \| -55 dccp_feat_clone \| -56 3 functions changed, 127 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:59 -08:00
Andrea Bittau	32aac18dfa	[DCCP] CCID2: Code optimizations These are code optimizations which are relevant when dealing with large windows. They are not coded the way I would like to, but they do the job for the short-term. This patch should be more neat. Commiter note: Changed the seqno comparisions to use {after,before}48 to handle wrapping. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:52 -08:00
Arnaldo Carvalho de Melo	58a5a7b955	[NET]: Conditionally use bh_lock_sock_nested in sk_receive_skb Spotted by Ian McDonald, tentatively fixed by Gerrit Renker: http://www.mail-archive.com/dccp%40vger.kernel.org/msg00599.html Rewritten not to unroll sk_receive_skb, in the common case, i.e. no lock debugging, its optimized away. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:51 -08:00
Arnaldo Carvalho de Melo	e523a1550e	[DCCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in dccp_v4_err() Spotted by Eric Dumazet in tcp_v4_rcv(). Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:50 -08:00
Gerrit Renker	3c6952624a	[DCCP]: Introduce DCCP_{BUG{_ON},CRIT} macros, use enum:8 for the ccid3 states This patch tackles the following problem: * the ccid3_hc_{t,r}x_sock define ccid3hc{t,r}x_state as `u8', but in reality there can only be a few, pre-defined enum names * this necessitates addiditional checking for unexpected values which would otherwise be caught by the compiler Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:49 -08:00
Al Viro	7d533f9418	[NET]: More dccp endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:45 -08:00
Al Viro	868c86bcb5	[NET]: annotate csum_ipv6_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:31 -08:00
Al Viro	2bda285315	[NET]: Annotate csum_tcpudp_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:30 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
Andrea Bittau	d23ca15a21	[DCCP] ACKVEC: Optimization - Do not traverse records if none will be found Do not traverse the list of ack vector records [proportional to window size] when we know we will not find what we are looking for. This is especially useful because ack vectors are checked twice: 1) Upon parsing of options. 2) Upon notification of a new ack. All of the work will occur during check #1. Therefore, when check #2 is performed, no new work will be done. This is now "detected" and there is no performance hit when doing #2. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:31 -08:00
Gerrit Renker	09dbc3895e	[DCCP]: Miscellaneous code tidy-ups This patch does not change code; it performs some trivial clean/tidy-ups: * removal of a `debug_prefix' string in favour of the already existing dccp_role(sk) * add documentation of structures and constants * separated out the cases for invalid packets (step 1 of the packet validation) * removing duplicate statements * combining declaration & initialisation Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:30 -08:00
Gerrit Renker	c02fdc0e81	[DCCP]: Make feature negotiation more readable This patch replaces cryptic feature negotiation messages of type Oct 31 15:42:20 kernel: dccp_feat_change: feat change type=32 feat=1 Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=34 feat=1 Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=32 feat=5 into ones of type: Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(CCID (1), 3) Nov 2 13:54:45 kernel: dccp_feat_change: ChangeR(CCID (1), 3) Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(Ack Ratio (5), 2) Also, * completed the feature number list wrt RFC 4340 sec. 6.4 * annotating which ones have been implemented so far * implemented rudimentary sanity checking in feat.c (FIXMEs) * some minor fixes Commiter note: uninlined dccp_feat_name and dccp_feat_typename, for consistency with dccp_{state,packet}_name, that, BTW, should be compiled only if CONFIG_IP_DCCP_DEBUG is selected, leaving this to another cset tho. Also shortened dccp_feat_negotiation_debug to dccp_feat_debug. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:29 -08:00
Gerrit Renker	6a128e053e	[DCCPv6]: Resolve conditional build problem Resolves the problem that if IPv6 was configured `y' and DCCP `m' then dccp_ipv6 was not built as a module. With this change, dccp_ipv6 is built as a module whenever DCCP OR IPv6 are configured as modules; it will be built-in only if both DCCP = `y' and IPV6 = `y'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:28 -08:00
Gerrit Renker	b9df3cb8cf	[TCP/DCCP]: Introduce net_xmit_eval Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:27 -08:00
Gerrit Renker	d7f7365f57	[DCCPv6]: Choose a genuine initial sequence number This * resolves a FIXME - DCCPv6 connections started all with an initial sequence number of 1; * provides a redirection `secure_dccpv6_sequence_number' in case the init_sequence_v6 code should be updated later; * concentrates the update of S.GAR into dccp_connect_init(); * removes a duplicate dccp_update_gss() in ipv4.c; * uses inet->dport instead of usin->sin_port, due to the following assignment in dccp_v4_connect(): inet->dport = usin->sin_port; Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:22 -08:00
Gerrit Renker	865e9022d8	[DCCP]: Remove redundant statements in init_sequence (ISS) This patch removes the following redundancies: 1) The test skb->protocol == htons(ETH_P_IPV6) in dccp_v6_init_sequence is always true since * dccp_v6_conn_request() is the only calling function * dccp_v6_conn_request() redirects all skb's with ETH_P_IP to dccp_v4_conn_request() 2) The first argument, `struct sock *sk', of dccp_v{4,6}_init_sequence() is never used. (This is similar for tcp_v{4,6}_init_sequence, an analogous patch has been submitted to netdev and merged.) By the way - are the `sport' / `dport' arguments in the right order? I have made them consistent among calls but they seem to be in the reverse order. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:21 -08:00
Gerrit Renker	4ed800d02c	[DCCP]: Remove forward declarations in timer.c This removes 3 forward declarations by reordering 2 functions. No code change at all. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:20 -08:00
Gerrit Renker	afb0a34dd3	[DCCP]: Introduce a consistent naming scheme for sysctls In order to make their function clearer and obtain a consistent naming scheme to identify sysctls, all existing DCCP sysctls have been prefixed with `sysctl_dccp', following the same convention as used by TCP. Feature-specific sysctls retain the `feat' in the middle, although the `default' has been dropped, since it is obvious from use. Also removed a duplicate `dccp_feat_default_sequence_window' in ipv4.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:19 -08:00
Gerrit Renker	2e2e9e92bd	[DCCP]: Add sysctls to control retransmission behaviour This adds 3 sysctls which govern the retransmission behaviour of DCCP control packets (3way handshake, feature negotiation). It removes 4 FIXMEs from the code. The close resemblance of sysctl variables to their TCP analogues is emphasised not only by their name, but also by giving them the same initial values. This is useful since there is not much practical experience with DCCP yet. Furthermore, with regard to the previous patch, it is now possible to limit the number of keepalive-Responses by setting net.dccp.default.request_retries (also a bit like in TCP). Lastly, added documentation of all existing DCCP sysctls. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:18 -08:00
Gerrit Renker	e11d9d3080	[DCCP]: Increment sequence numbers on retransmitted Response packets Problem:	2006-12-02 21:22:17 -08:00
Gerrit Renker	08a29e41bb	[DCCP]: Update comments on precisely which packets can be retransmitted This updates program documentation: spell out precise conditions about which packets are eligible for retransmission (which is actually quite hard to extract from RFC 4340). It is based on the following table derived from RFC 4340: +-----------+---------------------------------+---------------------+ \| Type \| Retransmit? \| Remark \| +-----------+---------------------------------+---------------------+ \| Request \| in client-REQUEST state \| sec. 8.1.1 \| \| Response \| NEVER \| SHOULD NOT, 8.1.3 \| \| Data \| NEVER \| unreliable protocol \| \| Ack \| possible in client-PARTOPEN \| sec. 8.1.5 \| \| DataAck \| NEVER \| unreliable protocol \| \| CloseReq \| only in server-CLOSEREQ state \| MUST, sec. 8.3 \| \| Close \| in node-CLOSING state \| MUST, sec. 8.3 \| +-----------+-------------------------------------------------------+ \| Reset \| only in response to other packets \| \| Sync \| only in response to sequence-invalid packets (7.5.4) \| \| SyncAck \| only in response to Sync packets \| +-----------+-------------------------------------------------------+ Hence the only packets eligible for retransmission are: * Requests in client-REQUEST state (sec. 8.1.1) * Acks in client-PARTOPEN state (sec. 8.1.5) * CloseReq in server-CLOSEREQ state (sec. 8.3) * Close in node-CLOSING state (sec. 8.3) I had meant to put in a check for these types too, but have left that for later. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:16 -08:00
Gerrit Renker	6f4e5fff1e	[DCCP]: Support for partial checksums (RFC 4340, sec. 9.2) This patch does the following: a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2] b) provides necessary socket options and documentation as to how to use them c) basic support and infrastructure for the Minimum Checksum Coverage feature [RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user interface In addition, it (1) fixes two bugs in the DCCPv4 checksum computation: * pseudo-header used checksum_len instead of skb->len * incorrect checksum coverage calculation based on dccph_x (2) removes dccp_v4_verify_checksum() since it reduplicates code of the checksum computation; code calling this function is updated accordingly. (3) now uses skb_checksum(), which is safer than checksum_partial() if the sk_buff has is a non-linear buffer (has pages attached to it). (4) fixes an outstanding TODO item: * If P.CsCov is too large for the packet size, drop packet and return. The code has been tested with applications, the latest version of tcpdump now comes with support for partial DCCP checksums. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:09 -08:00
Gerrit Renker	d83ca5accb	[DCCP]: Update code comments for Step 2/3 Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340. All comments have been updated against this document, and the reference to step 2 has been made consistent throughout the files. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:04 -08:00
Gerrit Renker	cf557926f6	[DCCP]: tidy up dccp_v{4,6}_conn_request This is a code simplification to remove reduplicated code by concentrating and abstracting shared code. Detailed Changes:	2006-12-02 21:22:03 -08:00
Ian McDonald	f45b3ec481	[DCCP]: Fix logfile overflow This patch fixes data being spewed into the logs continually. As the code stood if there was a large queue and long delays timeo would go down to zero and never get reset. This fixes it by resetting timeo. Put constant into header as well. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:02 -08:00
Ian McDonald	fec5b80e49	[DCCP]: Fix DCCP Probe Typo Fixes a typo in Kconfig, patch is by Ian McDonald and is re-sent from http://www.mail-archive.com/dccp@vger.kernel.org/msg00579.html Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:01 -08:00
Gerrit Renker	73c9e02c22	[DCCPv6]: remove forward declarations in ipv6.c This does the same for ipv6.c as the preceding one does for ipv4.c: Only the inet_connection_sock_af_ops forward declarations remain, since at least dccp_ipv6_mapped has a circular dependency to dccp_v6_request_recv_sock. No code change, merely re-ordering. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:00 -08:00
Gerrit Renker	3d2fe62b8d	[DCCPv4]: remove forward declarations in ipv4.c This relates to Arnaldo's announcement in http://www.mail-archive.com/dccp@vger.kernel.org/msg00604.html Originally this had been part of the Oops fix and is a revised variant of http://www.mail-archive.com/dccp@vger.kernel.org/msg00598.html No code change, merely reshuffling, with the particular objective of having all request_sock_ops close(r) together for more clarity. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:59 -08:00
Gerrit Renker	8a73cd09d9	[DCCP]: calling dccp_v{4,6}_reqsk_send_ack is a BUG This patch removes two functions, the send_ack functions of request_sock, which are not called/used by the DCCP code. It is correct that these functions are not called, below is a justification why calling these functions (on a passive socket in the LISTEN/RESPOND state) would mean a DCCP protocol violation. A) Background: using request_sock in TCP:	2006-12-02 21:21:58 -08:00
Arnaldo Carvalho de Melo	f6484f7c7a	[DCCP] timewait: Remove leftover extern declarations Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME, but as he suggests in the same patch the best thing is to just ditch this declaration, while doing that also noticed that tcp_tw_count is as well not defined anywhere, so ditch it too. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:57 -08:00
Gerrit Renker	d23c7107bf	[DCCP]: Simplify jump labels in dccp_v{4,6}_rcv This is a code simplification and was singled out from the DCCPv6 Oops patch on http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html It mainly makes the code consistent between ipv{4,6}.c for the functions dccp_v4_rcv dccp_v6_rcv and removes the do_time_wait label to simplify code somewhat. Commiter note: fixed up a compile problem, trivial. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:56 -08:00
Gerrit Renker	9b42078ed6	[DCCP]: Combine allocating & zeroing header space on skb This is a code simplification: it combines three often recurring operations into one inline function, * allocate `len' bytes header space in skb * fill these `len' bytes with zeroes * cast the start of this header space as dccp_hdr Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:55 -08:00
Gerrit Renker	89e7e57778	[DCCPv6]: Add a FIXME for missing IPV6_PKTOPTIONS This refers to the possible memory leak pointed out in http://www.mail-archive.com/dccp@vger.kernel.org/msg00574.html, fixed by David Miller in http://www.mail-archive.com/netdev@vger.kernel.org/msg24881.html and adds a FIXME to point out where code is missing. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:54 -08:00
Gerrit Renker	60361be1be	[DCCP]: set safe upper bound for option length This is a re-send from http://www.mail-archive.com/dccp@vger.kernel.org/msg00553.html It is the same patch as before, but I have built in Arnaldo's suggestions pointed out in that posting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:21:53 -08:00
David S. Miller	931731123a	[TCP]: Don't set SKB owner in tcp_transmit_skb(). The data itself is already charged to the SKB, doing the skb_set_owner_w() just generates a lot of noise and extra atomics we don't really need. Lmbench improvements on lat_tcp are minimal: before: TCP latency using localhost: 23.2701 microseconds TCP latency using localhost: 23.1994 microseconds TCP latency using localhost: 23.2257 microseconds after: TCP latency using localhost: 22.8380 microseconds TCP latency using localhost: 22.9465 microseconds TCP latency using localhost: 22.8462 microseconds Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:52 -08:00
David S. Miller	494b4e7d81	[DCCP]: Fix typo _read_mostly --> __read_mostly. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:45 -08:00
Eric Dumazet	72a3effaf6	[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:44 -08:00
Akinobu Mita	ac16ca6412	[NET]: Fix kfifo_alloc() error check. The return value of kfifo_alloc() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-25 15:16:49 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
YOSHIFUJI Hideaki	f2776ff047	[IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture. TCP and RAW do not have this issue. Closes Bug #7432. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-21 17:41:56 -08:00
Randy Dunlap	234af48401	[DCCP]: fix printk format warnings Fix printk format warnings: build2.out:net/dccp/ccids/ccid2.c:355: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:360: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:482: warning: long long unsigned int format, u64 arg (arg 5) build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:639: warning: long long unsigned int format, u64 arg (arg 4) build2.out:net/dccp/ccids/ccid2.c:674: warning: long long unsigned int format, u64 arg (arg 3) build2.out:net/dccp/ccids/ccid2.c:720: warning: long long unsigned int format, u64 arg (arg 3) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:37 -08:00
Gerrit Renker	0e64e94e47	[DCCP]: Update documentation references. Updates the references to spec documents throughout the code, taking into account that * the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year * RFC 1063 was obsoleted by RFC 1191 * draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational RFC, RFC 2923 on 2000-09-22. All references verified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-24 16:17:51 -07:00
David S. Miller	fd169f15a6	[DCCP] ipv6: Fix opt_skb leak. Based upon a patch from Jesper Juhl. Try to match the TCP IPv6 code this was copied from as much as possible, so that it's easy to see where to add the ipv6 pktoptions support code. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-21 19:55:21 -07:00
Gerrit Renker	82709531a8	[DCCP]: Fix Oops in DCCPv6 I think I got the cause for the Oops observed in http://www.mail-archive.com/dccp@vger.kernel.org/msg00578.html The problem is always with applications listening on PF_INET6 sockets. Apart from the mentioned oops, I observed another one one, triggered at irregular intervals via timer interrupt: run_timer_softirq -> dccp_keepalive_timer -> inet_csk_reqsk_queue_prune -> reqsk_free -> dccp_v6_reqsk_destructor The latter function is the problem and is also the last function to be called in said kernel panic. In any case, there is a real problem with allocating the right request_sock which is what this patch tackles. It fixes the following problem: - application listens on PF_INET6 - DCCPv4 packet comes in, is handed over to dccp_v4_do_rcv, from there to dccp_v4_conn_request Now: socket is PF_INET6, packet is IPv4. The following code then furnishes the connection with IPv6 - request_sock operations: req = reqsk_alloc(sk->sk_prot->rsk_prot); The first problem is that all further incoming packets will get a Reset since the connection can not be looked up. The second problem is worse: --> reqsk_alloc is called instead of inet6_reqsk_alloc --> consequently inet6_rsk_offset is never set (dangling pointer) --> the request_sock_ops are nevertheless still dccp6_request_ops --> destructor is called via reqsk_free --> dccp_v6_reqsk_destructor tries to free random memory location (inet6_rsk_offset not set) --> panic I have tested this for a while, DCCP sockets are now handled correctly in all three scenarios (v4/v6 only/v4-mapped). Commiter note: I've added the dccp_request_sock_ops forward declaration to keep the tree building and to reduce the size of the patch for 2.6.19, later I'll move the functions to the top of the affected source code to match what we have in the TCP counterpart, where this problem hasn't existed in the first place, dumb me not to have done the same thing on DCCP land 8) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-10-21 19:55:20 -07:00
YOSHIFUJI Hideaki	9469c7b4aa	[NET]: Use typesafe inet_twsk() inline function instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:58 -07:00
Al Viro	bada8adc4e	[IPV4]: ip_route_connect() ipv4 address arguments annotated annotated address arguments (port number left alone for now); ditto for inferred net-endian variables in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 17:54:06 -07:00
Ian McDonald	e41542f516	[DCCP]: Introduce dccp_probe This adds DCCP probing shamelessly ripped off from TCP probes by Stephen Hemminger. I've put in here support for further CCID3 variables as well. Andrea/Arnaldo might look to extend for CCID2. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 18:08:17 -03:00
Ian McDonald	3dd9a7c3a1	[DCCP]: Use constants for CCIDs With constants for CCID numbers this now uses them in some places. Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 18:03:41 -03:00
Gerrit Renker	00e4d116a7	[DCCP]: Allow default/fallback service code. This has been discussed on dccp@vger and removes the necessity for applications to supply service codes in each and every case. If an application does not want to provide a service code, that's fine, it will be given 0. Otherwise, service codes can be set via socket options as before. This patch has been tested using various client/server configurations (including listening on multiple service codes). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-09-24 17:49:26 -03:00
Andrea Bittau	593f16aa62	[DCCP] CCID2: Add helper functions for changing important CCID2 state Introduce methods which manipulate interesting congestion control state such as pipe and rtt estimate. This is useful for people wishing to monitor the variables of CCID and instrument the code [perhaps using Kprobes]. Personally, I am a fan of encapsulation---that justifies this change =D. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:42 -07:00
Andrea Bittau	374bcf32c8	[DCCP] CCID2: Halve cwnd once upon multiple losses in a single RTT When multiple losses occur in one RTT, the window should be halved only once [a single "congestion event"]. This is now implemented, although not perfectly. Slightly changed the interface for changing the cwnd: pass hctx instead of dp. This is required in order to allow for change_cwnd to be called from _init(). Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:41 -07:00
Andrea Bittau	07978aabd5	[DCCP] CCID2: Allocate seq records on demand Allocate more sequence state on demand. Each time a packet is sent out by CCID2, a record of it needs to be kept. This list of records grows proportionally to cwnd. Previously, the length of this list was hardcored and therefore the cwnd could only grow to this value (of 128). Now, records are allocated on demand as necessary---cwnd may grow as it wishes. The exceptional case of when memory is not available is not handled gracefully. Perhaps, cwnd should be capped at that point. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:40 -07:00
Andrea Bittau	8d424f6ca2	[DCCP] CCID2: Add Kconfig option for CCID2 debug Allow the user to choose whether or not to enable CCID2 debugging via Kconfig. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:39 -07:00
Andrea Bittau	446dec30c7	[DCCP] CCID2: Tell DCCP to quickly check whether cwnd is available If not enough cwnd is available, tell the sender to check again as soon as possible. This will increase CPU utilization (polling frequently for cwnd) but will improve network performance. That is, the sender will need to wait less before detecting the increase of cwnd. A better architecture would be for the CCID to call-back (or dequeue) from DCCP when it is able to transmit traffic -- not the other way around as it currently occurs. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:39 -07:00
Andrea Bittau	d458c25ce2	[DCCP] CCID2: Initialize ssthresh to infinity Initialize the slow-start threshold to infinity. This way, upon connection initiation, slow-start will be exited only upon a packet loss. This patch will allow connections to quickly gain speed. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:11 -07:00
Andrea Bittau	29651cda97	[DCCP] CCID2: Fix jiffie wrap issues Jiffies are now handled correctly (I hope) in CCID2. If they wrap, no problem. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:10 -07:00
Andrea Bittau	4a0a50fb43	[DCCP] ackvec: Remove unused variables Get rid of unused variables in ackvector state. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:09 -07:00
Andrea Bittau	8e27e4650c	[DCCP] ackvec: Fix how DCCP_ACKVEC_STATE_NOT_RECEIVED is used Fix the way state is masked out. DCCP_ACKVEC_STATE_NOT_RECEIVED is defined as appears in the packet, therefore bit shifting is not required. This fix allows CCID2 to correctly detect losses. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:08 -07:00
Andrea Bittau	23d06e3b98	[DCCP] ACKVEC: fix ackvector length calculation Fix ackvector length calculation upon receiving an "ack-of-ack". This patch avoids the ackvector from growing too large which causes it to not be inserted into packets. Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:19:07 -07:00
Dmitry Mishin	fda9ef5d67	[NET]: Fix sk->sk_filter field access Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg needlock = 0, while socket is not locked at that moment. In order to avoid this and similar issues in the future, use rcu for sk->sk_filter field read protection. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org>	2006-09-22 15:18:47 -07:00
Ian McDonald	fc747e82b4	[DCCP]: Tidyup CCID3 list handling As Arnaldo Carvalho de Melo points out I should be using list_entry in case the structure changes in future. Current code functions but is reliant on position and requires type cast. Noticed when doing this that I have one more variable than I needed so removing that also. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:33 -07:00
Ian McDonald	97e5848dd3	[DCCP]: Introduce tx buffering This adds transmit buffering to DCCP. I have tested with CCID2/3 and with loss and rate limiting. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:17 -07:00
Ian McDonald	2a0109a707	[DCCP]: Shift sysctls into feat.h This shifts further sysctls into feat.h. No change in functionality - shifting code only. Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:16 -07:00
YOSHIFUJI Hideaki	8e1ef0a95b	[IPV6]: Cache source address as well in ipv6_pinfo{}. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:45 -07:00

... 4 5 6 7 8 ...

732 Commits