linux

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	47a31a6ffc	[IPV4]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:58 -08:00
Jens Axboe	c46f2334c8	[SG] Get rid of __sg_mark_end() sg_mark_end() overwrites the page_link information, but all users want __sg_mark_end() behaviour where we just set the end bit. That is the most natural way to use the sg list, since you'll fill it in and then mark the end point. So change sg_mark_end() to only set the termination bit. Add a sg_magic debug check as well, and clear a chain pointer if it is set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
David S. Miller	51c739d1f4	[NET]: Fix incorrect sg_mark_end() calls. This fixes scatterlist corruptions added by commit `68e3f5dd4d` [CRYPTO] users: Fix up scatterlist conversion errors The issue is that the code calls sg_mark_end() which clobbers the sg_page() pointer of the final scatterlist entry. The first part fo the fix makes skb_to_sgvec() do __sg_mark_end(). After considering all skb_to_sgvec() call sites the most correct solution is to call __sg_mark_end() in skb_to_sgvec() since that is what all of the callers would end up doing anyways. I suspect this might have fixed some problems in virtio_net which is the sole non-crypto user of skb_to_sgvec(). Other similar sg_mark_end() cases were converted over to __sg_mark_end() as well. Arguably sg_mark_end() is a poorly named function because it doesn't just "mark", it clears out the page pointer as a side effect, which is what led to these bugs in the first place. The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable() and arguably it could be converted to __sg_mark_end() if only so that we can delete this confusing interface from linux/scatterlist.h Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:29:29 -07:00
Matthias M. Dellweg	b0a713e9e6	[TCP] MD5: Remove some more unnecessary casting. while reviewing the tcp_md5-related code further i came across with another two of these casts which you probably have missed. I don't actually think that they impose a problem by now, but as you said we should remove them. Signed-off-by: Matthias M. Dellweg <2500@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:27 -07:00
David S. Miller	c7da57a183	[TCP]: Fix scatterlist handling in MD5 signature support. Use sg_init_table() and sg_mark_end() as needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 00:41:21 -07:00
Stephen Hemminger	227b60f510	[INET]: local port range robustness Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 17:30:46 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
David S. Miller	f8ab18d2d9	[TCP]: Fix MD5 signature handling on big-endian. Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-28 15:18:35 -07:00
David S. Miller	3516ffb0fe	[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evegniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:28 -07:00
Herbert Xu	a7ab4b501f	[TCPv4]: Improve BH latency in /proc/net/tcp Currently the code for /proc/net/tcp disable BH while iterating over the entire established hash table. Even though we call cond_resched_softirq for each entry, we still won't process softirq's as regularly as we would otherwise do which results in poor performance when the system is loaded near capacity. This anomaly comes from the 2.4 code where this was all in a single function and the local_bh_disable might have made sense as a small optimisation. The cost of each local_bh_disable is so small when compared against the increased latency in keeping it disabled over a large but mostly empty TCP established hash table that we should just move it to the individual read_lock/read_unlock calls as we do in inet_diag. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:06:20 -07:00
David S. Miller	3d7dbeac58	[TCP]: Disable TSO if MD5SIG is enabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-12 14:36:42 -07:00
Patrick McHardy	f0e48dbfc5	[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply. Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:38:51 -07:00
Wei Dong	584bdf8cbd	[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP Signed-off-by: Wei Dong <weidong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:50 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Herbert Xu	663ead3bb8	[NET]: Use csum_start offset instead of skb_transport_header The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:40 -07:00
Eric Dumazet	4103f8cd5c	[TCP]: tcp_memory_pressure and tcp_socket are__read_mostly candidates tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated, tcp_sockets_allocated. (Very hot cache line) It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP. ffffffff8081d9c0 B tcp_orphan_count ffffffff8081d9c4 B tcp_memory_allocated ffffffff8081d9c8 B tcp_sockets_allocated ffffffff8081d9cc B tcp_memory_pressure ffffffff8081d9d0 b tcp_md5sig_users ffffffff8081d9d8 b tcp_md5sig_pool ffffffff8081d9e0 b warntime.31570 ffffffff8081d9e8 b tcp_socket Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:19 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	ab6a5bb6b2	[TCP]: Introduce tcp_hdrlen() and tcp_optlen() The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to avoid the longer, open coded equivalent. Ditched a no-op in bnx2 in the process. I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()... Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:24 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
David S. Miller	fe067e8ab5	[TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:02 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
Ilpo Järvinen	cf4c6bf83d	[TCP]: struct *sock argument renamed: sp -> sk In general, TCP code uses "sk" for struct sock pointer. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:20 -07:00
YOSHIFUJI Hideaki	e905a9edab	[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:39 -08:00
Eric Dumazet	dbca9b2750	[NET]: change layout of ehash table ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 14:16:46 -08:00
David S. Miller	8eb9086f21	[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:45 -08:00
Frederik Deweerdt	ba7808eac1	[TCP]: remove tcp header from tcp_v4_check (take #2 ) The tcphdr struct passed to tcp_v4_check is not used, the following patch removes it from the parameter list. This adds the netfilter modifications missing in the patch I sent for rc3-mm1. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:44 -08:00
Craig Schlenter	cb48cfe807	[TCP]: Fix iov_len calculation in tcp_v4_send_ack(). This fixes the ftp stalls present in the current kernels. All credit goes to Komuro <komurojun-mbn@nifty.com> for tracking this down. The patch is untested but it looks cough obviously correct. Signed-off-by: Craig Schlenter <craig@codefountain.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 00:30:08 -08:00
Leigh Brown	a9fc00cca8	[TCP]: Trivial fix to message in tcp_v4_inbound_md5_hash The message logged in tcp_v4_inbound_md5_hash when the hash was expected but not found was reversed. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-17 21:59:26 -08:00
Leigh Brown	8228a18dd3	[TCP]: Fix oops caused by tcp_v4_md5_do_del md5sig_info.alloced4 must be set to zero when freeing keys4, otherwise it will not be alloc'd again when another key is added to the same socket by tcp_v4_md5_do_add. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-17 21:59:25 -08:00
Andrew Morton	b6332e6cf9	[TCP]: Fix warnings with TCP_MD5SIG disabled. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:52 -08:00
Adrian Bunk	f5b99bcddd	[NET]: Possible cleanups. This patch contains the following possible cleanups: - make the following needlessly global functions statis: - ipv4/tcp.c: __tcp_alloc_md5sig_pool() - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup() - ipv4/udplite.c: udplite_rcv() - ipv4/udplite.c: udplite_err() - make the following needlessly global structs static: - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops - net/ipv{4,6}/udplite.c: remove inline's from static functions (gcc should know best when to inline them) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:51 -08:00
David S. Miller	08dd1a506b	[TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG. It just obfuscates the code and adds limited value. And as Adrian Bunk noticed, it lacked Kconfig help text too, so just kill it. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:47 -08:00
Al Viro	ff1dcadb1b	[NET]: Split skb->csum ... into anonymous union of __wsum and __u32 (csum and csum_offset resp.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:18 -08:00
Al Viro	8e5200f540	[NET]: Fix assorted misannotations (from md5 and udplite merges). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:16 -08:00
Arnaldo Carvalho de Melo	f6685938f9	[TCP_IPV4]: Use kmemdup where appropriate Also use a variable to avoid the longish tp->md5sig_info-> use in tcp_v4_md5_do_add. Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/tcp_ipv4.o.before /tmp/tcp_ipv4.o.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp_ipv4.c: tcp_v4_md5_do_add \| -62 tcp_v4_syn_recv_sock \| -32 tcp_v4_parse_md5_keys \| -86 3 functions changed, 180 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:54 -08:00
Arnaldo Carvalho de Melo	7174259e6c	[TCP_IPV4]: CodingStyle cleanups, no code change Mostly related to CONFIG_TCP_MD5SIG recent merge. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:53 -08:00
Al Viro	b51655b958	[NET]: Annotate __skb_checksum_complete() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:38 -08:00
Al Viro	714e85be35	[IPV6]: Assorted trivial endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:50 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
Gerrit Renker	b9df3cb8cf	[TCP/DCCP]: Introduce net_xmit_eval Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:27 -08:00
Gerrit Renker	a94f723d59	[TCP]: Remove dead code in init_sequence This removes two redundancies: 1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence() is always true, due to * tcp_v6_conn_request() is the only function calling this one * tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()] 2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:10 -08:00
Eric Dumazet	72a3effaf6	[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:44 -08:00
Eric Dumazet	06ca719fad	[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a little bit cheaper. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-20 00:22:25 -07:00
YOSHIFUJI Hideaki	9469c7b4aa	[NET]: Use typesafe inet_twsk() inline function instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:58 -07:00
YOSHIFUJI Hideaki	4244f8a9f8	[TCP]: Use TCPOLEN_TSTAMP_ALIGNED macro instead of magic number. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:54 -07:00
Al Viro	23f33c2d4f	[IPV4]: struct inet_timewait_sock annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:02:27 -07:00
Al Viro	adaf345b53	[IPV4]: annotate address in inet_request_sock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:51 -07:00
Al Viro	bada8adc4e	[IPV4]: ip_route_connect() ipv4 address arguments annotated annotated address arguments (port number left alone for now); ditto for inferred net-endian variables in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 17:54:06 -07:00
Dmitry Mishin	fda9ef5d67	[NET]: Fix sk->sk_filter field access Function sk_filter() is called from tcp_v{4,6}_rcv() functions with arg needlock = 0, while socket is not locked at that moment. In order to avoid this and similar issues in the future, use rcu for sk->sk_filter field read protection. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org>	2006-09-22 15:18:47 -07:00
Brian Haley	ab32ea5d8a	[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:03 -07:00
Herbert Xu	8f491069b4	[IPV4]: Use network-order dport for all visible inet_lookup_* Right now most inet_lookup_* functions take a host-order hnum instead of a network-order dport because that's how it is represented internally. This means that users of these functions have to be careful about using the right byte-order. To add more confusion, inet_lookup takes a network-order dport unlike all other functions. So this patch changes all visible inet_lookup functions to take a dport and move all dport->hnum conversion inside them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:14 -07:00
Patrick McHardy	84fa7933a3	[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:53 -07:00
Venkat Yekkirala	4237c75c0a	[MLSXFRM]: Auto-labeling of child sockets This automatically labels the TCP, Unix stream, and dccp child sockets as well as openreqs to be at the same MLS level as the peer. This will result in the selection of appropriately labeled IPSec Security Associations. This also uses the sock's sid (as opposed to the isec sid) in SELinux enforcement of secmark in rcv_skb and postroute_last hooks. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:29 -07:00
Wei Yongjun	3687b1dc6f	[TCP]: SNMPv2 tcpAttemptFails counter error Refer to RFC2012, tcpAttemptFails is defined as following: tcpAttemptFails OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION "The number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state." ::= { tcp 7 } When I lookup into RFC793, I found that the state change should occured under following condition: 1. SYN-SENT -> CLOSED a) Received ACK,RST segment when SYN-SENT state. 2. SYN-RCVD -> CLOSED b) Received SYN segment when SYN-RCVD state(came from LISTEN). c) Received RST segment when SYN-RCVD state(came from SYN-SENT). d) Received SYN segment when SYN-RCVD state(came from SYN-SENT). 3. SYN-RCVD -> LISTEN e) Received RST segment when SYN-RCVD state(came from LISTEN). In my test, those direct state transition can not be counted to tcpAttemptFails. Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-02 13:38:19 -07:00
Panagiotis Issaris	0da974f4f3	[NET]: Conversions from kmalloc+memset to k(z\|c)alloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:51:30 -07:00
Herbert Xu	a430a43d08	[NET] gso: Fix up GSO packets with broken checksums Certain subsystems in the stack (e.g., netfilter) can break the partial checksum on GSO packets. Until they're fixed, this patch allows this to work by recomputing the partial checksums through the GSO mechanism. Once they've all been converted to update the partial checksum instead of clearing it, this workaround can be removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-08 13:34:56 -07:00
Ingo Molnar	c636618485	[PATCH] lockdep: annotate bh_lock_sock() Teach special (recursive) locking code to the lock validator. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:08 -07:00
Ingo Molnar	e4d9191885	[PATCH] lockdep: locking init debugging improvement Locking init improvement: - introduce and use __SPIN_LOCK_UNLOCKED for array initializations, to pass in the name string of locks, used by debugging Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:27:02 -07:00
Linus Torvalds	e37a72de84	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [IPV6]: Added GSO support for TCPv6 [NET]: Generalise TSO-specific bits from skb_setup_caps [IPV6]: Added GSO support for TCPv6 [IPV6]: Remove redundant length check on input [NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks [TG3]: Update version and reldate [TG3]: Add TSO workaround using GSO [TG3]: Turn on hw fix for ASF problems [TG3]: Add rx BD workaround [TG3]: Add tg3_netif_stop() in vlan functions [TCP]: Reset gso_segs if packet is dodgy	2006-06-30 15:40:17 -07:00
Herbert Xu	bcd7611117	[NET]: Generalise TSO-specific bits from skb_setup_caps This patch generalises the TSO-specific bits from sk_setup_caps by adding the sk_gso_type member to struct sock. This makes sk_setup_caps generic so that it can be used by TCPv6 or UFO. The only catch is that whoever uses this must provide a GSO implementation for their protocol which I think is a fair deal :) For now UFO continues to live without a GSO implementation which is OK since it doesn't use the sock caps field at the moment. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-30 14:12:08 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Sridhar Samudrala	47da8ee681	[TCP]: Export accept queue len of a TCP listening socket via rx_queue While debugging a TCP server hang issue, we noticed that currently there is no way for a user to get the acceptq backlog value for a TCP listen socket. All the standard networking utilities that display socket info like netstat, ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These fields do not mean much for listening sockets. This patch uses one of these unused fields(rx_queue) to export the accept queue len for listening sockets. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:57:57 -07:00
Chris Leech	1a2449a87b	[I/OAT]: TCP recv offload to I/OAT Locks down user pages and sets up for DMA in tcp_recvmsg, then calls dma_async_try_early_copy in tcp_v4_do_rcv Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:25:56 -07:00
Adrian Bunk	6c97e72a16	[IPV4]: Possible cleanups. This patch contains the following possible cleanups: - make the following needlessly global function static: - arp.c: arp_rcv() - remove the following unused EXPORT_SYMBOL's: - devinet.c: devinet_ioctl - fib_frontend.c: ip_rt_ioctl - inet_hashtables.c: inet_bind_bucket_create - inet_hashtables.c: inet_bind_hash - tcp_input.c: sysctl_tcp_abc - tcp_ipv4.c: sysctl_tcp_tw_reuse - tcp_output.c: sysctl_tcp_mtu_probing - tcp_output.c: sysctl_tcp_base_mss Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-14 15:00:20 -07:00
Arnaldo Carvalho de Melo	543d9cfeec	[NET]: Identation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:48:35 -08:00
Dmitry Mishin	3fdadf7d27	[NET]: {get\|set}sockopt compatibility layer This patch extends {get\|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:45:21 -08:00
Arnaldo Carvalho de Melo	c4d9390941	[ICSK]: Introduce inet_csk_ctl_sock_create Consolidating open coded sequences in tcp and dccp, v4 and v6. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:01:03 -08:00
John Heffner	5d424d5a67	[TCP]: MTU probing Implementation of packetization layer path mtu discovery for TCP, based on the internet-draft currently found at <http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:53:41 -08:00
Sam Ravnborg	f9d9516db7	[NET]: Do not export inet_bind_bucket_create twice. inet_bind_bucket_create was exported twice. Keep the export in the file where inet_bind_bucket_create is defined. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-31 17:47:02 -08:00
Patrick McHardy	5d39a795bf	[IPV4]: Always set fl.proto in ip_route_newports ip_route_newports uses the struct flowi from the struct rtable returned by ip_route_connect for the new route lookup and just replaces the port numbers if they have changed. If an IPsec policy exists which doesn't match port 0 the struct flowi won't have the proto field set and no xfrm lookup is done for the changed ports. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-31 17:35:35 -08:00
Patrick McHardy	b59c270104	[NETFILTER]: Keep conntrack reference until IPsec policy checks are done Keep the conntrack reference until policy checks have been performed for IPsec NAT support. The reference needs to be dropped before a packet is queued to avoid having the conntrack module unloadable. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-07 12:57:36 -08:00
Arnaldo Carvalho de Melo	80e40daa47	[TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected CC net/ipv4/tcp_ipv4.o /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/tcp_ipv4.c:665: warning: 'syn_flood_warning' defined but not used Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-01-04 01:58:06 -02:00
Stephen Hemminger	40efc6fa17	[TCP]: less inline's TCP inline usage cleanup: * get rid of inline in several places * replace __inline__ with inline where possible * move functions used in one file out of tcp.h * let compiler decide on used once cases On x86_64: text data bss dec hex filename 3594701 648348 567400 4810449 4966d1 vmlinux.orig 3593133 648580 567400 4809113 496199 vmlinux On sparc64: text data bss dec hex filename 2538278 406152 530392 3474822 350586 vmlinux.ORIG `2536382` 406384 530392 3473158 34ff06 vmlinux Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 16:03:49 -08:00
Arnaldo Carvalho de Melo	d83d8461f9	[IP_SOCKGLUE]: Remove most of the tcp specific calls As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo	a7f5e7f164	[INET]: Generalise tcp_v4_hash_connect Renaming it to inet_hash_connect, making it possible to ditch dccp_v4_hash_connect and share the same code with TCP instead. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:55 -08:00
Arnaldo Carvalho de Melo	6d6ee43e0b	[TWSK]: Introduce struct timewait_sock_ops So that we can share several timewait sockets related functions and make the timewait mini sockets infrastructure closer to the request mini sockets one. Next changesets will take advantage of this, moving more code out of TCP and DCCP v4 and v6 to common infrastructure. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:54 -08:00
Arnaldo Carvalho de Melo	af05dc9394	[ICSK]: Move v4_addr2sockaddr from TCP to icsk Renaming it to inet_csk_addr2sockaddr. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:39 -08:00
Arnaldo Carvalho de Melo	8292a17a39	[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops And move it to struct inet_connection_sock. DCCP will use it in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:38 -08:00
Arnaldo Carvalho de Melo	971af18bbf	[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-03 13:10:33 -08:00
Stephen Hemminger	caa20d9abe	[TCP]: spelling fixes Minor spelling fixes for TCP code. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:13:47 -08:00
Herbert Xu	fb286bb299	[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 13:01:24 -08:00
Jesper Juhl	a51482bde2	[NET]: kfree cleanup From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2005-11-08 09:41:34 -08:00
Stephen Hemminger	6df716340d	[TCP/DCCP]: Randomize port selection This patch randomizes the port selected on bind() for connections to help with possible security attacks. It should also be faster in most cases because there is no need for a global lock. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-05 21:23:15 -02:00
Eric Dumazet	81c3d5470e	[INET]: speedup inet (tcp/dccp) lookups Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a lot of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! / } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((((__u64 )&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((((__u32 )&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read\|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:13:38 -07:00
Arnaldo Carvalho de Melo	20380731bc	[NET]: Fix sparse warnings Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:32 -07:00
Arnaldo Carvalho de Melo	6687e988d9	[ICSK]: Move TCP congestion avoidance members to icsk This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(), minimal renaming/moving done in this changeset to ease review. Most of it is just changes of struct tcp_sock * to struct sock * parameters. With this we move to a state closer to two interesting goals: 1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used for any INET transport protocol that has struct inet_hashinfo and are derived from struct inet_connection_sock. Keeps the userspace API, that will just not display DCCP sockets, while newer versions of tools can support DCCP. 2. INET generic transport pluggable Congestion Avoidance infrastructure, using the current TCP CA infrastructure with DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:56:18 -07:00
Patrick McHardy	64ce207306	[NET]: Make NETDEBUG pure printk wrappers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:56:08 -07:00
Arnaldo Carvalho de Melo	295ff7edb8	[TIMEWAIT]: Introduce inet_timewait_death_row That groups all of the tables and variables associated to the TCP timewait schedulling/recycling/killing code, that now can be isolated from the TCP specific code and used by other transport protocols, such as DCCP. Next changeset will move this code to net/ipv4/inet_timewait_sock.c Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:55:48 -07:00
Arnaldo Carvalho de Melo	0a5578cf8e	[ICSK]: Generalise tcp_listen_{start,stop} This also moved inet_iif from tcp to inet_hashtables.h, as it is needed by the inet_lookup callers, perhaps this needs a bit of polishing, but for now seems fine. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:49:24 -07:00
Arnaldo Carvalho de Melo	3f421baa47	[NET]: Just move the inet_connection_sock function from tcp sources Completing the previous changeset, this also generalises tcp_v4_synq_add, renaming it to inet_csk_reqsk_queue_hash_add, already geing used in the DCCP tree, which I plan to merge RSN. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:49:14 -07:00
Arnaldo Carvalho de Melo	463c84b97f	[NET]: Introduce inet_connection_sock This creates struct inet_connection_sock, moving members out of struct tcp_sock that are shareable with other INET connection oriented protocols, such as DCCP, that in my private tree already uses most of these members. The functions that operate on these members were renamed, using a inet_csk_ prefix while not being moved yet to a new file, so as to ease the review of these changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:43:19 -07:00
Arnaldo Carvalho de Melo	e48c414ee6	[INET]: Generalise the TCP sock ID lookup routines And also some TIME_WAIT functions. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 282955 13122 9312 305389 4a8ed net/ipv4/built-in.o /tmp/after.size: 281566 13122 9312 304000 4a380 net/ipv4/built-in.o [acme@toy net-2.6.14]$ I kept them still inlined, will uninline at some point to see what would be the performance difference. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:42:18 -07:00
Arnaldo Carvalho de Melo	8feaf0c0a5	[INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets This paves the way to generalise the rest of the sock ID lookup routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro kernels (where IPv6 is always built as a module): [root@qemu ~]# grep tw_sock /proc/slabinfo tw_sock_TCPv6 0 0 128 31 1 tw_sock_TCP 0 0 96 41 1 [root@qemu ~]# Now if a protocol wants to use the TIME_WAIT generic infrastructure it only has to set the sk_prot->twsk_obj_size field with the size of its inet_timewait_sock derived sock and proto_register will create sk_prot->twsk_slab, for now its only for INET sockets, but we can introduce timewait_sock later if some non INET transport protocolo wants to use this stuff. Next changesets will take advantage of this new infrastructure to generalise even more TCP code. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o [acme@toy net-2.6.14]$ Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1 (qemu host)). Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:42:13 -07:00
Arnaldo Carvalho de Melo	33b6223190	[INET]: Generalise tcp_v4_lookup_listener [acme@toy net-2.6.14]$ grep built-in /tmp/before /tmp/after /tmp/before: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o /tmp/after: 282560 13122 9312 304994 4a762 net/ipv4/built-in.o Will be used in DCCP, not exporting it right now not to get in Adrian Bunk's exported-but-not-used-on-modules radar 8) Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:42:08 -07:00
Arnaldo Carvalho de Melo	81849d106b	[INET]: Generalise tcp_v4_hash & tcp_unhash It really just makes the existing code be a helper function that tcp_v4_hash and tcp_unhash uses, specifying the right inet_hashinfo, tcp_hashinfo. One thing I'll investigate at some point is to have the inet_hashinfo pointer in sk_prot, so that we get all the hashtable information from the sk pointer, this can lead to some extra indirections that may well hurt performance/code size, we'll see. Ultimate idea would be that sk_prot would provide _all_ the information about a protocol implementation. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:42:02 -07:00
Arnaldo Carvalho de Melo	f3f05f7046	[INET]: Generalise the tcp_listen_ lock routines Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:41:49 -07:00
Arnaldo Carvalho de Melo	6e04e02165	[INET]: Move tcp_port_rover to inet_hashinfo Also expose all of the tcp_hashinfo members, i.e. killing those tcp_ehash, etc macros, this will more clearly expose already generic functions and some that need just a bit of work to become generic, as we'll see in the upcoming changesets. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:41:44 -07:00
Arnaldo Carvalho de Melo	2d8c4ce519	[INET]: Generalise tcp_bind_hash & tcp_inherit_port This required moving tcp_bucket_cachep to inet_hashinfo. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:40:29 -07:00
Arnaldo Carvalho de Melo	a55ebcc4c4	[INET]: Move bind_hash from tcp_sk to inet_sk This should really be in a inet_connection_sock, but I'm leaving it for a later optimization, when some more fields common to INET transport protocols now in tcp_sk or inet_sk will be chunked out into inet_connection_sock, for now its better to concentrate on getting the changes in the core merged to leave the DCCP tree with only DCCP specific code. Next changesets will take advantage of this move to generalise things like tcp_bind_hash, tcp_put_port, tcp_inherit_port, making the later receive a inet_hashinfo parameter, and even __tcp_tw_hashdance, etc in the future, when tcp_tw_bucket gets transformed into the struct timewait_sock hierarchy. tcp_destroy_sock also is eligible as soon as tcp_orphan_count gets moved to sk_prot. A cascade of incremental changes will ultimately make the tcp_lookup functions be fully generic. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:48 -07:00
Arnaldo Carvalho de Melo	77d8bf9c62	[INET]: Move the TCP hashtable functions/structs to inet_hashtables.[ch] Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:39 -07:00
Arnaldo Carvalho de Melo	0f7ff9274e	[INET]: Just rename the TCP hashtable functions/structs to inet_ This is to break down the complexity of the series of patches, making it very clear that this one just does: 1. renames tcp_ prefixed hashtable functions and data structures that were already mostly generic to inet_ to share it with DCCP and other INET transport protocols. 2. Removes not used functions (__tb_head & tb_head) 3. Removes some leftover prototypes in the headers (tcp_bucket_unlock & tcp_v4_build_header) Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that we can make functions such as tcp_inherit_port, __tcp_inherit_port, tcp_v4_get_port, __tcp_put_port, generic and get others like tcp_destroy_sock closer to generic (tcp_orphan_count will go to sk->sk_prot to allow this). Eventually most of these functions will be used passing the transport protocol inet_hashinfo structure. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:32 -07:00
Arnaldo Carvalho de Melo	304a16180f	[INET]: Move the TCP ehash functions to include/net/inet_hashtables.h To be shared with DCCP (and others), this is the start of a series of patches that will expose the already generic TCP hash table routines. The few changes noticed when calling gcc -S before/after on a pentium4 were of this type: movl 40(%esp), %edx cmpl %esi, 472(%edx) je .L168 - pushl $291 + pushl $272 pushl $.LC0 pushl $.LC1 pushl $.LC2 [acme@toy net-2.6.14]$ size net/ipv4/tcp_ipv4.before.o net/ipv4/tcp_ipv4.after.o text data bss dec hex filename 17804 516 140 18460 481c net/ipv4/tcp_ipv4.before.o 17804 516 140 18460 481c net/ipv4/tcp_ipv4.after.o Holler if some weird architecture has issues with things like this 8) Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:38:22 -07:00
Arnaldo Carvalho de Melo	32519f11d3	[INET]: Introduce inet_sk_rebuild_header From tcp_v4_rebuild_header, that already was pretty generic, I only needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and establish the requirement that INET transport layer protocols that want to use this function map TCP_SYN_SENT to its equivalent state. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:37:55 -07:00
Arnaldo Carvalho de Melo	6cbb0df788	[SOCK]: Introduce sk_setup_caps From tcp_v4_setup_caps, that always is preceded by a call to __sk_dst_set, so coalesce this sequence into sk_setup_caps, removing one call to a TCP function in the IP layer. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:37:48 -07:00
Arnaldo Carvalho de Melo	614c6cb4f2	[SOCK]: Rename __tcp_v4_rehash to __sk_prot_rehash This operation was already generic and DCCP will use it. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:37:42 -07:00
David S. Miller	d5d283751e	[TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port(). This trips up a lot of folks reading this code. Put an unlikely() around the port-exhaustion test for good measure. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:49:54 -07:00
Heikki Orsila	ca9334523c	[IPV4]: Debug cleanup Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-08 14:26:52 -07:00
David S. Miller	c1b4a7e695	[TCP]: Move to new TSO segmenting scheme. Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:24:38 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
David S. Miller	e52c1f17e4	[NET]: Move sysctl_max_syn_backlog into request_sock.c This fixes the CONFIG_INET=n build failure noticed by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:49:40 -07:00
Arnaldo Carvalho de Melo	2ad69c55a2	[NET] rename struct tcp_listen_opt to struct listen_sock Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:48:55 -07:00
Arnaldo Carvalho de Melo	0e87506fcc	[NET] Generalise tcp_listen_opt This chunks out the accept_queue and tcp_listen_opt code and moves them to net/core/request_sock.c and include/net/request_sock.h, to make it useful for other transport protocols, DCCP being the first one to use it. Next patches will rename tcp_listen_opt to accept_sock and remove the inline tcp functions that just call a reqsk_queue_ function. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:47:59 -07:00
Arnaldo Carvalho de Melo	60236fdd08	[NET] Rename open_request to request_sock Ok, this one just renames some stuff to have a better namespace and to dissassociate it from TCP: struct open_request -> struct request_sock tcp_openreq_alloc -> reqsk_alloc tcp_openreq_free -> reqsk_free tcp_openreq_fastfree -> __reqsk_free With this most of the infrastructure closely resembles a struct sock methods subset. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:47:21 -07:00
Arnaldo Carvalho de Melo	2e6599cb89	[NET] Generalise TCP's struct open_request minisock infrastructure Kept this first changeset minimal, without changing existing names to ease peer review. Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn has two new members: ->slab, that replaces tcp_openreq_cachep ->obj_size, to inform the size of the openreq descendant for a specific protocol The protocol specific fields in struct open_request were moved to a class hierarchy, with the things that are common to all connection oriented PF_INET protocols in struct inet_request_sock, the TCP ones in tcp_request_sock, that is an inet_request_sock, that is an open_request. I.e. this uses the same approach used for the struct sock class hierarchy, with sk_prot indicating if the protocol wants to use the open_request infrastructure by filling in sk_prot->rsk_prot with an or_calltable. Results? Performance is improved and TCP v4 now uses only 64 bytes per open request minisock, down from 96 without this patch :-) Next changeset will rename some of the structs, fields and functions mentioned above, struct or_calltable is way unclear, better name it struct request_sock_ops, s/struct open_request/struct request_sock/g, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:46:52 -07:00
Folkert van Heusden	0b2531bdc5	[TCP]: Optimize check in port-allocation code. Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 14:36:08 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

... 2 3 4 5 6

268 Commits