linux

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	c2d9ba9bce	net: CONFIG_NET_NS reduction Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 05:16:23 -07:00
Eric Dumazet	79640a4ca6	net: add additional lock to qdisc to increase throughput When many cpus compete for sending frames on a given qdisc, the qdisc spinlock suffers from very high contention. The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire the lock, and cannot dequeue packets fast enough, since it must wait for this lock for each dequeued packet. One solution to this problem is to force all cpus spinning on a second lock before trying to get the main lock, when/if they see __QDISC_STATE_RUNNING already set. The owning cpu then compete with at most one other cpu for the main lock, allowing for higher dequeueing rate. Based on a previous patch from Alexander Duyck. I added the heuristic to avoid the atomic in fast path, and put the new lock far away from the cache line used by the dequeue worker. Also try to release the busylock lock as late as possible. Tests with following script gave a boost from ~50.000 pps to ~600.000 pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver. (A single netperf flow can reach ~800.000 pps on this platform) for j in `seq 0 3`; do for i in `seq 0 7`; do netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 & done done Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 05:09:29 -07:00
Eric Dumazet	3711210576	net: QDISC_STATE_RUNNING dont need atomic bit ops __QDISC_STATE_RUNNING is always changed while qdisc lock is held. We can avoid two atomic operations in xmit path, if we move this bit in a new __state container. Location of this __state container is carefully chosen so that fast path only dirties one qdisc cache line. THROTTLED bit could later be moved into this __state location too, to avoid dirtying first qdisc cache line. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 03:24:13 -07:00
Eric Dumazet	bc135b23d0	net: Define accessors to manipulate QDISC_STATE_RUNNING Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a second patch will move on another location. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 03:23:51 -07:00
Eric Dumazet	b1faf56664	net: sock_queue_err_skb() dont mess with sk_forward_alloc Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that softirq handler could not deliver because socket is owned by user thread. Or extend backlog processing to be able to process normal and error packets. Another possibility is to not use mem charge for error queue, this is what I implemented in this patch. Note: this reverts commit `29030374` (net: fix sk_forward_alloc corruptions), since we dont need to lock socket anymore. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-31 23:44:05 -07:00
Linus Torvalds	72da3bc0cb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) netlink: bug fix: wrong size was calculated for vfinfo list blob netlink: bug fix: don't overrun skbs on vf_port dump xt_tee: use skb_dst_drop() netdev/fec: fix ifconfig eth0 down hang issue cnic: Fix context memory init. on 5709. drivers/net: Eliminate a NULL pointer dereference drivers/net/hamradio: Eliminate a NULL pointer dereference be2net: Patch removes redundant while statement in loop. ipv6: Add GSO support on forwarding path net: fix __neigh_event_send() vhost: fix the memory leak which will happen when memory_access_ok fails vhost-net: fix to check the return value of copy_to/from_user() correctly vhost: fix to check the return value of copy_to/from_user() correctly vhost: Fix host panic if ioctl called with wrong index net: fix lock_sock_bh/unlock_sock_bh net/iucv: Add missing spin_unlock net: ll_temac: fix checksum offload logic net: ll_temac: fix interrupt bug when interrupt 0 is used sctp: dubious bitfields in sctp_transport ipmr: off by one in __ipmr_fill_mroute() ...	2010-05-28 10:18:40 -07:00
Eric Dumazet	8a74ad60a5	net: fix lock_sock_bh/unlock_sock_bh This new sock lock primitive was introduced to speedup some user context socket manipulation. But it is unsafe to protect two threads, one using regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh This patch changes lock_sock_bh to be careful against 'owned' state. If owned is found to be set, we must take the slow path. lock_sock_bh() now returns a boolean to say if the slow path was taken, and this boolean is used at unlock_sock_bh time to call the appropriate unlock function. After this change, BH are either disabled or enabled during the lock_sock_bh/unlock_sock_bh protected section. This might be misleading, so we rename these functions to lock_sock_fast()/unlock_sock_fast(). Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-27 00:30:53 -07:00
Dan Carpenter	ff937938e7	sctp: dubious bitfields in sctp_transport Sparse complains because these one-bit bitfields are signed. include/net/sctp/structs.h:879:24: error: dubious one-bit signed bitfield include/net/sctp/structs.h:889:31: error: dubious one-bit signed bitfield include/net/sctp/structs.h:895:26: error: dubious one-bit signed bitfield include/net/sctp/structs.h:898:31: error: dubious one-bit signed bitfield include/net/sctp/structs.h:901:27: error: dubious one-bit signed bitfield It doesn't cause a problem in the current code, but it would be better to clean it up. This was introduced by c0058a35aacc7: "sctp: Save some room in the sctp_transport by using bitfields". Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-26 00:40:11 -07:00
Herbert Xu	ea16f912a6	cls_cgroup: Initialise classid when module is absent When the cls_cgroup module is not loaded, task_cls_classid will return an uninitialised classid instead of zero. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-25 18:53:57 -07:00
Linus Torvalds	b1cdc4670b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits) drivers/net/usb/asix.c: Fix pointer cast. be2net: Bug fix to avoid disabling bottom half during firmware upgrade. proc_dointvec: write a single value hso: add support for new products Phonet: fix potential use-after-free in pep_sock_close() ath9k: remove VEOL support for ad-hoc ath9k: change beacon allocation to prefer the first beacon slot sock.h: fix kernel-doc warning cls_cgroup: Fix build error when built-in macvlan: do proper cleanup in macvlan_common_newlink() V2 be2net: Bug fix in init code in probe net/dccp: expansion of error code size ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep wireless: fix sta_info.h kernel-doc warnings wireless: fix mac80211.h kernel-doc warnings iwlwifi: testing the wrong variable in iwl_add_bssid_station() ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs() ath9k_htc: dereferencing before check in hif_usb_tx_cb() rt2x00: Fix rt2800usb TX descriptor writing. rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions. ...	2010-05-25 16:59:51 -07:00
David S. Miller	a261af927d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-05-25 13:15:11 -07:00
Alexey Dobriyan	4be929be34	kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN - C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-25 08:07:02 -07:00
Randy Dunlap	acfbe96a30	sock.h: fix kernel-doc warning Fix sock.h kernel-doc warning: Warning(include/net/sock.h:1438): No description found for parameter 'wq' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-24 23:54:18 -07:00
Herbert Xu	937eada45f	cls_cgroup: Fix build error when built-in There is a typo in cgroup_cls_state when cls_cgroup is built-in. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-24 23:53:37 -07:00
Randy Dunlap	4e8998f09b	wireless: fix mac80211.h kernel-doc warnings Fix kernel-doc warnings in mac80211.h: Warning(include/net/mac80211.h:838): No description found for parameter 'ap_addr' Warning(include/net/mac80211.h:1726): No description found for parameter 'get_survey' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-24 15:07:42 -04:00
John W. Linville	3dc3fc52ea	Revert "ath9k: Group Key fix for VAPs" This reverts commit `03ceedea97`. This patch was reported to cause a regression in which connectivity is lost and cannot be reestablished after a suspend/resume cycle. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-24 14:59:27 -04:00
Linus Torvalds	a69eee4988	Revert "ath9k: Group Key fix for VAPs" This reverts commit `03ceedea97`, since it breaks resume from suspend-to-ram on Rafael's Acer Ferrari One. NetworkManager thinks everything is ok, but it can't connect to the AP to get an IP address after the resume. In fact, it even breaks resume for non-ath9k chipsets: reverting it also fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes says: "Indeed, this patch needs to be reverted. That mac80211 change is wrong and completely unnecessary." Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: Daniel Yingqiang Ma <yma.cool@gmail.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-24 07:45:43 -07:00
Herbert Xu	f845172531	cls_cgroup: Store classid in struct sock Up until now cls_cgroup has relied on fetching the classid out of the current executing thread. This runs into trouble when a packet processing is delayed in which case it may execute out of another thread's context. Furthermore, even when a packet is not delayed we may fail to classify it if soft IRQs have been disabled, because this scenario is indistinguishable from one where a packet unrelated to the current thread is processed by a real soft IRQ. In fact, the current semantics is inherently broken, as a single skb may be constructed out of the writes of two different tasks. A different manifestation of this problem is when the TCP stack transmits in response of an incoming ACK. This is currently unclassified. As we already have a concept of packet ownership for accounting purposes in the skb->sk pointer, this is a natural place to store the classid in a persistent manner. This patch adds the cls_cgroup classid in struct sock, filling up an existing hole on 64-bit :) The value is set at socket creation time. So all sockets created via socket(2) automatically gains the ID of the thread creating it. Whenever another process touches the socket by either reading or writing to it, we will change the socket classid to that of the process if it has a valid (non-zero) classid. For sockets created on inbound connections through accept(2), we inherit the classid of the original listening socket through sk_clone, possibly preceding the actual accept(2) call. In order to minimise risks, I have not made this the authoritative classid. For now it is only used as a backup when we execute with soft IRQs disabled. Once we're completely happy with its semantics we can use it as the sole classid. Footnote: I have rearranged the error path on cls_group module creation. If we didn't do this, then there is a window where someone could create a tc rule using cls_group before the cgroup subsystem has been registered. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-24 00:12:34 -07:00
Sjur Braendeland	7aecf4944f	caif: Bugfix - use standard Linux lists Discovered bug when running high number of parallel connect requests. Replace buggy home brewed list with linux/list.h. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-23 23:57:41 -07:00
Sripathi Kodi	4681dbdacb	9p: add 9P2000.L rename operation I made a V2 of this patch on top of my patches for VFS switches. All the changes were due to change in some offsets. rename - change name of file or directory size[4] Trename tag[2] fid[4] newdirfid[4] name[s] size[4] Rrename tag[2] The rename message is used to change the name of a file, possibly moving it to a new directory. The 9P wstat message can only rename a file within the same directory. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-05-21 16:44:34 -05:00
Sripathi Kodi	bda8e77520	9p: add 9P2000.L statfs operation I made a V2 of this patch on top of my patches for VFS switches. The change was adding v9fs_statfs pointer to v9fs_super_ops_dotl instead of v9fs_super_ops. statfs - get file system statistics size[4] Tstatfs tag[2] fid[4] size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8] files[8] ffree[8] fsid[8] namelen[4] The statfs message is used to request file system information returned by the statfs(2) system call, which is used by df(1) to report file system and disk space usage. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-05-21 16:44:33 -05:00
David S. Miller	41499bd676	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-05-20 23:12:18 -07:00
Joerg Marx	fc350777c7	netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse() This race was triggered by a 'conntrack -F' command running in parallel to the insertion of a hash for a new connection. Losing this race led to a dead conntrack entry effectively blocking traffic for a particular connection until timeout or flushing the conntrack hashes again. Now the check for an already dying connection is done inside the lock. Signed-off-by: Joerg Marx <joerg.marx@secunet.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-20 15:55:30 +02:00
Herbert Xu	e9d3e08497	ipv6: Replace inet6_ifaddr->dead with state This patch replaces the boolean dead flag on inet6_ifaddr with a state enum. This allows us to roll back changes when deleting an address according to whether DAD has completed or not. This patch only adds the state field and does not change the logic. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-18 15:36:06 -07:00
Eric Dumazet	d19d56ddc8	net: Introduce skb_tunnel_rx() helper skb rxhash should be cleared when a skb is handled by a tunnel before being delivered again, so that correct packet steering can take place. There are other cleanups and accounting that we can factorize in a new helper, skb_tunnel_rx() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 22:36:55 -07:00
David S. Miller	820ae8a80e	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-05-17 21:09:11 -07:00
andrew hendry	37cda78741	X25: Move accept approve flag to bitfield Moves the x25 accept approve flag from char into bitfield. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:39:27 -07:00
andrew hendry	b7792e34cb	X25: Move interrupt flag to bitfield Moves the x25 interrupt flag from char into bitfield. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:39:27 -07:00
andrew hendry	cb863ffd4a	X25: Move qbit flag to bitfield Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:39:26 -07:00
Eric Dumazet	407eadd996	net: implements ip_route_input_noref() ip_route_input() is the version returning a refcounted dst, while ip_route_input_noref() returns a non refcounted one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:18:51 -07:00
Eric Dumazet	7fee226ad2	net: add a noref bit on skb dst Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are catched. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set an notrefcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:18:50 -07:00
John W. Linville	6fe70aae0d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-05-17 13:57:43 -04:00
David S. Miller	6811d58fc1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/linux/if_link.h	2010-05-16 22:26:58 -07:00
Eric Dumazet	a465419b1f	net: Introduce sk_route_nocaps TCP-MD5 sessions have intermittent failures, when route cache is invalidated. ip_queue_xmit() has to find a new route, calls sk_setup_caps(sk, &rt->u.dst), destroying the sk->sk_route_caps &= ~NETIF_F_GSO_MASK that MD5 desperately try to make all over its way (from tcp_transmit_skb() for example) So we send few bad packets, and everything is fine when tcp_transmit_skb() is called again for this socket. Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a socket field, sk_route_nocaps, containing bits to mask on sk_route_caps. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-16 00:36:33 -07:00
Eric Dumazet	35790c0421	tcp: fix MD5 (RFC2385) support TCP MD5 support uses percpu data for temporary storage. It currently disables preemption so that same storage cannot be reclaimed by another thread on same cpu. We also have to make sure a softirq handler wont try to use also same context. Various bug reports demonstrated corruptions. Fix is to disable preemption and BH. Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-16 00:34:04 -07:00
Amerigo Wang	e3826f1e94	net: reserve ports for applications using fixed port numbers (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-15 23:28:40 -07:00
David S. Miller	bf47f4b0ba	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6	2010-05-12 23:30:45 -07:00
Allan Stephens	8e1c298c01	tipc: Update commenting in TIPC API Eliminate comments in TIPC's main API files that are either obsolete, incorrect, misleading, or unhelpful. It also adds in one new comment. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-12 23:02:23 -07:00
Johannes Berg	5ce6e438d5	mac80211: add offload channel switch support This adds support for offloading the channel switch operation to devices that support such, typically by having specific firmware API for it. The reasons for this could be that the firmware provides better timing or that regulatory enforcement done by the device requires special handling of CSAs. In order to allow drivers to specify the timing to the device, the new channel_switch callback will pass through the received frame's mactime, where available. Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-12 16:39:05 -04:00
David S. Miller	278554bd65	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ar9170/usb.c drivers/scsi/iscsi_tcp.c net/ipv4/ipmr.c	2010-05-12 00:05:35 -07:00
John W. Linville	cc755896a4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/ath/ar9170/main.c	2010-05-11 14:24:55 -04:00
Patrick McHardy	d1db275dd3	ipv6: ip6mr: support multiple tables This patch adds support for multiple independant multicast routing instances, named "tables". Userspace multicast routing daemons can bind to a specific table instance by issuing a setsockopt call using a new option MRT6_TABLE. The table number is stored in the raw socket data and affects all following ip6mr setsockopt(), getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT) is created with a default routing rule pointing to it. Newly created pim6reg devices have the table number appended ("pim6regX"), with the exception of devices created in the default table, which are named just "pim6reg" for compatibility reasons. Packets are directed to a specific table instance using routing rules, similar to how regular routing rules work. Currently iif, oif and mark are supported as keys, source and destination addresses could be supported additionally. Example usage: - bind pimd/xorp/... to a specific table: uint32_t table = 123; setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table)); - create routing rules directing packets to the new table: # ip -6 mrule add iif eth0 lookup 123 # ip -6 mrule add oif eth0 lookup 123 Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-11 14:40:55 +02:00
Patrick McHardy	6bd5214339	ipv6: ip6mr: move mroute data into seperate structure Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-11 14:40:53 +02:00
Patrick McHardy	f30a778421	ipv6: ip6mr: convert struct mfc_cache to struct list_head Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-11 14:40:51 +02:00
Patrick McHardy	c476efbcde	ipv6: ip6mr: move unres_queue and timer to per-namespace data The unres_queue is currently shared between all namespaces. Following patches will additionally allow to create multiple multicast routing tables in each namespace. Having a single shared queue for all these users seems to excessive, move the queue and the cleanup timer to the per-namespace data to unshare it. As a side-effect, this fixes a bug in the seq file iteration functions: the first entry returned is always from the current namespace, entries returned after that may belong to any namespace. Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-11 14:40:48 +02:00
David S. Miller	d250fe91ae	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-05-10 23:03:26 -07:00
Patrick McHardy	1e4b105712	Merge branch 'master' of /repos/git/net-next-2.6 Conflicts: net/bridge/br_device.c net/bridge/br_forward.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-10 18:39:28 +02:00
Marcel Holtmann	f48fd9c8cd	Bluetooth: Create per controller workqueue Instead of having a global workqueue for all controllers, it makes more sense to have a workqueue per controller. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:34:03 +02:00
Gustavo F. Padovan	dfc909befb	Bluetooth: Fix race condition on l2cap_ertm_send() l2cap_ertm_send() can be called both from user context and bottom half context. The socket locks for that contexts are different, the user context uses a mutex(which can sleep) and the second one uses a spinlock_bh. That creates a race condition when we have interruptions on both contexts at the same time. The better way to solve this is to add a new spinlock to lock l2cap_ertm_send() and the vars it access. The other solution was to defer l2cap_ertm_send() with a workqueue, but we the sending process already has one defer on the hci layer. It's not a good idea add another one. The patch refactor the code to create l2cap_retransmit_frames(), then we encapulate the lock of l2cap_ertm_send() for some call. It also changes l2cap_retransmit_frame() to l2cap_retransmit_one_frame() to avoid confusion Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:53 +02:00
Gustavo F. Padovan	1890d36bb5	Bluetooth: Implement Local Busy Condition handling Supports Local Busy condition handling through a waitqueue that wake ups each 200ms and try to push the packets to the upper layer. If it can push all the queue then it leaves the Local Busy state. The patch modifies the behaviour of l2cap_ertm_reassembly_sdu() to support retry of the push operation. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:53 +02:00
Gustavo F. Padovan	9a9c6a3441	Bluetooth: Make hci_send_acl() void hci_send_acl can't fail, so we can make it void. This patch changes that and all the funcions that use hci_send_acl(). That change exposed a bug on sending connectionless data. We were not reporting the lenght send back to the user space. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:52 +02:00
Gustavo F. Padovan	68d7f0ce91	Bluetooth: Enable option to configure Max Transmission value via sockopt With the sockopt extension we can set a per-channel MaxTx value. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:50 +02:00
Gustavo F. Padovan	803020c6fa	Bluetooth: Change acknowledgement to use the value of txWindow Now that we can set the txWindow we need to change the acknowledgement procedure to ack after each (pi->txWindow/6 + 1). The plus 1 is to avoid the zero value. It also renames pi->num_to_ack to a better name: pi->num_acked. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:50 +02:00
Gustavo F. Padovan	14b5aa71ec	Bluetooth: Add sockopt configuration for txWindow on L2CAP Now we can set/get Transmission Window size via sockopt. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:49 +02:00
Gustavo F. Padovan	1c7621596d	Bluetooth: Fix configuration of the MPS value We were accepting values bigger than we can accept. This was leading ERTM to drop packets because of wrong FCS checks. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:48 +02:00
Gustavo F. Padovan	c1b4f43be0	Bluetooth: Add timer to Acknowledge I-frames We ack I-frames on each txWindow/5 I-frames received, but if the sender stop to send I-frames and it's not a txWindow multiple we can leave some frames unacked. So I added a timer to ack I-frames on this case. The timer expires in 200ms. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:48 +02:00
Gustavo F. Padovan	d5392c8f1e	Bluetooth: Implement 'Send IorRRorRNR' event After receive a RR with P bit set ERTM shall use this funcion to choose what type of frame to reply with F bit = 1. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:46 +02:00
Gustavo F. Padovan	0d861d8b8e	Bluetooth: Make hci_send_sco() void It also removes an unneeded check for the MTU. The check is done before on sco_send_frame() Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Reviewed-by: João Paulo Rechi Vita <jprvita@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-05-10 09:28:45 +02:00
Neil Horman	3ee943728f	ipv4: remove ip_rt_secret timer (v4) A while back there was a discussion regarding the rt_secret_interval timer. Given that we've had the ability to do emergency route cache rebuilds for awhile now, based on a statistical analysis of the various hash chain lengths in the cache, the use of the flush timer is somewhat redundant. This patch removes the rt_secret_interval sysctl, allowing us to rely solely on the statistical analysis mechanism to determine the need for route cache flushes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-08 01:57:52 -07:00
Johannes Berg	0aaffa9b96	mac80211: improve HT channel handling Currently, when one interface switches HT mode, all others will follow along. This is clearly undesirable, since the new one might switch to no-HT while another one is operating in HT. Address this issue by keeping track of the HT mode per interface, and allowing only changes that are compatible, i.e. switching into HT40+ is not possible when another interface is in HT40-, in that case the second one needs to fall back to HT20. Also, to allow drivers to know what's going on, store the per-interface HT mode (channel type) in the virtual interface's bss_conf. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-07 14:55:51 -04:00
Johannes Berg	f444de05d2	cfg80211/mac80211: better channel handling Currently (all tested with hwsim) you can do stupid things like setting up an AP on a certain channel, then adding another virtual interface and making that associate on another channel -- this will make the beaconing to move channel but obviously without the necessary IEs data update. In order to improve this situation, first make the configuration APIs (cfg80211 and nl80211) aware of multi-channel operation -- we'll eventually need that in the future anyway. There's one userland API change and one API addition. The API change is that now SET_WIPHY must be called with virtual interface index rather than only wiphy index in order to take effect for that interface -- luckily all current users (hostapd) do that. For monitor interfaces, the old setting is preserved, but monitors are always slaved to other devices anyway so no guarantees. The second userland API change is the introduction of a per virtual interface SET_CHANNEL command, that hostapd should use going forward to make it easier to understand what's going on (it can automatically detect a kernel with this command). Other than mac80211, no existing cfg80211 drivers are affected by this change because they only allow a single virtual interface. mac80211, however, now needs to be aware that the channel settings are per interface now, and needs to disallow (for now) real multi-channel operation, which is another important part of this patch. One of the immediate benefits is that you can now start hostapd to operate on a hardware that already has a connection on another virtual interface, as long as you specify the same channel. Note that two things are left unhandled (this is an improvement -- not a complete fix): * different HT/no-HT modes currently you could start an HT AP and then connect to a non-HT network on the same channel which would configure the hardware for no HT; that can be fixed fairly easily * CSA An AP we're connected to on a virtual interface might indicate switching channels, and in that case we would follow it, regardless of how many other interfaces are operating; this requires more effort to fix but is pretty rare after all Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-07 14:55:50 -04:00
Johannes Berg	ac8dd506e4	mac80211: fix BSS info reconfiguration When reconfiguring an interface due to a previous hardware restart, mac80211 will currently include the new IBSS flag on non-IBSS interfaces which may confuse drivers. Instead of doing the ~0 trick, simply spell out which things are going to be reconfigured. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-05-07 14:55:49 -04:00
Vlad Yasevich	50b5d6ad63	sctp: Fix a race between ICMP protocol unreachable and connect() ICMP protocol unreachable handling completely disregarded the fact that the user may have locked the socket. It proceeded to destroy the association, even though the user may have held the lock and had a ref on the association. This resulted in the following: Attempt to release alive inet socket f6afcc00 ========================= [ BUG: held lock freed! ] ------------------------- somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held there! (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c 1 lock held by somenu/2672: #0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c stack backtrace: Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55 Call Trace: [<c1232266>] ? printk+0xf/0x11 [<c1038553>] debug_check_no_locks_freed+0xce/0xff [<c10620b4>] kmem_cache_free+0x21/0x66 [<c1185f25>] __sk_free+0x9d/0xab [<c1185f9c>] sk_free+0x1c/0x1e [<c1216e38>] sctp_association_put+0x32/0x89 [<c1220865>] __sctp_connect+0x36d/0x3f4 [<c122098a>] ? sctp_connect+0x13/0x4c [<c102d073>] ? autoremove_wake_function+0x0/0x33 [<c12209a8>] sctp_connect+0x31/0x4c [<c11d1e80>] inet_dgram_connect+0x4b/0x55 [<c11834fa>] sys_connect+0x54/0x71 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239 [<c1054026>] ? might_fault+0x42/0x7c [<c1054026>] ? might_fault+0x42/0x7c [<c11847ab>] sys_socketcall+0x6d/0x178 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c1002959>] syscall_call+0x7/0xb This was because the sctp_wait_for_connect() would aqcure the socket lock and then proceed to release the last reference count on the association, thus cause the fully destruction path to finish freeing the socket. The simplest solution is to start a very short timer in case the socket is owned by user. When the timer expires, we can do some verification and be able to do the release properly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 00:56:07 -07:00
John W. Linville	83163244f8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/libertas_tf/cmd.c drivers/net/wireless/libertas_tf/main.c	2010-05-05 16:14:16 -04:00
David S. Miller	f546061840	Merge branch 'net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev Add missing linux/vmalloc.h include to net/sctp/probe.c Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-03 16:24:31 -07:00
David S. Miller	7ef527377b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-05-02 22:02:06 -07:00
Jan Engelhardt	1183f3838c	net: fix compile error due to double return type in SOCK_DEBUG Fix this one: include/net/sock.h: error: two or more data types in declaration specifiers Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 13:42:39 -07:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Vlad Yasevich	0e3aef8d09	sctp: Tag messages that can be Nagle delayed at creation. When we create the sctp_datamsg and fragment the user data, we know exactly if we are sending full segments or not and how they might be bundled. During this time, we can mark messages a Nagle capable or not. This makes the check at transmit time much simpler. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	cf9b4812e1	sctp: fast recovery algorithm is per association. SCTP fast recovery algorithm really applies per association and impacts all transports. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	b2cf9b6bd9	sctp: update transport initializations Right now, sctp transports are not fully initialized and when adding any new fields, they have to be explicitely initialized. This is prone to mistakes. So we switch to calling kzalloc() which makes things much simpler. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:10 -04:00
Vlad Yasevich	c0058a35aa	sctp: Save some room in the sctp_transport by using bitfields Saves some room in the sctp_transport structure. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Vlad Yasevich	ae19c54866	sctp: remove 'resent' bit from the chunk The 'resent' bit is used to make sure that we don't update rto estimate based on retransmitted chunks. However, we already have the 'rto_pending' bit that we test when need to update rto, so 'resent' bit is just extra. Additionally, we currently have a bug in that we always set a 'resent' bit and thus rto estimate is only updated by Heartbeats. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 22:41:09 -04:00
Wei Yongjun	52688d6ec9	sctp: discard ABORT chunk with zero verification tag in COOKIE-WAIT state In current implementation if ABORT chunk is received with T flag is set and zero verification tag in COOKIE-WAIT state, the ABORT chunk will be always accepted. This is because in COOKIE-WAIT state, the endpoint does not know the peer's verification tag, and it's zero in the endpoint. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2010-04-30 21:42:44 -04:00
Eric Dumazet	767dd03369	net: speedup sock_recv_ts_and_drops() sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some profiles) We can test all socket flags at once to make fast path fast again. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-30 16:29:42 -07:00
John W. Linville	f5c044e53a	mac80211: remove deprecated noise field from ieee80211_rx_status Also remove associated IEEE80211_HW_NOISE_DBM from ieee80211_hw_flags. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-30 15:38:13 -04:00
Eric Dumazet	f84af32cbc	net: ip_queue_rcv_skb() helper When queueing a skb to socket, we can immediately release its dst if target socket do not use IP_CMSG_PKTINFO. tcp_data_queue() can drop dst too. This to benefit from a hot cache line and avoid the receiver, possibly on another cpu, to dirty this cache line himself. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 15:31:51 -07:00
Eric Dumazet	4b0b72f7dd	net: speedup udp receive path Since commit `95766fff` ([UDP]: Add memory accounting.), each received packet needs one extra sock_lock()/sock_release() pair. This added latency because of possible backlog handling. Then later, ticket spinlocks added yet another latency source in case of DDOS. This patch introduces lock_sock_bh() and unlock_sock_bh() synchronization primitives, avoiding one atomic operation and backlog processing. skb_free_datagram_locked() uses them instead of full blown lock_sock()/release_sock(). skb is orphaned inside locked section for proper socket memory reclaim, and finally freed outside of it. UDP receive path now take the socket spinlock only once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 14:35:48 -07:00
Neil Horman	5fa782c2f5	sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) Ok, version 4 Change Notes: 1) Minor cleanups, from Vlads notes Summary: Hey- Recently, it was reported to me that the kernel could oops in the following way: <5> kernel BUG at net/core/skbuff.c:91! <5> invalid operand: 0000 [#1] <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U) vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod <5> CPU: 0 <5> EIP: 0060:[<c02bff27>] Not tainted VLI <5> EFLAGS: 00010216 (2.6.9-89.0.25.EL) <5> EIP is at skb_over_panic+0x1f/0x2d <5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44 <5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40 <5> ds: 007b es: 007b ss: 0068 <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0) <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180 e0c2947d <5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004 df653490 <5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e 00000004 <5> Call Trace: <5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp] <5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp] <5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp] <5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp] <5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp] <5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp] <5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp] <5> [<c01555a4>] cache_grow+0x140/0x233 <5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp] <5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp] <5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp] <5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter] <5> [<c02d005e>] nf_iterate+0x40/0x81 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151 <5> [<c02d0362>] nf_hook_slow+0x83/0xb5 <5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e103e>] ip_rcv+0x334/0x3b4 <5> [<c02c66fd>] netif_receive_skb+0x320/0x35b <5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd] <5> [<c02c67a4>] process_backlog+0x6c/0xd9 <5> [<c02c690f>] net_rx_action+0xfe/0x1f8 <5> [<c012a7b1>] __do_softirq+0x35/0x79 <5> [<c0107efb>] handle_IRQ_event+0x0/0x4f <5> [<c01094de>] do_softirq+0x46/0x4d Its an skb_over_panic BUG halt that results from processing an init chunk in which too many of its variable length parameters are in some way malformed. The problem is in sctp_process_unk_param: if (NULL == errp) errp = sctp_make_op_error_space(asoc, chunk, ntohs(chunk->chunk_hdr->length)); if (errp) { sctp_init_cause(errp, SCTP_ERROR_UNKNOWN_PARAM, WORD_ROUND(ntohs(param.p->length))); sctp_addto_chunk(errp, WORD_ROUND(ntohs(param.p->length)), param.v); When we allocate an error chunk, we assume that the worst case scenario requires that we have chunk_hdr->length data allocated, which would be correct nominally, given that we call sctp_addto_chunk for the violating parameter. Unfortunately, we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error chunk, so the worst case situation in which all parameters are in violation requires chunk_hdr->length+(sizeof(sctp_errhdr_t)param_count) bytes of data. The result of this error is that a deliberately malformed packet sent to a listening host can cause a remote DOS, described in CVE-2010-1173: http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173 I've tested the below fix and confirmed that it fixes the issue. We move to a strategy whereby we allocate a fixed size error chunk and ignore errors we don't have space to report. Tested by me successfully Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 14:22:01 -07:00
Johannes Berg	8fc214ba95	mac80211: notify driver about IBSS status Some drivers (e.g. iwlwifi) need to know and try to figure it out based on other things, but making it explicit is definitely better. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-28 16:50:29 -04:00
Sjur Braendeland	8d545c8f95	caif: Disconnect without waiting for response Changes: o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous disconnect, not waiting for any response from the modem. Due to this the function cfcnfg_linkdestroy_rsp does nothing anymore. o Because disconnect may take down a connection before a connect response is received the function cfcnfg_linkup_rsp is checking if the client is still waiting for the response, if not a disconnect request is sent to the modem. o cfctrl is no longer keeping track of pending disconnect requests. o Added function cfctrl_cancel_req, which is used for deleting a pending connect request if disconnect is done before connect response is received. o Removed unused function cfctrl_insert_req2 o Added better handling of connect reject from modem. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:55:13 -07:00
Sjur Braendeland	5b20865675	caif: Add reference counting to service layer Changes: o Added functions cfsrvl_get and cfsrvl_put. o Added support release_client to use by socket and net device. o Increase reference counting for in-flight packets from cfmuxl Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:55:12 -07:00
Sjur Braendeland	e539d83cc8	caif: Rename functions in cfcnfg and caif_dev Changes: o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer o Fixed typo cfcfg to cfcnfg o Renamed linkid to channel_id o Updated documentation in caif_dev.h o Minor formatting changes Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:55:11 -07:00
Vlad Yasevich	c078669340	sctp: Fix oops when sending queued ASCONF chunks When we finish processing ASCONF_ACK chunk, we try to send the next queued ASCONF. This action runs the sctp state machine recursively and it's not prepared to do so. kernel BUG at kernel/timer.c:790! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/module/ipv6/initstate Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0 EIP is at add_timer+0xd/0x1b EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4 ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000) Stack: c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004 <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004 <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0 Call Trace: [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp] [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp] [<d1863386>] ? sctp_pname+0x0/0x1d [sctp] [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp] [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp] [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp] [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp] [<d1863334>] ? sctp_cname+0x0/0x52 [sctp] [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp] [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp] [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp] Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie> Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:34 -07:00
Wei Yongjun	561b1733a4	sctp: avoid irq lock inversion while call sk->sk_data_ready() sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:16:31 -07:00
Jiri Pirko	05fceb4ad7	net: disallow to use net_assign_generic externally Now there's no need to use this fuction directly because it's handled by register_pernet_device. So to make this simple and easy to understand, make this static to do not tempt potentional users. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 15:49:02 -07:00
Eric Dumazet	c377411f24	net: sk_add_backlog() take rmem_alloc into account Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 15:13:20 -07:00
David S. Miller	c58dc01bab	net: Make RFS socket operations not be inet specific. Idea from Eric Dumazet. As for placement inside of struct sock, I tried to choose a place that otherwise has a 32-bit hole on 64-bit systems. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-04-27 15:11:48 -07:00
Johannes Berg	a060bbfe4e	mac80211: give virtual interface to hw_scan When scanning, it is somewhat important to scan on the correct virtual interface. All drivers that currently implement hw_scan only support a single virtual interface, but that may change and then we'd want to be ready. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-27 16:09:23 -04:00
Juuso Oikarinen	9043f3b89a	cfg80211: Remove default dynamic PS timeout value Now that the mac80211 is choosing dynamic ps timeouts based on the ps-qos network latency configuration, configure a default value of -1 as the dynamic ps timeout in cfg80211. This value allows the mac80211 to determine the value to be used. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-27 16:09:22 -04:00
Juuso Oikarinen	195e294d21	mac80211: Determine dynamic PS timeout based on ps-qos network latency Determine the dynamic PS timeout based on the configured ps-qos network latency. For backwards wext compatibility, allow the dynamic PS timeout configured by the cfg80211 to overrule the automatically determined value. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-27 16:09:22 -04:00
Felix Fietkau	fd8aaaf351	cfg80211: add ap isolation support This is used to configure APs to not bridge traffic between connected stations. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-27 16:09:21 -04:00
David S. Miller	bb61187465	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6	2010-04-27 12:57:39 -07:00
Eric Dumazet	0b53ff2ead	net: fix a lockdep rcu warning in __sk_dst_set() __sk_dst_set() might be called while no state can be integrated in a rcu_dereference_check() condition. So use rcu_dereference_raw() to shutup lockdep warnings (if CONFIG_PROVE_RCU is set) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 12:53:26 -07:00
Eric Dumazet	18f9f1365d	rps: inet_rps_save_rxhash() argument is not const const qualifier on sock argument is misleading, since we can modify rxhash. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 12:53:25 -07:00
Flavio Leitner	6c37e5de45	TCP: avoid to send keepalive probes if receiving data RFC 1122 says the following: ... Keep-alive packets MUST only be sent when no data or acknowledgement packets have been received for the connection within an interval. ... The acknowledgement packet is reseting the keepalive timer but the data packet isn't. This patch fixes it by checking the timestamp of the last received data packet too when the keepalive timer expires. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 12:53:25 -07:00
Paul E. McKenney	7ec75c582e	net: suppress RCU lockdep false positive in twsk_net() Calls to twsk_net() are in some cases protected by reference counting as an alternative to RCU protection. Cases covered by reference counts include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(), inet_twdr_twcal_tick(), and tcp_timewait_state_process(). RCU is used by inet_twsk_purge(). Locking is used by established_get_first() and established_get_next(). Finally, __inet_twsk_hashdance() is an initialization case. It appears to be non-trivial to locate the appropriate locks and reference counts from within twsk_net(), so used rcu_dereference_raw(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 12:39:01 -07:00
Patrick McHardy	3d0c9c4eb2	net: fib_rules: mark arguments to fib_rules_register const and __net_initdata fib_rules_register() duplicates the template passed to it without modification, mark the argument as const. Additionally the templates are only needed when instantiating a new namespace, so mark them as __net_initdata, which means they can be discarded when CONFIG_NET_NS=n. Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-04-26 16:02:04 +02:00
David S. Miller	b7d6a43211	Merge branch 'net-next-2.6_20100423a/br/br_multicast_v3' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next	2010-04-23 23:37:24 -07:00
Brian Haley	4b340ae20d	IPv6: Complete IPV6_DONTFRAG support Finally add support to detect a local IPV6_DONTFRAG event and return the relevant data to the user if they've enabled IPV6_RECVPATHMTU on the socket. The next recvmsg() will return no data, but have an IPV6_PATHMTU as ancillary data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-23 23:35:29 -07:00

1 2 3 4 5 ...

3332 Commits