Commit Graph

18888 Commits

Author SHA1 Message Date
David S. Miller 3e8c806a08 Revert "tcp: disallow bind() to reuse addr/port"
This reverts commit c191a836a9.

It causes known regressions for programs that expect to be able to use
SO_REUSEADDR to shutdown a socket, then successfully rebind another
socket to the same ID.

Programs such as haproxy and amavisd expect this to work.

This should fix kernel bugzilla 32832.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-13 12:01:14 -07:00
Jozsef Kadlecsik eafbd3fde6 netfilter: ipset: set match and SET target fixes
The SET target with --del-set did not work due to using wrongly
the internal dimension of --add-set instead of --del-set.
Also, the checkentries did not release the set references when
returned an error. Bugs reported by Lennert Buytenhek.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-13 13:45:57 +02:00
Jozsef Kadlecsik 0e8a835aa5 netfilter: ipset: bitmap:ip,mac type requires "src" for MAC
Enforce that the second "src/dst" parameter of the set match and SET target
must be "src", because we have access to the source MAC only in the packet.
The previous behaviour, that the type required the second parameter
but actually ignored the value was counter-intuitive and confusing.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-13 13:43:23 +02:00
Wei Yongjun 9494c7c577 sctp: fix oops while removed transport still using as retran path
Since we can not update retran path to unconfirmed transports,
when we remove a peer, the retran path may not be update if the
other transports are all unconfirmed, and we will still using
the removed transport as the retran path. This may cause panic
if retrasnmit happen.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 19:33:51 -07:00
Vlad Yasevich 25f7bf7d0d sctp: fix oops when updating retransmit path with DEBUG on
commit fbdf501c93
  sctp: Do no select unconfirmed transports for retransmissions

Introduced the initial falt.

commit d598b166ce
  sctp: Make sure we always return valid retransmit path

Solved the problem, but forgot to change the DEBUG statement.
Thus it was still possible to dereference a NULL pointer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 19:33:50 -07:00
Ben Hutchings 31d8b9e099 net: Disable NETIF_F_TSO_ECN when TSO is disabled
NETIF_F_TSO_ECN has no effect when TSO is disabled; this just means
that feature state will be accurately reported to user-space.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 19:29:45 -07:00
Ben Hutchings ea2d36883c net: Disable all TSO features when SG is disabled
The feature flags NETIF_F_TSO and NETIF_F_TSO6 independently enable
TSO for IPv4 and IPv6 respectively.  However, the test in
netdev_fix_features() and its predecessor functions was never updated
to check for NETIF_F_TSO6, possibly because it was originally proposed
that TSO for IPv6 would be dependent on both feature flags.

Now that these feature flags can be changed independently from
user-space and we depend on netdev_fix_features() to fix invalid
feature combinations, it's important to disable them both if
scatter-gather is disabled.  Also disable NETIF_F_TSO_ECN so
user-space sees all TSO features as disabled.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 19:29:45 -07:00
David S. Miller bfac3693c4 ieee802154: Remove hacked CFLAGS in net/ieee802154/Makefile
It adds -Wall (which the kernel carefully controls already) and of all
things -DDEBUG (which should be set by other means if desired, please
we have dynamic-debug these days).

Kill this noise.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 15:33:23 -07:00
Dave Jones 020318d0d2 irda: fix locking unbalance in irda_sendmsg
5b40964ead ("irda: Remove BKL instances
from af_irda.c") introduced a path where we have a locking unbalance.
If we pass invalid flags, we unlock a socket we never locked,
resulting in this...

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
trinity/20101 is trying to release lock (sk_lock-AF_IRDA) at:
[<ffffffffa057f001>] irda_sendmsg+0x207/0x21d [irda]
but there are no more locks to release!

other info that might help us debug this:
no locks held by trinity/20101.

stack backtrace:
Pid: 20101, comm: trinity Not tainted 2.6.39-rc3+ #3
Call Trace:
 [<ffffffffa057f001>] ? irda_sendmsg+0x207/0x21d [irda]
 [<ffffffff81085041>] print_unlock_inbalance_bug+0xc7/0xd2
 [<ffffffffa057f001>] ? irda_sendmsg+0x207/0x21d [irda]
 [<ffffffff81086aca>] lock_release+0xcf/0x18e
 [<ffffffff813ed190>] release_sock+0x2d/0x155
 [<ffffffffa057f001>] irda_sendmsg+0x207/0x21d [irda]
 [<ffffffff813e9f8c>] __sock_sendmsg+0x69/0x75
 [<ffffffff813ea105>] sock_sendmsg+0xa1/0xb6
 [<ffffffff81100ca3>] ? might_fault+0x5c/0xac
 [<ffffffff81086b7c>] ? lock_release+0x181/0x18e
 [<ffffffff81100cec>] ? might_fault+0xa5/0xac
 [<ffffffff81100ca3>] ? might_fault+0x5c/0xac
 [<ffffffff81133b94>] ? fcheck_files+0xb9/0xf0
 [<ffffffff813f387a>] ? copy_from_user+0x2f/0x31
 [<ffffffff813f3b70>] ? verify_iovec+0x52/0xa6
 [<ffffffff813eb4e3>] sys_sendmsg+0x23a/0x2b8
 [<ffffffff81086b7c>] ? lock_release+0x181/0x18e
 [<ffffffff810773c6>] ? up_read+0x28/0x2c
 [<ffffffff814bec3d>] ? do_page_fault+0x360/0x3b4
 [<ffffffff81087043>] ? trace_hardirqs_on_caller+0x10b/0x12f
 [<ffffffff810458aa>] ? finish_task_switch+0xb2/0xe3
 [<ffffffff8104583e>] ? finish_task_switch+0x46/0xe3
 [<ffffffff8108364a>] ? trace_hardirqs_off_caller+0x33/0x90
 [<ffffffff814bbaf9>] ? retint_swapgs+0x13/0x1b
 [<ffffffff81087043>] ? trace_hardirqs_on_caller+0x10b/0x12f
 [<ffffffff810a9dd3>] ? audit_syscall_entry+0x11c/0x148
 [<ffffffff8125609e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff814c22c2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 15:29:54 -07:00
Joakim Tjernlund 192910a6cc net: Do not wrap sysctl igmp_max_memberships in IP_MULTICAST
controlling igmp_max_membership is useful even when IP_MULTICAST
is off.
Quagga(an OSPF deamon) uses multicast addresses for all interfaces
using a single socket and hits igmp_max_membership limit when
there are 20 interfaces or more.
Always export sysctl igmp_max_memberships in proc, just like
igmp_max_msf

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 13:59:33 -07:00
Eric Dumazet 66944e1c57 inetpeer: reduce stack usage
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Scot Doyle <lkml@scotdoyle.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Reviewed-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 13:58:33 -07:00
Eric Dumazet f8e9881c2a bridge: reset IPCB in br_parse_ip_options
Commit 462fb2af97 (bridge : Sanitize skb before it enters the IP
stack), missed one IPCB init before calling ip_options_compile()

Thanks to Scot Doyle for his tests and bug reports.

Reported-by: Scot Doyle <lkml@scotdoyle.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Bandan Das <bandan.das@stratus.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Jan Lübbe <jluebbe@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 13:39:14 -07:00
David S. Miller aa8673599f llc: Fix length check in llc_fixup_skb().
Fixes bugzilla #32872

The LLC stack pretends to support non-linear skbs but there is a
direct use of skb_tail_pointer() in llc_fixup_skb().

Use pskb_may_pull() to see if data_size bytes remain and can be
accessed linearly in the packet, instead of direct pointer checks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-11 18:59:05 -07:00
Sjur Brændeland 4a9f65f630 caif: performance bugfix - allow radio stack to prioritize packets.
In the CAIF Payload message the Packet Type indication must be set to
    UNCLASSIFIED in order to allow packet prioritization in the modem's
    network stack. Otherwise TCP-Ack is not prioritized in the modems
    transmit queue.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-11 13:15:58 -07:00
Sjur Brændeland 0c184ed903 caif: Bugfix use for_each_safe when removing list nodes.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-11 13:15:57 -07:00
Linus Torvalds c44eaf41a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net: Add support for SMSC LAN9530, LAN9730 and LAN89530
  mlx4_en: Restoring RX buffer pointer in case of failure
  mlx4: Sensing link type at device initialization
  ipv4: Fix "Set rt->rt_iif more sanely on output routes."
  MAINTAINERS: add entry for Xen network backend
  be2net: Fix suspend/resume operation
  be2net: Rename some struct members for clarity
  pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
  dsa/mv88e6131: add support for mv88e6085 switch
  ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
  be2net: Fix a potential crash during shutdown.
  bna: Fix for handling firmware heartbeat failure
  can: mcp251x: Allow pass IRQ flags through platform data.
  smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
  iwlwifi: accept EEPROM version 0x423 for iwl6000
  rt2x00: fix cancelling uninitialized work
  rtlwifi: Fix some warnings/bugs
  p54usb: IDs for two new devices
  wl12xx: fix potential buffer overflow in testmode nvs push
  zd1211rw: reset rx idle timer from tasklet
  ...
2011-04-11 07:27:24 -07:00
Linus Torvalds 94c8a984ae Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
  NFS: Fix a signed vs. unsigned secinfo bug
  Revert "net/sunrpc: Use static const char arrays"
2011-04-08 11:47:35 -07:00
OGAWA Hirofumi 1b86a58f9d ipv4: Fix "Set rt->rt_iif more sanely on output routes."
Commit 1018b5c016 ("Set rt->rt_iif more
sanely on output routes.")  breaks rt_is_{output,input}_route.

This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".

To fix it, this does:

1) Add "int rt_route_iif;" to struct rtable

2) For input routes, always set rt_route_iif to same value as rt_iif

3) For output routes, always set rt_route_iif to zero.  Set rt_iif
   as it is done currently.

4) Change rt_is_{output,input}_route() to test rt_route_iif

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-07 14:04:08 -07:00
Linus Torvalds 42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
David S. Miller a25a32ab71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-04-06 13:34:15 -07:00
Peter Korsgaard ec80bfcb68 dsa/mv88e6131: add support for mv88e6085 switch
The mv88e6085 is identical to the mv88e6095, except that all ports are
10/100 Mb/s, so use the existing setup code except for the cpu/dsa speed
selection in _setup_port().

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-06 13:32:53 -07:00
Neil Horman 47482f132a ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
properly record sk_rxhash in ipv6 sockets (v2)

Noticed while working on another project that flows to sockets which I had open
on a test systems weren't getting steered properly when I had RFS enabled.
Looking more closely I found that:

1) The affected sockets were all ipv6
2) They weren't getting steered because sk->sk_rxhash was never set from the
incomming skbs on that socket.

This was occuring because there are several points in the IPv4 tcp and udp code
which save the rxhash value when a new connection is established.  Those calls
to sock_rps_save_rxhash were never added to the corresponding ipv6 code paths.
This patch adds those calls.  Tested by myself to properly enable RFS
functionalty on ipv6.

Change notes:
v2:
	Filtered UDP to only arm RFS on bound sockets (Eric Dumazet)

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-06 13:07:09 -07:00
Trond Myklebust 0867659fa3 Revert "net/sunrpc: Use static const char arrays"
This reverts commit 411b5e0561.

Olga Kornievskaia reports:

Problem: linux client mounting linux server using rc4-hmac-md5
enctype. gssd fails with create a context after receiving a reply from
the server.

Diagnose: putting printout statements in the server kernel and
kerberos libraries revealed that client and server derived different
integrity keys.

Server kernel code was at fault due the the commit

[aglo@skydive linux-pnfs]$ git show 411b5e0561

Trond: The problem is that since it relies on virt_to_page(), you cannot
call sg_set_buf() for data in the const section.

Reported-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org	[2.6.36+]
2011-04-06 11:18:17 -07:00
Sage Weil 77f38e0eea libceph: fix linger request requeueing
Fix the request transition from linger -> normal request.  The key is to
preserve r_osd and requeue on the same OSD.  Reregister as a normal request,
add the request to the proper queues, then unregister the linger.  Fix the
unregister helper to avoid clearing r_osd (and also simplify the parallel
check in __unregister_request()).

Reported-by: Henry Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-04-06 09:09:16 -07:00
David S. Miller 9d93059497 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-04-05 14:21:11 -07:00
Linus Torvalds d14f5b810b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  ipv6: Don't pass invalid dst_entry pointer to dst_release().
  mlx4: fix kfree on error path in new_steering_entry()
  tcp: len check is unnecessarily devastating, change to WARN_ON
  sctp: malloc enough room for asconf-ack chunk
  sctp: fix auth_hmacs field's length of struct sctp_cookie
  net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM
  usbnet: use eth%d name for known ethernet devices
  starfire: clean up dma_addr_t size test
  iwlegacy: fix bugs in change_interface
  carl9170: Fix tx aggregation problems with some clients
  iwl3945: disable hw scan by default
  wireless: rt2x00: rt2800usb.c add and identify ids
  iwl3945: do not deprecate software scan
  mac80211: fix aggregation frame release during timeout
  cfg80211: fix BSS double-unlinking (continued)
  cfg80211:: fix possible NULL pointer dereference
  mac80211: fix possible NULL pointer dereference
  mac80211: fix NULL pointer dereference in ieee80211_key_alloc()
  ath9k: fix a chip wakeup related crash in ath9k_start
  mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
  ...
2011-04-05 12:26:57 -07:00
Boris Ostrovsky 738faca343 ipv6: Don't pass invalid dst_entry pointer to dst_release().
Make sure dst_release() is not called with error pointer. This is
similar to commit 4910ac6c52 ("ipv4:
Don't ip_rt_put() an error pointer in RAW sockets.").

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 13:07:26 -07:00
Helmut Schaa fcf8bd3ba5 mac80211: Fix duplicate frames on cooked monitor
Cleaning the ieee80211_rx_data.flags field here is wrong, instead the
flags should be valid accross processing the frame on different
interfaces. Fix this by removing the incorrect flags=0 assignment.

Introduced in commit 554891e63a
(mac80211: move packet flags into packet).

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-04-04 15:22:11 -04:00
Florian Westphal 96120d86fe netfilter: xt_conntrack: fix inverted conntrack direction test
--ctdir ORIGINAL matches REPLY packets, and vv:

userspace sets "invert_flags &= ~XT_CONNTRACK_DIRECTION" in ORIGINAL
case.

Thus: (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) ^
      !!(info->invert_flags & XT_CONNTRACK_DIRECTION))

yields "1 ^ 0", which is true -> returns false.

Reproducer:
iptables -I OUTPUT 1 -p tcp --syn -m conntrack --ctdir ORIGINAL

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 17:06:21 +02:00
Florian Westphal b7225041e9 netfilter: xt_addrtype: replace rt6_lookup with nf_afinfo->route
This avoids pulling in the ipv6 module when using (ipv4-only) iptables
-m addrtype.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 17:01:43 +02:00
Florian Westphal 0fae2e7740 netfilter: af_info: add 'strict' parameter to limit lookup to .oif
ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search
to an interface, but this flag cannot be set via struct flowi.

Also, it cannot be set via ip6_route_output: this function uses the
passed sock struct to determine if this flag is required
(by testing for nonzero sk_bound_dev_if).

Work around this by passing in an artificial struct sk in case
'strict' argument is true.

This is required to replace the rt6_lookup call in xt_addrtype.c with
nf_afinfo->route().

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 17:00:54 +02:00
Florian Westphal 31ad3dd64e netfilter: af_info: add network namespace parameter to route hook
This is required to eventually replace the rt6_lookup call in
xt_addrtype.c with nf_afinfo->route().

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 16:56:29 +02:00
Hans Schillstrom a09d19779f IPVS: fix NULL ptr dereference in ip_vs_ctl.c ip_vs_genl_dump_daemons()
ipvsadm -ln --daemon will trigger a Null pointer exception because
ip_vs_genl_dump_daemons() uses skb_net() instead of skb_sknet().

To prevent others from NULL ptr a check is made in ip_vs.h skb_net().

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:25:18 +02:00
David Sterba b4232a2277 netfilter: h323: bug in parsing of ASN1 SEQOF field
Static analyzer of clang found a dead store which appears to be a bug in
reading count of items in SEQOF field, only the lower byte of word is
stored. This may lead to corrupted read and communication shutdown.

The bug has been in the module since it's first inclusion into linux
kernel.

[Patrick: the bug is real, but without practical consequence since the
 largest amount of sequence-of members we parse is 30.]

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:21:02 +02:00
Jozsef Kadlecsik 2f9f28b212 netfilter: ipset: references are protected by rwlock instead of mutex
The timeout variant of the list:set type must reference the member sets.
However, its garbage collector runs at timer interrupt so the mutex
protection of the references is a no go. Therefore the reference protection
is converted to rwlock.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:19:25 +02:00
Jozsef Kadlecsik 512d06b5b6 netfilter: ipset: list:set timeout variant fixes
- the timeout value was actually not set
- the garbage collector was broken

The variant is fixed, the tests to the ipset testsuite are added.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:18:45 +02:00
Ilpo Järvinen 2fceec1337 tcp: len check is unnecessarily devastating, change to WARN_ON
All callers are prepared for alloc failures anyway, so this error
can safely be boomeranged to the callers domain without super
bad consequences. ...At worst the connection might go into a state
where each RTO tries to (unsuccessfully) re-fragment with such
a mis-sized value and eventually dies.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-01 21:47:41 -07:00
Wei Yongjun 2cab86bee8 sctp: malloc enough room for asconf-ack chunk
Sometime the ASCONF_ACK parameters can equal to the fourfold of
ASCONF parameters, this only happend in some special case:

  ASCONF parameter is :
    Unrecognized Parameter (4 bytes)
  ASCONF_ACK parameter should be:
    Error Cause Indication parameter (8 bytes header)
     + Error Cause (4 bytes header)
       + Unrecognized Parameter (4bytes)

Four 4bytes Unrecognized Parameters in ASCONF chunk will cause panic.

Pid: 0, comm: swapper Not tainted 2.6.38-next+ #22 Bochs Bochs
EIP: 0060:[<c0717eae>] EFLAGS: 00010246 CPU: 0
EIP is at skb_put+0x60/0x70
EAX: 00000077 EBX: c09060e2 ECX: dec1dc30 EDX: c09469c0
ESI: 00000000 EDI: de3c8d40 EBP: dec1dc58 ESP: dec1dc2c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=dec1c000 task=c09aef20 task.ti=c0980000)
Stack:
 c09469c0 e1894fa4 00000044 00000004 de3c8d00 de3c8d00 de3c8d44 de3c8d40
 c09060e2 de25dd80 de3c8d40 dec1dc7c e1894fa4 dec1dcb0 00000040 00000004
 00000000 00000800 00000004 00000004 dec1dce0 e1895a2b dec1dcb4 de25d960
Call Trace:
 [<e1894fa4>] ? sctp_addto_chunk+0x4e/0x89 [sctp]
 [<e1894fa4>] sctp_addto_chunk+0x4e/0x89 [sctp]
 [<e1895a2b>] sctp_process_asconf+0x32f/0x3d1 [sctp]
 [<e188d554>] sctp_sf_do_asconf+0xf8/0x173 [sctp]
 [<e1890b02>] sctp_do_sm+0xb8/0x159 [sctp]
 [<e18a2248>] ? sctp_cname+0x0/0x52 [sctp]
 [<e189392d>] sctp_assoc_bh_rcv+0xac/0xe3 [sctp]
 [<e1897d76>] sctp_inq_push+0x2d/0x30 [sctp]
 [<e18a21b2>] sctp_rcv+0x7a7/0x83d [sctp]
 [<c077a95c>] ? ipv4_confirm+0x118/0x125
 [<c073a970>] ? nf_iterate+0x34/0x62
 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194
 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194
 [<c0747992>] ip_local_deliver_finish+0xf5/0x194
 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194
 [<c0747a6e>] NF_HOOK.clone.1+0x3d/0x44
 [<c0747ab3>] ip_local_deliver+0x3e/0x44
 [<c074789d>] ? ip_local_deliver_finish+0x0/0x194
 [<c074775c>] ip_rcv_finish+0x29f/0x2c7
 [<c07474bd>] ? ip_rcv_finish+0x0/0x2c7
 [<c0747a6e>] NF_HOOK.clone.1+0x3d/0x44
 [<c0747cae>] ip_rcv+0x1f5/0x233
 [<c07474bd>] ? ip_rcv_finish+0x0/0x2c7
 [<c071dce3>] __netif_receive_skb+0x310/0x336
 [<c07221f3>] netif_receive_skb+0x4b/0x51
 [<e0a4ed3d>] cp_rx_poll+0x1e7/0x29c [8139cp]
 [<c072275e>] net_rx_action+0x65/0x13a
 [<c0445a54>] __do_softirq+0xa1/0x149
 [<c04459b3>] ? __do_softirq+0x0/0x149
 <IRQ>
 [<c0445891>] ? irq_exit+0x37/0x72
 [<c040a7e9>] ? do_IRQ+0x81/0x95
 [<c07b3670>] ? common_interrupt+0x30/0x38
 [<c0428058>] ? native_safe_halt+0xa/0xc
 [<c040f5d7>] ? default_idle+0x58/0x92
 [<c0408fb0>] ? cpu_idle+0x96/0xb2
 [<c0797989>] ? rest_init+0x5d/0x5f
 [<c09fd90c>] ? start_kernel+0x34b/0x350
 [<c09fd0cb>] ? i386_start_kernel+0xba/0xc1

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-01 21:45:51 -07:00
David S. Miller 5e58e5283a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-04-01 17:15:25 -07:00
Linus Torvalds 84daeb09ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  appletalk: Fix OOPS in atalk_release().
  mlx4: Fixing bad size of event queue buffer
  mlx4: Fixing use after free
  bonding:typo in comment
  sctp: Pass __GFP_NOWARN to hash table allocation attempts.
  connector: convert to synchronous netlink message processing
  fib: add rtnl locking in ip_fib_net_exit
  atm/solos-pci: Don't flap VCs when carrier state changes
  atm/solos-pci: Don't include frame pseudo-header on transmit hex-dump
  atm/solos-pci: Use VPI.VCI notation uniformly.
  Atheros, atl2: Fix mem leaks in error paths of atl2_set_eeprom
  netdev: fix mtu check when TSO is enabled
  net/usb: Ethernet quirks for the LG-VL600 4G modem
  phylib: phy_attach_direct: phy_init_hw can fail, add cleanup
  bridge: mcast snooping, fix length check of snooped MLDv1/2
  via-ircc: Pass PCI device pointer to dma_{alloc, free}_coherent()
  via-ircc: Use pci_{get, set}_drvdata() instead of static pointer variable
  net: gre: provide multicast mappings for ipv4 and ipv6
  bridge: Fix compilation warning in function br_stp_recalculate_bridge_id()
  net: Fix warnings caused by MAX_SKB_FRAGS change.
2011-04-01 08:53:50 -07:00
David S. Miller c100c8f4c3 appletalk: Fix OOPS in atalk_release().
Commit 60d9f461a2 ("appletalk: remove
the BKL") added a dereference of "sk" before checking for NULL in
atalk_release().

Guard the code block completely, rather than partially, with the
NULL check.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-31 18:59:10 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
David S. Miller a84b50ceb7 sctp: Pass __GFP_NOWARN to hash table allocation attempts.
Like DCCP and other similar pieces of code, there are mechanisms
here to try allocating smaller hash tables if the allocation
fails.  So pass in __GFP_NOWARN like the others do instead of
emitting a scary message.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 17:51:36 -07:00
Eric Dumazet e2666f8495 fib: add rtnl locking in ip_fib_net_exit
Daniel J Blueman reported a lockdep splat in trie_firstleaf(), caused by
RTNL being not locked before a call to fib_table_flush()

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 16:57:46 -07:00
Philip A. Prindeville c031235b39 atm/solos-pci: Don't flap VCs when carrier state changes
Don't flap VCs when carrier state changes; higher-level protocols
can detect loss of connectivity and act accordingly. This is more
consistent with how other network interfaces work.

We no longer use release_vccs() so we can delete it.

release_vccs() was duplicated from net/atm/common.c; make the
corresponding function exported, since other code duplicates it
and could leverage it if it were public.

Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 16:53:38 -07:00
Linus Torvalds 50f3515828 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: Create a new key type "ceph".
  libceph: Get secret from the kernel keys api when mounting with key=NAME.
  ceph: Move secret key parsing earlier.
  libceph: fix null dereference when unregistering linger requests
  ceph: unlock on error in ceph_osdc_start_request()
  ceph: fix possible NULL pointer dereference
  ceph: flush msgr_wq during mds_client shutdown
2011-03-30 09:46:09 -07:00
Daniel Lezcano 79b569f0ec netdev: fix mtu check when TSO is enabled
In case the device where is coming from the packet has TSO enabled,
we should not check the mtu size value as this one could be bigger
than the expected value.

This is the case for the macvlan driver when the lower device has
TSO enabled. The macvlan inherit this feature and forward the packets
without fragmenting them. Then the packets go through dev_forward_skb
and are dropped. This patch fix this by checking TSO is not enabled
when we want to check the mtu size.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 02:42:17 -07:00
Linus Lüssing ff9a57a62a bridge: mcast snooping, fix length check of snooped MLDv1/2
"len = ntohs(ip6h->payload_len)" does not include the length of the ipv6
header itself, which the rest of this function assumes, though.

This leads to a length check less restrictive as it should be in the
following line for one thing. For another, it very likely leads to an
integer underrun when substracting the offset and therefore to a very
high new value of 'len' due to its unsignedness. This will ultimately
lead to the pskb_trim_rcsum() practically never being called, even in
the cases where it should.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 02:28:20 -07:00
Timo Teräs 93ca3bb5df net: gre: provide multicast mappings for ipv4 and ipv6
My commit 6d55cb91a0 (gre: fix hard header destination
address checking) broke multicast.

The reason is that ip_gre used to get ipgre_header() calls with
zero destination if we have NOARP or multicast destination. Instead
the actual target was decided at ipgre_tunnel_xmit() time based on
per-protocol dissection.

Instead of allowing the "abuse" of ->header() calls with invalid
destination, this creates multicast mappings for ip_gre. This also
fixes "ip neigh show nud noarp" to display the proper multicast
mappings used by the gre device.

Reported-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-30 00:10:47 -07:00
Balaji G 1459a3cc51 bridge: Fix compilation warning in function br_stp_recalculate_bridge_id()
net/bridge/br_stp_if.c: In function ‘br_stp_recalculate_bridge_id’:
net/bridge/br_stp_if.c:216:3: warning: ‘return’ with no value, in function returning non-void

Signed-off-by: G.Balaji <balajig81@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-29 23:37:23 -07:00
Tommi Virtanen 4b2a58abd1 libceph: Create a new key type "ceph".
This allows us to use existence of the key type as a feature test,
from userspace.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29 12:11:24 -07:00
Tommi Virtanen e2c3d29b42 libceph: Get secret from the kernel keys api when mounting with key=NAME.
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29 12:11:19 -07:00
Tommi Virtanen 8323c3aa74 ceph: Move secret key parsing earlier.
This makes the base64 logic be contained in mount option parsing,
and prepares us for replacing the homebew key management with the
kernel key retention service.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29 12:11:16 -07:00
Sage Weil fbdb919048 libceph: fix null dereference when unregistering linger requests
We should only clear r_osd if we are neither registered as a linger or a
regular request.  We may unregister as a linger while still registered as
a regular request (e.g., in reset_osd).  Incorrectly clearing r_osd there
leads to a null pointer dereference in __send_request.

Also simplify the parallel check in __unregister_request() where we just
removed r_osd_item and know it's empty.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29 12:11:06 -07:00
Dan Carpenter 234af26ff1 ceph: unlock on error in ceph_osdc_start_request()
There was a missing unlock on the error path if __map_request() failed.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-29 08:59:54 -07:00
Linus Torvalds cb1817b373 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  xfrm: Restrict extended sequence numbers to esp
  xfrm: Check for esn buffer len in xfrm_new_ae
  xfrm: Assign esn pointers when cloning a state
  xfrm: Move the test on replay window size into the replay check functions
  netdev: bfin_mac: document TE setting in RMII modes
  drivers net: Fix declaration ordering in inline functions.
  cxgb3: Apply interrupt coalescing settings to all queues
  net: Always allocate at least 16 skb frags regardless of page size
  ipv4: Don't ip_rt_put() an error pointer in RAW sockets.
  net: fix ethtool->set_flags not intended -EINVAL return value
  mlx4_en: Fix loss of promiscuity
  tg3: Fix inline keyword usage
  tg3: use <linux/io.h> and <linux/uaccess.h> instead <asm/io.h> and <asm/uaccess.h>
  net: use CHECKSUM_NONE instead of magic number
  Net / jme: Do not use legacy PCI power management
  myri10ge: small rx_done refactoring
  bridge: notify applications if address of bridge device changes
  ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC) handling in ip_options_echo()
  can: c_can: Fix tx_bytes accounting
  can: c_can_platform: fix irq check in probe
  ...
2011-03-29 07:41:33 -07:00
Steffen Klassert 02aadf72fe xfrm: Restrict extended sequence numbers to esp
The IPsec extended sequence numbers are fully implemented just for
esp. So restrict the usage to esp until other protocols have
support too.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28 23:34:53 -07:00
Steffen Klassert e2b19125e9 xfrm: Check for esn buffer len in xfrm_new_ae
In xfrm_new_ae() we may overwrite the allocated esn replay state
buffer with a wrong size. So check that the new size matches the
original allocated size and return an error if this is not the case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28 23:34:52 -07:00
Steffen Klassert af2f464e32 xfrm: Assign esn pointers when cloning a state
When we clone a xfrm state we have to assign the replay_esn
and the preplay_esn pointers to the state if we use the
new replay detection method. To this end, we add a
xfrm_replay_clone() function that allocates memory for
the replay detection and takes over the necessary values
from the original state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28 23:34:52 -07:00
Steffen Klassert 36ae0148db xfrm: Move the test on replay window size into the replay check functions
As it is, the replay check is just performed if the replay window of the
legacy implementation is nonzero. So we move the test on a nonzero replay
window inside the replay check functions to be sure we are testing for the
right implementation.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28 23:34:51 -07:00
David S. Miller 4910ac6c52 ipv4: Don't ip_rt_put() an error pointer in RAW sockets.
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-28 16:51:15 -07:00
Daniel Halperin 499fe9a419 mac80211: fix aggregation frame release during timeout
Suppose the aggregation reorder buffer looks like this:

x-T-R1-y-R2,

where x and y are frames that have not been received, T is a received
frame that has timed out, and R1,R2 are received frames that have not
yet timed out. The proper behavior in this scenario is to move the
window past x (skipping it), release T and R1, and leave the window at y
until y is received or R2 times out.

As written, this code will instead leave the window at R1, because it
has not yet timed out. Fix this by exiting the reorder loop only when
the frame that has not timed out AND there are skipped frames earlier in
the current valid window.

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:02 -04:00
Juuso Oikarinen 2b78ac9bfc cfg80211: fix BSS double-unlinking (continued)
This patch adds to the fix "fix BSS double-unlinking"
(commit 3207390a8b) by Johannes Berg.

It turns out, that the double-unlinking scenario can also occur if expired
BSS elements are removed whilst an interface is performing association.

To work around that, replace list_del with list_del_init also in the
"cfg80211_bss_expire" function, so that the check for whether the BSS still is
in the list works correctly in cfg80211_unlink_bss.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:02 -04:00
Mariusz Kozlowski bef9bacc4e cfg80211:: fix possible NULL pointer dereference
In cfg80211_inform_bss_frame() wiphy is first dereferenced on privsz
initialisation and then it is checked for NULL. This patch fixes that.

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:02 -04:00
Mariusz Kozlowski 67aa030c0d mac80211: fix possible NULL pointer dereference
This patch moves 'key' dereference after BUG_ON(!key) so that when key is NULL
we will see proper trace instead of oops.

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:02 -04:00
Petr Štetiar 1f951a7f8b mac80211: fix NULL pointer dereference in ieee80211_key_alloc()
The ieee80211_key struct can be kfree()d several times in the function, for
example if some of the key setup functions fails beforehand, but there's no
check if the struct is still valid before we call memcpy() and INIT_LIST_HEAD()
on it.  In some cases (like it was in my case), if there's missing aes-generic
module it could lead to the following kernel OOPS:

	Unable to handle kernel NULL pointer dereference at virtual address 0000018c
	....
	PC is at memcpy+0x80/0x29c
	...
	Backtrace:
	[<bf11c5e4>] (ieee80211_key_alloc+0x0/0x234 [mac80211]) from [<bf1148b4>] (ieee80211_add_key+0x70/0x12c [mac80211])
	[<bf114844>] (ieee80211_add_key+0x0/0x12c [mac80211]) from [<bf070cc0>] (__cfg80211_set_encryption+0x2a8/0x464 [cfg80211])

Signed-off-by: Petr Štetiar <ynezz@true.cz>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:02 -04:00
Felix Fietkau 4dc217df68 mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
When a client connects in HT mode but does not provide any valid MCS
rates, the function that finds the next sample rate gets stuck in an
infinite loop.
Fix this by falling back to legacy rates if no usable MCS rates are found.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-28 15:42:01 -04:00
Stanislaw Gruszka 673e63c688 net: fix ethtool->set_flags not intended -EINVAL return value
After commit d5dbda2380 "ethtool: Add
support for vlan accleration.", drivers that have NETIF_F_HW_VLAN_TX,
and/or NETIF_F_HW_VLAN_RX feature, but do not allow enable/disable vlan
acceleration via ethtool set_flags, always return -EINVAL from that
function. Fix by returning -EINVAL only if requested features do not
match current settings and can not be changed by driver.

Change any driver that define ethtool->set_flags to use
ethtool_invalid_flags() to avoid similar problems in the future
(also on drivers that do not have the problem).

Tested with modified (to reproduce this bug) myri10ge driver.

Cc: stable@kernel.org # 2.6.37+
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:35:24 -07:00
Cesar Eduardo Barros 3e49e6d520 net: use CHECKSUM_NONE instead of magic number
Two places in the kernel were doing skb->ip_summed = 0.

Change both to skb->ip_summed = CHECKSUM_NONE, which is more readable.

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:35:05 -07:00
stephen hemminger edf947f100 bridge: notify applications if address of bridge device changes
The mac address of the bridge device may be changed when a new interface
is added to the bridge. If this happens, then the bridge needs to call
the network notifiers to tickle any other systems that care. Since bridge
can be a module, this also means exporting the notifier function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:35:02 -07:00
Jan Luebbe 8628bd8af7 ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC) handling in ip_options_echo()
The current handling of echoed IP timestamp options with prespecified
addresses is rather broken since the 2.2.x kernels. As far as i understand
it, it should behave like when originating packets.

Currently it will only timestamp the next free slot if:
 - there is space for *two* timestamps
 - some random data from the echoed packet taken as an IP is *not* a local IP

This first is caused by an off-by-one error. 'soffset' points to the next
free slot and so we only need to have 'soffset + 7 <= optlen'.

The second bug is using sptr as the start of the option, when it really is
set to 'skb_network_header(skb)'. I just use dptr instead which points to
the timestamp option.

Finally it would only timestamp for non-local IPs, which we shouldn't do.
So instead we exclude all unicast destinations, similar to what we do in
ip_options_compile().

Signed-off-by: Jan Luebbe <jluebbe@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:35:02 -07:00
Oliver Hartkopp 53914b6799 can: make struct proto const
can_ioctl is the only reason for struct proto to be non-const.
script/check-patch.pl suggests struct proto be const.

Setting the reference to the common can_ioctl() in all CAN protocols directly
removes the need to make the struct proto writable in af_can.c

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:34:59 -07:00
Amerigo Wang 3b261ade42 net: remove useless comments in net/core/dev.c
The code itself can explain what it is doing, no need these comments.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 23:34:59 -07:00
Ben Hutchings e0bccd315d rose: Add length checks to CALL_REQUEST parsing
Define some constant offsets for CALL_REQUEST based on the description
at <http://www.techfest.com/networking/wan/x25plp.htm> and the
definition of ROSE as using 10-digit (5-byte) addresses.  Use them
consistently.  Validate all implicit and explicit facilities lengths.
Validate the address length byte rather than either trusting or
assuming its value.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:59:04 -07:00
Dan Rosenberg be20250c13 ROSE: prevent heap corruption with bad facilities
When parsing the FAC_NATIONAL_DIGIS facilities field, it's possible for
a remote host to provide more digipeaters than expected, resulting in
heap corruption.  Check against ROSE_MAX_DIGIS to prevent overflows, and
abort facilities parsing on failure.

Additionally, when parsing the FAC_CCITT_DEST_NSAP and
FAC_CCITT_SRC_NSAP facilities fields, a remote host can provide a length
of less than 10, resulting in an underflow in a memcpy size, causing a
kernel panic due to massive heap corruption.  A length of greater than
20 results in a stack overflow of the callsign array.  Abort facilities
parsing on these invalid length values.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:59:03 -07:00
Dan Rosenberg d370af0ef7 irda: validate peer name and attribute lengths
Length fields provided by a peer for names and attributes may be longer
than the destination array sizes.  Validate lengths to prevent stack
buffer overflows.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:59:02 -07:00
Dan Rosenberg d50e7e3604 irda: prevent heap corruption on invalid nickname
Invalid nicknames containing only spaces will result in an underflow in
a memcpy size calculation, subsequently destroying the heap and
panicking.

v2 also catches the case where the provided nickname is longer than the
buffer size, which can result in controllable heap corruption.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:59:02 -07:00
Steffen Klassert e433430a0c dst: Clone child entry in skb_dst_pop
We clone the child entry in skb_dst_pop before we call
skb_dst_drop(). Otherwise we might kill the child right
before we return it to the caller.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:55:01 -07:00
Steffen Klassert 3bc07321cc xfrm: Force a dst refcount before entering the xfrm type handlers
Crypto requests might return asynchronous. In this case we leave
the rcu protected region, so force a refcount on the skb's
destination entry before we enter the xfrm type input/output
handlers.

This fixes a crash when a route is deleted whilst sending IPsec
data that is transformed by an asynchronous algorithm.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-27 17:55:01 -07:00
OGAWA Hirofumi a271c5a0de NFS: Ensure that rpc_release_resources_task() can be called twice.
BUG: atomic_dec_and_test(): -1: atomic counter underflow at:
Pid: 2827, comm: mount.nfs Not tainted 2.6.38 #1
Call Trace:
 [<ffffffffa02223a0>] ? put_rpccred+0x44/0x14e [sunrpc]
 [<ffffffffa021bbe9>] ? rpc_ping+0x4e/0x58 [sunrpc]
 [<ffffffffa021c4a5>] ? rpc_create+0x481/0x4fc [sunrpc]
 [<ffffffffa022298a>] ? rpcauth_lookup_credcache+0xab/0x22d [sunrpc]
 [<ffffffffa028be8c>] ? nfs_create_rpc_client+0xa6/0xeb [nfs]
 [<ffffffffa028c660>] ? nfs4_set_client+0xc2/0x1f9 [nfs]
 [<ffffffffa028cd3c>] ? nfs4_create_server+0xf2/0x2a6 [nfs]
 [<ffffffffa0295d07>] ? nfs4_remote_mount+0x4e/0x14a [nfs]
 [<ffffffff810dd570>] ? vfs_kern_mount+0x6e/0x133
 [<ffffffffa029605a>] ? nfs_do_root_mount+0x76/0x95 [nfs]
 [<ffffffffa029643d>] ? nfs4_try_mount+0x56/0xaf [nfs]
 [<ffffffffa0297434>] ? nfs_get_sb+0x435/0x73c [nfs]
 [<ffffffff810dd59b>] ? vfs_kern_mount+0x99/0x133
 [<ffffffff810dd693>] ? do_kern_mount+0x48/0xd8
 [<ffffffff810f5b75>] ? do_mount+0x6da/0x741
 [<ffffffff810f5c5f>] ? sys_mount+0x83/0xc0
 [<ffffffff8100293b>] ? system_call_fastpath+0x16/0x1b

Well, so, I think this is real bug of nfs codes somewhere. With some
review, the code

rpc_call_sync()
    rpc_run_task
        rpc_execute()
            __rpc_execute()
                rpc_release_task()
                    rpc_release_resources_task()
                        put_rpccred()                <= release cred
    rpc_put_task
        rpc_do_put_task()
            rpc_release_resources_task()
                put_rpccred()                        <= release cred again

seems to be release cred unintendedly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-27 17:55:36 +02:00
Mariusz Kozlowski 6b0ae4097c ceph: fix possible NULL pointer dereference
This patch fixes 'event_work' dereference before it is checked for NULL.

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-26 13:41:20 -07:00
Linus Torvalds 00a2470546 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
  route: Take the right src and dst addresses in ip_route_newports
  ipv4: Fix nexthop caching wrt. scoping.
  ipv4: Invalidate nexthop cache nh_saddr more correctly.
  net: fix pch_gbe section mismatch warning
  ipv4: fix fib metrics
  mlx4_en: Removing HW info from ethtool -i report.
  net_sched: fix THROTTLED/RUNNING race
  drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region
  drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region
  bonding: fix rx_handler locking
  myri10ge: fix rmmod crash
  mlx4_en: updated driver version to 1.5.4.1
  mlx4_en: Using blue flame support
  mlx4_core: reserve UARs for userspace consumers
  mlx4_core: maintain available field in bitmap allocator
  mlx4: Add blue flame support for kernel consumers
  mlx4_en: Enabling new steering
  mlx4: Add support for promiscuous mode in the new steering model.
  mlx4: generalization of multicast steering.
  mlx4_en: Reporting HW revision in ethtool -i
  ...
2011-03-25 21:02:22 -07:00
Julian Anastasov 1fbc784392 ipv4: do not ignore route errors
The "ipv4: Inline fib_semantic_match into check_leaf"
change forgets to return the route errors. check_leaf should
return the same results as fib_table_lookup.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-25 20:33:23 -07:00
Sage Weil ef550f6f4f ceph: flush msgr_wq during mds_client shutdown
The release method for mds connections uses a backpointer to the
mds_client, so we need to flush the workqueue of any pending work (and
ceph_connection references) prior to freeing the mds_client.  This fixes
an oops easily triggered under UML by

 while true ; do mount ... ; umount ... ; done

Also fix an outdated comment: the flush in ceph_destroy_client only flushes
OSD connections out.  This bug is basically an artifact of the ceph ->
ceph+libceph conversion.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-25 13:27:48 -07:00
Linus Torvalds 40471856f2 Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (28 commits)
  Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO
  NFSv4.1 convert layoutcommit sync to boolean
  NFSv4.1 pnfs_layoutcommit_inode fixes
  NFS: Determine initial mount security
  NFS: use secinfo when crossing mountpoints
  NFS: Add secinfo procedure
  NFS: lookup supports alternate client
  NFS: convert call_sync() to a function
  NFSv4.1 remove temp code that prevented ds commits
  NFSv4.1: layoutcommit
  NFSv4.1: filelayout driver specific code for COMMIT
  NFSv4.1: remove GETATTR from ds commits
  NFSv4.1: add generic layer hooks for pnfs COMMIT
  NFSv4.1: alloc and free commit_buckets
  NFSv4.1: shift filelayout_free_lseg
  NFSv4.1: pull out code from nfs_commit_release
  NFSv4.1: pull error handling out of nfs_commit_list
  NFSv4.1: add callback to nfs4_commit_done
  NFSv4.1: rearrange nfs_commit_rpcsetup
  NFSv4.1: don't send COMMIT to ds for data sync writes
  ...
2011-03-25 10:03:28 -07:00
David S. Miller 37e826c513 ipv4: Fix nexthop caching wrt. scoping.
Move the scope value out of the fib alias entries and into fib_info,
so that we always use the correct scope when recomputing the nexthop
cached source address.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 18:06:47 -07:00
David S. Miller 436c3b66ec ipv4: Invalidate nexthop cache nh_saddr more correctly.
Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 17:42:21 -07:00
Trond Myklebust 0acd220192 Merge branch 'nfs-for-2.6.39' into nfs-for-next 2011-03-24 17:03:14 -04:00
Thomas Gleixner b77dcf8460 Bluetooth: Fix warning with hci_cmd_timer
After we made debugobjects working again, we got the following:

WARNING: at lib/debugobjects.c:262 debug_print_object+0x8e/0xb0()
Hardware name: System Product Name
ODEBUG: free active (active state 0) object type: timer_list hint: hci_cmd_timer+0x0/0x60
Pid: 2125, comm: dmsetup Tainted: G        W   2.6.38-06707-gc62b389 #110375
Call Trace:
 [<ffffffff8104700a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff810470b6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812d3a5e>] debug_print_object+0x8e/0xb0
 [<ffffffff81bd8810>] ? hci_cmd_timer+0x0/0x60
 [<ffffffff812d4685>] debug_check_no_obj_freed+0x125/0x230
 [<ffffffff810f1063>] ? check_object+0xb3/0x2b0
 [<ffffffff810f3630>] kfree+0x150/0x190
 [<ffffffff81be4d06>] ? bt_host_release+0x16/0x20
 [<ffffffff81be4d06>] bt_host_release+0x16/0x20
 [<ffffffff813a1907>] device_release+0x27/0xa0
 [<ffffffff812c519c>] kobject_release+0x4c/0xa0
 [<ffffffff812c5150>] ? kobject_release+0x0/0xa0
 [<ffffffff812c61f6>] kref_put+0x36/0x70
 [<ffffffff812c4d37>] kobject_put+0x27/0x60
 [<ffffffff813a21f7>] put_device+0x17/0x20
 [<ffffffff81bda4f9>] hci_free_dev+0x29/0x30
 [<ffffffff81928be6>] vhci_release+0x36/0x70
 [<ffffffff810fb366>] fput+0xd6/0x1f0
 [<ffffffff810f8fe6>] filp_close+0x66/0x90
 [<ffffffff810f90a9>] sys_close+0x99/0xf0
 [<ffffffff81d4c96b>] system_call_fastpath+0x16/0x1b

That timer was introduced with commit 6bd32326cda(Bluetooth: Use
proper timer for hci command timout)

Timer seems to be running when the thing is closed. Removing the timer
unconditionally fixes the problem. And yes, it needs to be fixed
before the HCI_UP check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-24 17:04:44 -03:00
Andrei Emeltchenko a0cc9a1b57 Bluetooth: delete hanging L2CAP channel
Sometimes L2CAP connection remains hanging. Make sure that
L2CAP channel is deleted.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-24 17:04:44 -03:00
Johan Hedberg 6994ca5e8a Bluetooth: Fix missing hci_dev_lock_bh in user_confirm_reply
The code was correctly calling _unlock at the end of the function but
there was no actual _lock call anywhere.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-24 17:04:44 -03:00
Gustavo F. Padovan f630cf0d54 Bluetooth: Fix HCI_RESET command synchronization
We can't send new commands before a cmd_complete for the HCI_RESET command
shows up.

Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Reported-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
2011-03-24 17:04:44 -03:00
Suraj Sumangala 52cb1c0d20 Bluetooth: Increment unacked_frames count only the first transmit
This patch lets 'l2cap_pinfo.unacked_frames' be incremented only
the first time a frame is transmitted.

Previously it was being incremented for retransmitted packets
too resulting the value to cross the transmit window size.

Signed-off-by: Suraj Sumangala <suraj@atheros.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-03-24 17:04:43 -03:00
Eric Dumazet fcd13f42c9 ipv4: fix fib metrics
Alessandro Suardi reported that we could not change route metrics :

ip ro change default .... advmss 1400

This regression came with commit 9c150e82ac (Allocate fib metrics
dynamically). fib_metrics is no longer an array, but a pointer to an
array.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-24 11:49:54 -07:00
Bryan Schumaker 8f70e95f9f NFS: Determine initial mount security
When sec=<something> is not presented as a mount option,
we should attempt to determine what security flavor the
server is using.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Bryan Schumaker 7ebb931598 NFS: use secinfo when crossing mountpoints
A submount may use different security than the parent
mount does.  We should figure out what sec flavor the
submount uses at mount time.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Linus Torvalds dc87c55120 Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfsd: wrong index used in inner loop
  nfsd4: fix comment and remove unused nfsd4_file fields
  nfs41: make sure nfs server return right ca_maxresponsesize_cached
  nfsd: fix compile error
  svcrpc: fix bad argument in unix_domain_find
  nfsd4: fix struct file leak
  nfsd4: minor nfs4state.c reshuffling
  svcrpc: fix rare race on unix_domain creation
  nfsd41: modify the members value of nfsd4_op_flags
  nfsd: add proc file listing kernel's gss_krb5 enctypes
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  NFSD, VFS: Remove dead code in nfsd_rename()
  nfsd: kill unused macro definition
  locks: use assign_type()
2011-03-24 08:20:39 -07:00
Akinobu Mita e1dc1c81b9 rds: use little-endian bitops
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h.  This converts ext2 non-atomic bit operations to
little-endian bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:16 -07:00
Akinobu Mita 12ce22423a rds: stop including asm-generic/bitops/le.h directly
asm-generic/bitops/le.h is only intended to be included directly from
asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h
which implements generic ext2 or minix bit operations.

This stops including asm-generic/bitops/le.h directly and use ext2
non-atomic bit operations instead.

It seems odd to use ext2_*_bit() on rds, but it will replaced with
__{set,clear,test}_bit_le() after introducing little endian bit operations
for all architectures.  This indirect step is necessary to maintain
bisectability for some architectures which have their own little-endian
bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andy Grover <andy.grover@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:10 -07:00
Eric Dumazet eb49a97363 ipv4: fix ip_rt_update_pmtu()
commit 2c8cec5c10 (Cache learned PMTU information in inetpeer) added
an extra inet_putpeer() call in ip_rt_update_pmtu().

This results in various problems, since we can free one inetpeer, while
it is still in use.

Ref: http://www.spinics.net/lists/netdev/msg159121.html

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:18:15 -07:00
David S. Miller 406b6f974d ipv4: Fallback to FIB local table in __ip_dev_find().
In commit 9435eb1cf0
("ipv4: Implement __ip_dev_find using new interface address hash.")
we reimplemented __ip_dev_find() so that it doesn't have to
do a full FIB table lookup.

Instead, it consults a hash table of addresses configured to
interfaces.

This works identically to the old code in all except one case,
and that is for loopback subnets.

The old code would match the loopback device for any IP address
that falls within a subnet configured to the loopback device.

Handle this corner case by doing the FIB lookup.

We could implement this via inet_addr_onlink() but:

1) Someone could configure many addresses to loopback and
   inet_addr_onlink() is a simple list traversal.

2) We know the old code works.

Reported-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-23 12:16:15 -07:00
David S. Miller f6152737a9 tcp: Make undo_ssthresh arg to tcp_undo_cwr() a bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:37:11 -07:00
Yuchung Cheng 67d4120a17 tcp: avoid cwnd moderation in undo
In the current undo logic, cwnd is moderated after it was restored
to the value prior entering fast-recovery. It was moderated first
in tcp_try_undo_recovery then again in tcp_complete_cwr.

Since the undo indicates recovery was false, these moderations
are not necessary. If the undo is triggered when most of the
outstanding data have been acknowledged, the (restored) cwnd is
falsely pulled down to a small value.

This patch removes these cwnd moderations if cwnd is undone
  a) during fast-recovery
	b) by receiving DSACKs past fast-recovery

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:36:08 -07:00
Linus Lüssing a7bff75b08 bridge: Fix possibly wrong MLD queries' ethernet source address
The ipv6_dev_get_saddr() is currently called with an uninitialized
destination address. Although in tests it usually seemed to nevertheless
always fetch the right source address, there seems to be a possible race
condition.

Therefore this commit changes this, first setting the destination
address and only after that fetching the source address.

Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:26:42 -07:00
Florian Westphal 9c7a4f9ce6 ipv6: ip6_route_output does not modify sk parameter, so make it const
This avoids explicit cast to avoid 'discards qualifiers'
compiler warning in a netfilter patch that i've been working on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 19:17:36 -07:00
Eric Dumazet 94dcf29a11 kthread: use kthread_create_on_node()
ksoftirqd, kworker, migration, and pktgend kthreads can be created with
kthread_create_on_node(), to get proper NUMA affinities for their stack and
task_struct.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:44:01 -07:00
Linus Torvalds ab70a1d7c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  [net/9p]: Introduce basic flow-control for VirtIO transport.
  9p: use the updated offset given by generic_write_checks
  [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
  [net/9p] Set the condition just before waking up.
  [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
  fs/9p: Add v9fs_dentry2v9ses
  fs/9p: Attach writeback_fid on first open with WR flag
  fs/9p: Open writeback fid in O_SYNC mode
  fs/9p: Use truncate_setsize instead of vmtruncate
  net/9p: Fix compile warning
  net/9p: Convert the in the 9p rpc call path to GFP_NOFS
  fs/9p: Fix race in initializing writeback fid
2011-03-22 16:26:10 -07:00
Linus Torvalds 0adfc56ce8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: use watch/notify for changes in rbd header
  libceph: add lingering request and watch/notify event framework
  rbd: update email address in Documentation
  ceph: rename dentry_release -> d_release, fix comment
  ceph: add request to the tail of unsafe write list
  ceph: remove request from unsafe list if it is canceled/timed out
  ceph: move readahead default to fs/ceph from libceph
  ceph: add ino32 mount option
  ceph: update common header files
  ceph: remove debugfs debug cruft
  libceph: fix osd request queuing on osdmap updates
  ceph: preserve I_COMPLETE across rename
  libceph: Fix base64-decoding when input ends in newline.
2011-03-22 16:25:25 -07:00
Trond Myklebust 246408dcd5 SUNRPC: Never reuse the socket port after an xs_close()
If we call xs_close(), we're in one of two situations:
 - Autoclose, which means we don't expect to resend a request
 - bind+connect failed, which probably means the port is in use

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-22 18:42:33 -04:00
David S. Miller db138908cc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-03-22 14:36:18 -07:00
Venkateswararao Jujjuri (JV) 68da9ba4ee [net/9p]: Introduce basic flow-control for VirtIO transport.
Recent zerocopy work in the 9P VirtIO transport maps and pins
user buffers into kernel memory for the server to work on them.
Since the user process can initiate this kind of pinning with a simple
read/write call, thousands of IO threads initiated by the user process can
hog the system resources and could result into denial of service.

This patch introduces flow control to avoid that extreme scenario.

The ceiling limit to avoid denial of service attacks is set to relatively
high (nr_free_pagecache_pages()/4) so that it won't interfere with
regular usage, but can step in extreme cases to limit the total system
hang. Since we don't have a global structure to accommodate this variable,
I choose the virtio_chan as the home for this.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 16:32:50 -05:00
Venkateswararao Jujjuri (JV) 316ad5501c [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 16:32:48 -05:00
Venkateswararao Jujjuri (JV) a01a984035 [net/9p] Set the condition just before waking up.
Given that the sprious wake-ups are common, we need to move the
condition setting right next to the wake_up().  After setting the condition
to req->status = REQ_STATUS_RCVD, sprious wakeups may cause the
virtqueue back on the free list for someone else to use.
This may result in kernel panic while relasing the pinned pages
in p9_release_req_pages().

Also rearranged the while loop in req_done() for better redability.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 16:32:47 -05:00
Venkateswararao Jujjuri (JV) 53bda3e5b4 [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
Process may wait to get space on VirtIO ring to send a transaction to
VirtFS server. Current code just does a conditional wake_up() which
means only one process will be woken up even if multiple processes
are waiting.

This fix makes the wake_up unconditional. Hence we won't have any
processes waiting for-ever.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 16:32:19 -05:00
Aneesh Kumar K.V 472e7f9f8b net/9p: Fix compile warning
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 15:43:35 -05:00
Aneesh Kumar K.V eeff66ef6e net/9p: Convert the in the 9p rpc call path to GFP_NOFS
Without this we can cause reclaim allocation in writepage.

[ 3433.448430] =================================
[ 3433.449117] [ INFO: inconsistent lock state ]
[ 3433.449117] 2.6.38-rc5+ #84
[ 3433.449117] ---------------------------------
[ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 3433.449117]  (iprune_sem){+++++-}, at: [<ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1
[ 3433.449117] {RECLAIM_FS-ON-W} state was registered at:
[ 3433.449117]   [<ffffffff8107fe5f>] mark_held_locks+0x52/0x70
[ 3433.449117]   [<ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f
[ 3433.449117]   [<ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c
[ 3433.449117]   [<ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2
[ 3433.449117]   [<ffffffff8127be77>] idr_pre_get+0x2d/0x6f
[ 3433.449117]   [<ffffffff815434eb>] p9_idpool_get+0x30/0xae
[ 3433.449117]   [<ffffffff81540123>] p9_client_rpc+0xd7/0x9b0
[ 3433.449117]   [<ffffffff815427b0>] p9_client_clunk+0x88/0xdb
[ 3433.449117]   [<ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48
[ 3433.449117]   [<ffffffff810eb511>] evict+0x1f/0x87
[ 3433.449117]   [<ffffffff810eb5c0>] dispose_list+0x47/0xe3
[ 3433.449117]   [<ffffffff810eb8da>] evict_inodes+0x138/0x14f
[ 3433.449117]   [<ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8
[ 3433.449117]   [<ffffffff810d91e8>] kill_anon_super+0x11/0x50
[ 3433.449117]   [<ffffffff811d4951>] v9fs_kill_super+0x49/0xab
[ 3433.449117]   [<ffffffff810d926e>] deactivate_locked_super+0x21/0x46
[ 3433.449117]   [<ffffffff810d9e84>] deactivate_super+0x40/0x44
[ 3433.449117]   [<ffffffff810ef848>] mntput_no_expire+0x100/0x109
[ 3433.449117]   [<ffffffff810f0aeb>] sys_umount+0x2f1/0x31c
[ 3433.449117]   [<ffffffff8102c87b>] system_call_fastpath+0x16/0x1b
[ 3433.449117] irq event stamp: 192941
[ 3433.449117] hardirqs last  enabled at (192941): [<ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30
[ 3433.449117] hardirqs last disabled at (192940): [<ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5
[ 3433.449117] softirqs last  enabled at (188470): [<ffffffff8105fd65>] __do_softirq+0x133/0x152
[ 3433.449117] softirqs last disabled at (188455): [<ffffffff8102d7cc>] call_softirq+0x1c/0x28
[ 3433.449117]
[ 3433.449117] other info that might help us debug this:
[ 3433.449117] 1 lock held by kswapd0/505:
[ 3433.449117]  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810b52e2>] shrink_slab+0x38/0x15f
[ 3433.449117]
[ 3433.449117] stack backtrace:
[ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84
[ 3433.449117] Call Trace:
[ 3433.449117]  [<ffffffff8107fbce>] ? valid_state+0x17e/0x191
[ 3433.449117]  [<ffffffff81036896>] ? save_stack_trace+0x28/0x45
[ 3433.449117]  [<ffffffff81080426>] ? check_usage_forwards+0x0/0x87
[ 3433.449117]  [<ffffffff8107fcf4>] ? mark_lock+0x113/0x22c
[ 3433.449117]  [<ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7
[ 3433.449117]  [<ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c
[ 3433.449117]  [<ffffffff81081077>] ? __lock_acquire+0x392/0xcf7
[ 3433.449117]  [<ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28
[ 3433.449117]  [<ffffffff81081a33>] ? lock_acquire+0x57/0x6d
[ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117]  [<ffffffff81567d85>] ? down_read+0x47/0x5c
[ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117]  [<ffffffff810b5385>] ? shrink_slab+0xdb/0x15f
[ 3433.449117]  [<ffffffff810b69bc>] ? kswapd+0x574/0x96a
[ 3433.449117]  [<ffffffff810b6448>] ? kswapd+0x0/0x96a
[ 3433.449117]  [<ffffffff810714e2>] ? kthread+0x7d/0x85
[ 3433.449117]  [<ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10
[ 3433.449117]  [<ffffffff81569200>] ? restore_args+0x0/0x30
[ 3433.449117]  [<ffffffff81071465>] ? kthread+0x0/0x85
[ 3433.449117]  [<ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22 15:43:35 -05:00
Yehuda Sadeh a40c4f10e3 libceph: add lingering request and watch/notify event framework
Lingering requests are requests that are sent to the OSD normally but
tracked also after we get a successful request.  This keeps the OSD
connection open and resends the original request if the object moves to
another OSD.  The OSD can then send notification messages back to us
if another client initiates a notify.

This framework will be used by RBD so that the client gets notification
when a snapshot is created by another node or tool.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22 11:33:55 -07:00
Julian Anastasov 04024b937a ipv4: optimize route adding on secondary promotion
Optimize the calling of fib_add_ifaddr for all
secondary addresses after the promoted one to start from
their place, not from the new place of the promoted
secondary. It will save some CPU cycles because we
are sure the promoted secondary was first for the subnet
and all next secondaries do not change their place.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:33 -07:00
Julian Anastasov 2d230e2b2c ipv4: remove the routes on secondary promotion
The secondary address promotion relies on fib_sync_down_addr
to remove all routes created for the secondary addresses when
the old primary address is deleted. It does not happen for cases
when the primary address is also in another subnet. Fix that
by deleting local and broadcast routes for all secondaries while
they are on device list and by faking that all addresses from
this subnet are to be deleted. It relies on fib_del_ifaddr being
able to ignore the IPs from the concerned subnet while checking
for duplication.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:33 -07:00
Julian Anastasov e6abbaa272 ipv4: fix route deletion for IPs on many subnets
Alex Sidorenko reported for problems with local
routes left after IP addresses are deleted. It happens
when same IPs are used in more than one subnet for the
device.

	Fix fib_del_ifaddr to restrict the checks for duplicate
local and broadcast addresses only to the IFAs that use
our primary IFA or another primary IFA with same address.
And we expect the prefsrc to be matched when the routes
are deleted because it is possible they to differ only by
prefsrc. This patch prevents local and broadcast routes
to be leaked until their primary IP is deleted finally
from the box.

	As the secondary address promotion needs to delete
the routes for all secondaries that used the old primary IFA,
add option to ignore these secondaries from the checks and
to assume they are already deleted, so that we can safely
delete the route while these IFAs are still on the device list.

Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:32 -07:00
Julian Anastasov 74cb3c108b ipv4: match prefsrc when deleting routes
fib_table_delete forgets to match the routes by prefsrc.
Callers can specify known IP in fc_prefsrc and we should remove
the exact route. This is needed for cases when same local or
broadcast addresses are used in different subnets and the
routes differ only in prefsrc. All callers that do not provide
fc_prefsrc will ignore the route prefsrc as before and will
delete the first occurence. That is how the ip route del default
magic works.

	Current callers are:

- ip_rt_ioctl where rtentry_to_fib_config provides fc_prefsrc only
when the provided device name matches IP label with colon.

- inet_rtm_delroute where RTA_PREFSRC is optional too

- fib_magic which deals with routes when deleting addresses
and where the fc_prefsrc is always set with the primary IP
for the concerned IFA.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:06:31 -07:00
Michał Mirosław 27660515a2 net: implement dev_disable_lro() hw_features compatibility
Implement compatibility with new hw_features for dev_disable_lro().
This is a transition path - dev_disable_lro() should be later
integrated into netdev_fix_features() after all drivers are converted.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22 01:00:26 -07:00
Simon Horman 736561a01f IPVS: Use global mutex in ip_vs_app.c
As part of the work to make IPVS network namespace aware
__ip_vs_app_mutex was replaced by a per-namespace lock,
ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.

Unfortunately this implementation results in ipvs->app_key residing
in non-static storage which at the very least causes a lockdep warning.

This patch takes the rather heavy-handed approach of reinstating
__ip_vs_app_mutex which will cover access to the ipvs->list_head
of all network namespaces.

[   12.610000] IPVS: Creating netns size=2456 id=0
[   12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[   12.640000] BUG: key ffff880003bbf1a0 not in .data!
[   12.640000] ------------[ cut here ]------------
[   12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
[   12.640000] Hardware name: Bochs
[   12.640000] Pid: 1, comm: swapper Tainted: G        W 2.6.38-kexec-06330-g69b7efe-dirty #122
[   12.650000] Call Trace:
[   12.650000]  [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0
[   12.650000]  [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20
[   12.650000]  [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570
[   12.650000]  [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10
[   12.650000]  [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50
[   12.650000]  [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70
[   12.650000]  [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1c33>] T.620+0x43/0x170
[   12.660000]  [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.670000]  [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40
[   12.670000]  [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12
[   12.670000]  [<ffffffff81685a87>] ip_vs_init+0x4c/0xff
[   12.670000]  [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e
[   12.670000]  [<ffffffff8166583e>] kernel_init+0x13e/0x1c2
[   12.670000]  [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10
[   12.670000]  [<ffffffff8128ad40>] ? restore_args+0x0/0x30
[   12.680000]  [<ffffffff81665700>] ? kernel_init+0x0/0x1c2
[   12.680000]  [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 20:39:24 -07:00
Eric Dumazet f40f94fc6c ipvs: fix a typo in __ip_vs_control_init()
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 20:39:24 -07:00
Eric W. Biederman 9d2a8fa96a net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh
by adding a mount point it appears I missed a critical ordering issue, in the
ipv6 initialization.  I had not realized that ipv6_sysctl_register is called
at the very end of the ipv6 initialization and in particular after we call
neigh_sysctl_register from ndisc_init.

"neigh" needs to be initialized in ipv6_static_sysctl_register which is
the first ipv6 table to initialized, and definitely before ndisc_init.
This removes the weirdness of duplicate tables while still providing a
"neigh" mount point which prevents races in sysctl unregistering.

This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
Reported-by: sunkan@zappa.cx
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:23:34 -07:00
Neil Horman ac0a121d79 net: fix incorrect spelling in drop monitor protocol
It was pointed out to me recently that my spelling could be better :)

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:20:26 -07:00
Arnd Bergmann b20e7bbfc7 net/appletalk: fix atalk_release use after free
The BKL removal in appletalk introduced a use-after-free problem,
where atalk_destroy_socket frees a sock, but we still release
the socket lock on it.

An easy fix is to take an extra reference on the sock and sock_put
it when returning from atalk_release.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:18:00 -07:00
Eric Dumazet 674f211599 ipx: fix ipx_release()
Commit b0d0d915d1 (remove the BKL) added a regression, because
sock_put() can free memory while we are going to use it later.

Fix is to delay sock_put() _after_ release_sock().

Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:16:39 -07:00
James Chapman 8aa525a934 l2tp: fix possible oops on l2tp_eth module unload
A struct used in the l2tp_eth driver for registering network namespace
ops was incorrectly marked as __net_initdata, leading to oops when
module unloaded.

BUG: unable to handle kernel paging request at ffffffffa00ec098
IP: [<ffffffff8123dbd8>] ops_exit_list+0x7/0x4b
PGD 142d067 PUD 1431063 PMD 195da8067 PTE 0
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/l2tp_eth/refcnt
Call Trace:
 [<ffffffff8123dc94>] ? unregister_pernet_operations+0x32/0x93
 [<ffffffff8123dd20>] ? unregister_pernet_device+0x2b/0x38
 [<ffffffff81068b6e>] ? sys_delete_module+0x1b8/0x222
 [<ffffffff810c7300>] ? do_munmap+0x254/0x318
 [<ffffffff812c64e5>] ? page_fault+0x25/0x30
 [<ffffffff812c6952>] ? system_call_fastpath+0x16/0x1b

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:10:25 -07:00
Wei Yongjun a454f0ccef xfrm: Fix initialize repl field of struct xfrm_state
Commit 'xfrm: Move IPsec replay detection functions to a separate file'
  (9fdc4883d9)
introduce repl field to struct xfrm_state, and only initialize it
under SA's netlink create path, the other path, such as pf_key,
ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if
the SA is created by pf_key, any input packet with SA's encryption
algorithm will cause panic.

    int xfrm_input()
    {
        ...
        x->repl->advance(x, seq);
        ...
    }

This patch fixed it by introduce new function __xfrm_init_state().

Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0
EIP is at xfrm_input+0x31c/0x4cc
EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000
ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000)
Stack:
 00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
 00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
 dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
Call Trace:
 [<c0786848>] xfrm4_rcv_encap+0x22/0x27
 [<c0786868>] xfrm4_rcv+0x1b/0x1d
 [<c074ee56>] ip_local_deliver_finish+0x112/0x1b1
 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
 [<c074ef77>] ip_local_deliver+0x3e/0x44
 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
 [<c074ec03>] ip_rcv_finish+0x30a/0x332
 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332
 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
 [<c074f188>] ip_rcv+0x20b/0x247
 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332
 [<c072797d>] __netif_receive_skb+0x373/0x399
 [<c0727bc1>] netif_receive_skb+0x4b/0x51
 [<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp]
 [<c072818f>] net_rx_action+0x9a/0x17d
 [<c0445b5c>] __do_softirq+0xa1/0x149
 [<c0445abb>] ? __do_softirq+0x0/0x149

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 18:08:28 -07:00
Sage Weil 6f6c700675 libceph: fix osd request queuing on osdmap updates
If we send a request to osd A, and the request's pg remaps to osd B and
then back to A in quick succession, we need to resend the request to A. The
old code was only calling kick_requests after processing all incremental
maps in a message, so it was very possible to not resend a request that
needed to be resent.  This would make the osd eventually time out (at least
with the current default of osd timeouts enabled).

The correct approach is to scan requests on every map incremental.  This
patch refactors the kick code in a few ways:
 - all requests are either on req_lru (in flight), req_unsent (ready to
   send), or req_notarget (currently map to no up osd)
 - mapping always done by map_request (previous map_osds)
 - if the mapping changes, we requeue.  requests are resent only after all
   map incrementals are processed.
 - some osd reset code is moved out of kick_requests into a separate
   function
 - the "kick this osd" functionality is moved to kick_osd_requests, as it
   is unrelated to scanning for request->pg->osd mapping changes

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:19 -07:00
Felix Fietkau 8bc8aecdc5 mac80211: initialize sta->last_rx in sta_info_alloc
This field is used to determine the inactivity time. When in AP mode,
hostapd uses it for kicking out inactive clients after a while. Without this
patch, hostapd immediately deauthenticates a new client if it checks the
inactivity time before the client sends its first data frame.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-21 15:19:50 -04:00
Vasiliy Kulikov 961ed183a9 netfilter: ipt_CLUSTERIP: fix buffer overflow
'buffer' string is copied from userspace.  It is not checked whether it is
zero terminated.  This may lead to overflow inside of simple_strtoul().
Changli Gao suggested to copy not more than user supplied 'size' bytes.

It was introduced before the git epoch.  Files "ipt_CLUSTERIP/*" are
root writable only by default, however, on some setups permissions might be
relaxed to e.g. network admin user.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20 15:42:52 +01:00
Eric Dumazet db856674ac netfilter: xtables: fix reentrancy
commit f3c5c1bfd4 (make ip_tables reentrant) introduced a race in
handling the stackptr restore, at the end of ipt_do_table()

We should do it before the call to xt_info_rdunlock_bh(), or we allow
cpu preemption and another cpu overwrites stackptr of original one.

A second fix is to change the underflow test to check the origptr value
instead of 0 to detect underflow, or else we allow a jump from different
hooks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20 15:40:06 +01:00
Jozsef Kadlecsik 5c1aba4678 netfilter: ipset: fix checking the type revision at create command
The revision of the set type was not checked at the create command: if the
userspace sent a valid set type but with not supported revision number,
it'd create a loop.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20 15:35:01 +01:00
Jozsef Kadlecsik 5e0c1eb7e6 netfilter: ipset: fix address ranges at hash:*port* types
The hash:*port* types with IPv4 silently ignored when address ranges
with non TCP/UDP were added/deleted from the set and used the first
address from the range only.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-20 15:33:26 +01:00
Herbert Xu 6b1e960fdb bridge: Reset IPCB when entering IP stack on NF_FORWARD
Whenever we enter the IP stack proper from bridge netfilter we
need to ensure that the skb is in a form the IP stack expects
it to be in.

The entry point on NF_FORWARD did not meet the requirements of
the IP stack, therefore leading to potential crashes/panics.

This patch fixes the problem.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:13:12 -07:00
Eric Dumazet d870bfb9d3 vlan: should take into account needed_headroom
Commit c95b819ad7 (gre: Use needed_headroom)
made gre use needed_headroom instead of hard_header_len

This uncover a bug in vlan code.

We should make sure vlan devices take into account their
real_dev->needed_headroom or we risk a crash in ipgre_header(), because
we dont have enough room to push IP header in skb.

Reported-by: Diddi Oscarsson <diddi@diddi.se>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:13:12 -07:00
Ben Hutchings 3a7da39d16 ethtool: Compat handling for struct ethtool_rxnfc
This structure was accidentally defined such that its layout can
differ between 32-bit and 64-bit processes.  Add compat structure
definitions and an ioctl wrapper function.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: stable@kernel.org [2.6.30+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:13:11 -07:00
Roger Luethi 5e5069b41d ethtool: __ethtool_set_sg: check for function pointer before using it
__ethtool_set_sg does not check if dev->ethtool_ops->set_sg is defined
which can result in a NULL pointer dereference when ethtool is used to
change SG settings for drivers without SG support.

Signed-off-by: Roger Luethi <rl@hellgate.ch>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:13:10 -07:00
Vasiliy Kulikov 67c5c6cb81 econet: 4 byte infoleak to the network
struct aunhdr has 4 padding bytes between 'pad' and 'handle' fields on
x86_64.  These bytes are not initialized in the variable 'ah' before
sending 'ah' to the network.  This leads to 4 bytes kernel stack
infoleak.

This bug was introduced before the git epoch.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-18 15:12:15 -07:00
Linus Torvalds e16b396ce3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
  doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
  Update cpuset info & webiste for cgroups
  dcdbas: force SMI to happen when expected
  arch/arm/Kconfig: remove one to many l's in the word.
  asm-generic/user.h: Fix spelling in comment
  drm: fix printk typo 'sracth'
  Remove one to many n's in a word
  Documentation/filesystems/romfs.txt: fixing link to genromfs
  drivers:scsi Change printk typo initate -> initiate
  serial, pch uart: Remove duplicate inclusion of linux/pci.h header
  fs/eventpoll.c: fix spelling
  mm: Fix out-of-date comments which refers non-existent functions
  drm: Fix printk typo 'failled'
  coh901318.c: Change initate to initiate.
  mbox-db5500.c Change initate to initiate.
  edac: correct i82975x error-info reported
  edac: correct i82975x mci initialisation
  edac: correct commented info
  fs: update comments to point correct document
  target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
  ...

Trivial conflict in fs/eventpoll.c (spelling vs addition)
2011-03-18 10:37:40 -07:00
Linus Torvalds 7fd23a2471 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (48 commits)
  HID: add support for Logitech Driving Force Pro wheel
  HID: hid-ortek: remove spurious reference
  HID: add support for Ortek PKB-1700
  HID: roccat-koneplus: vorrect mode of sysfs attr 'sensor'
  HID: hid-ntrig: init settle and mode check
  HID: merge hid-egalax into hid-multitouch
  HID: hid-multitouch: Send events per slot if CONTACTCOUNT is missing
  HID: ntrig remove if and drop an indent
  HID: ACRUX - activate the device immediately after binding
  HID: ntrig: apply NO_INIT_REPORTS quirk
  HID: hid-magicmouse: Correct touch orientation direction
  HID: ntrig don't dereference unclaimed hidinput
  HID: Do not create input devices for feature reports
  HID: bt hidp: send Output reports using SET_REPORT on the Control channel
  HID: hid-sony.c: Fix sending Output reports to the Sixaxis
  HID: add support for Keytouch IEC 60945
  HID: Add HID Report Descriptor to sysfs
  HID: add IRTOUCH infrared USB to hid_have_special_driver
  HID: kernel oops in out_cleanup in function hidinput_connect
  HID: Add teletext/color keys - gyration remote - EU version (GYAR3101CKDE)
  ...
2011-03-18 10:35:30 -07:00
Linus Torvalds 179198373c Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
  RPC: killing RPC tasks races fixed
  xprt: remove redundant check
  SUNRPC: Convert struct rpc_xprt to use atomic_t counters
  SUNRPC: Ensure we always run the tk_callback before tk_action
  sunrpc: fix printk format warning
  xprt: remove redundant null check
  nfs: BKL is no longer needed, so remove the include
  NFS: Fix a warning in fs/nfs/idmap.c
  Cleanup: Factor out some cut-and-paste code.
  cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
  NFS: account direct-io into task io accounting
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  RPCRDMA: Fix FRMR registration/invalidate handling.
  RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
  NFSv4: Send unmapped uid/gids to the server when using auth_sys
  NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
  NFSv4: cleanup idmapper functions to take an nfs_server argument
  NFSv4: Send unmapped uid/gids to the server if the idmapper fails
  NFSv4: If the server sends us a numeric uid/gid then accept it
  NFSv4.1: reject zero layout with zeroed stripe unit
  ...
2011-03-17 17:40:00 -07:00
Jesper Juhl 4be34b9d69 SUNRPC: Remove resource leak in svc_rdma_send_error()
We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-03-17 14:38:03 -04:00
Stanislav Kinsbursky 8e26de238f RPC: killing RPC tasks races fixed
RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up
task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to
NULL).
Also, as Trond Myklebust mentioned, such approach (instead of checking
tk_waitqueue to NULL) allows us to "optimise away the call to
rpc_wake_up_queued_task() altogether for those
tasks that aren't queued".

Here is an example of dereferencing of tk_waitqueue equal to NULL:

CPU 0               	CPU 1				CPU 2
--------------------	---------------------	--------------------------
nfs4_run_open_task
rpc_run_task
rpc_execute
rpc_set_active
rpc_make_runnable
(waiting)
			rpc_async_schedule
			nfs4_open_prepare
			nfs_wait_on_sequence
						nfs_umount_begin
						rpc_killall_tasks
						rpc_wake_up_task
						rpc_wake_up_queued_task
						spin_lock(tk_waitqueue == NULL)
						BUG()
			rpc_sleep_on
			spin_lock(&q->lock)
			__rpc_sleep_on
			task->tk_waitqueue = q

Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:39:00 -04:00
j223yang@asset.uwaterloo.ca ba3c578de2 xprt: remove redundant check
remove redundant check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:39:00 -04:00
Trond Myklebust a8de240a90 SUNRPC: Convert struct rpc_xprt to use atomic_t counters
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-17 12:38:59 -04:00
Trond Myklebust e020c6800c SUNRPC: Ensure we always run the tk_callback before tk_action
This fixes a race in which the task->tk_callback() puts the rpc_task
to sleep, setting a new callback. Under certain circumstances, the current
code may end up executing the task->tk_action before it gets round to the
callback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-17 12:38:41 -04:00
Jiri Kosina 65b06194c9 Merge branches 'dragonrise', 'hidraw-feature', 'multitouch', 'ntrig', 'roccat', 'upstream' and 'upstream-fixes' into for-linus 2011-03-17 14:31:46 +01:00