Commit Graph

25330 Commits

Author SHA1 Message Date
Jerry Chu 37561f68bd tcp: Reject invalid ack_seq to Fast Open sockets
A packet with an invalid ack_seq may cause a TCP Fast Open socket to switch
to the unexpected TCP_CLOSING state, triggering a BUG_ON kernel panic.

When a FIN packet with an invalid ack_seq# arrives at a socket in
the TCP_FIN_WAIT1 state, rather than discarding the packet, the current
code will accept the FIN, causing state transition to TCP_CLOSING.

This may be a small deviation from RFC793, which seems to say that the
packet should be dropped. Unfortunately I did not expect this case for
Fast Open hence it will trigger a BUG_ON panic.

It turns out there is really nothing bad about a TFO socket going into
TCP_CLOSING state so I could just remove the BUG_ON statements. But after
some thought I think it's better to treat this case like TCP_SYN_RECV
and return a RST to the confused peer who caused the unacceptable ack_seq
to be generated in the first place.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:42:56 -04:00
Eric Dumazet 3d861f6610 net: fix secpath kmemleak
Mike Kazantsev found 3.5 kernels and beyond were leaking memory,
and tracked the faulty commit to a1c7fff7e1 ("net:
netdev_alloc_skb() use build_skb()")

While this commit seems fine, it uncovered a bug introduced
in commit bad43ca832 ("net: introduce skb_try_coalesce()), in function
kfree_skb_partial()"):

If head is stolen, we free the sk_buff,
without removing references on secpath (skb->sp).

So IPsec + IP defrag/reassembly (using skb coalescing), or
TCP coalescing could leak secpath objects.

Fix this bug by calling skb_release_head_state(skb) to properly
release all possible references to linked objects.

Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Mike Kazantsev <mk.fraggod@gmail.com>
Tested-by: Mike Kazantsev <mk.fraggod@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-22 15:16:07 -04:00
Yuchung Cheng 6f73601efb tcp: add SYN/data info to TCP_INFO
Add a bit TCPI_OPT_SYN_DATA (32) to the socket option TCP_INFO:tcpi_options.
It's set if the data in SYN (sent or received) is acked by SYN-ACK. Server or
client application can use this information to check Fast Open success rate.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-22 15:16:06 -04:00
Julian Anastasov bbb5823cf7 netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should properly match the nexthop when
destinations are directly connected because rt_gateway can be 0.

The rt_gateway checks in H.323 helper try to avoid the creation
of an unnecessary expectation in this call-forwarding case:

http://people.netfilter.org/zhaojingmin/h323_conntrack_nat_helper/#_Toc133598073

However, the existing code fails to avoid that in many cases,
see this thread:

http://marc.info/?l=linux-netdev&m=135043175028620&w=2

It seems it is not trivial to know from the kernel if two hosts
have to go through the firewall to communicate each other, which
is the main point of the call-forwarding filter code to avoid
creating unnecessary expectations.

So this patch just gets things the way they were as before
commit f8126f1d51.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-22 12:21:55 +02:00
Linus Torvalds ccfc27302c TTY fixes for 3.7-rc2
Here are some tty and serial driver fixes for your 3.7-rc1 tree.
 
 Again, the UABI header file fixes, and a number of build and runtime serial
 driver bugfixes that solve problems people have been reporting (the staging
 driver is a tty driver, hence the fixes coming in through this tree.)
 
 All of these have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlCBmVQACgkQMUfUDdst+ynEuwCfexOnEj0evTfXN32kqG50MglI
 o/UAnixeFbfSrHtFOybIEKiHchG2QX9F
 =LPFk
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY fixes from Greg Kroah-Hartman:
 "Here are some tty and serial driver fixes for your 3.7-rc1 tree.

  Again, the UABI header file fixes, and a number of build and runtime
  serial driver bugfixes that solve problems people have been reporting
  (the staging driver is a tty driver, hence the fixes coming in through
  this tree.)

  All of these have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'tty-3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  staging: dgrp: check return value of alloc_tty_driver
  staging: dgrp: check for NULL pointer in (un)register_proc_table
  serial/8250_hp300: Missing 8250 register interface conversion bits
  UAPI: (Scripted) Disintegrate include/linux/hsi
  tty: serial: sccnxp: Fix bug with unterminated platform_id list
  staging: serial: dgrp: Add missing #include <linux/uaccess.h>
  serial: sccnxp: Allows the driver to be compiled as a module
  tty: Fix bogus "callbacks suppressed" messages
  net, TTY: initialize tty->driver_data before usage
2012-10-19 11:28:59 -07:00
Linus Torvalds 90cdb1a0e6 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J Bruce Fields.

* 'for-3.7' of git://linux-nfs.org/~bfields/linux:
  SUNRPC: Prevent kernel stack corruption on long values of flush
  NLM: nlm_lookup_file() may return NLMv4-specific error codes
2012-10-19 11:00:00 -07:00
John W. Linville 06f40a41b8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-10-19 13:55:42 -04:00
David S. Miller db0fe0b2f6 Included fixes:
- Fix broadcast packet CRC calculation which can lead to ~80% broadcast packet
   loss
 - Fix a race condition in duplicate broadcast packet check
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlCANjIACgkQpGgxIkP9cwfeUwCgkGfv2Y8evZYX6d1bW2kFg7di
 LJ8AoISKwbTCJUvKrUuQHAvTss7bcX4k
 =/dxF
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included fixes:
- Fix broadcast packet CRC calculation which can lead to ~80% broadcast packet
  loss
- Fix a race condition in duplicate broadcast packet check

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:36:59 -04:00
Eric Dumazet a3374c42aa tcp: fix FIONREAD/SIOCINQ
tcp_ioctl() tries to take into account if tcp socket received a FIN
to report correct number bytes in receive queue.

But its flaky because if the application ate the last skb,
we return 1 instead of 0.

Correct way to detect that FIN was received is to test SOCK_DONE.

Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:31 -04:00
Eric Dumazet 6d772ac557 netlink: use kfree_rcu() in netlink_release()
On some suspend/resume operations involving wimax device, we have
noticed some intermittent memory corruptions in netlink code.

Stéphane Marchesin tracked this corruption in netlink_update_listeners()
and suggested a patch.

It appears netlink_release() should use kfree_rcu() instead of kfree()
for the listeners structure as it may be used by other cpus using RCU
protection.

netlink_release() must set to NULL the listeners pointer when
it is about to be freed.

Also have to protect netlink_update_listeners() and
netlink_has_listeners() if listeners is NULL.

Add a nl_deref_protected() lockdep helper to properly document which
locks protects us.

Reported-by: Jonathan Kliegman <kliegs@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stéphane Marchesin <marcheu@google.com>
Cc: Sam Leffler <sleffler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Steffen Klassert 13d82bf50d ipv4: Fix flushing of cached routing informations
Currently we can not flush cached pmtu/redirect informations via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exeption if the old route is
expired when we bind a new route to a nh exeption.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Jiri Pirko 18c22a03a2 vlan: allow to change type when no vlan device is hooked on netdev
vlan_info might be present but still no vlan devices might be there.
That is in case of vlan0 automatically added.

So in that case, allow to change netdev type.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Linus Lüssing 7dac7b76b8 batman-adv: Fix potential broadcast BLA-duplicate-check race condition
Threads in the bottom half of batadv_bla_check_bcast_duplist() might
otherwise for instance overwrite variables which other threads might
be using/reading at the same time in the top half, potentially
leading to messing up the bcast_duplist, possibly resulting in false
bridge loop avoidance duplicate check decisions.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-10-18 18:17:31 +02:00
Linus Lüssing 7f112af40f batman-adv: Fix broadcast packet CRC calculation
So far the crc16 checksum for a batman-adv broadcast data packet, received
on a batman-adv hard interface, was calculated over zero bytes of its
content leading to many incoming broadcast data packets wrongly being
dropped (60-80% packet loss).

This patch fixes this issue by calculating the crc16 over the actual,
complete broadcast payload.

The issue is a regression introduced by
("batman-adv: add broadcast duplicate check").

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-10-18 18:14:57 +02:00
Felix Fietkau 279f0f5524 cfg80211: fix initialization of chan->max_reg_power
A few places touch chan->max_power based on updated tx power rules, but
forget to do the same to chan->max_reg_power.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-18 17:18:48 +02:00
Felix Fietkau c4a9fafc77 cfg80211: fix antenna gain handling
No driver initializes chan->max_antenna_gain to something sensible, and
the only place where it is being used right now is inside ath9k. This
leads to ath9k potentially using less tx power than it can use, which can
decrease performance/range in some rare cases.

Rather than going through every single driver, this patch initializes
chan->orig_mag in wiphy_register(), ignoring whatever value the driver
left in there. If a driver for some reason wishes to limit it independent
from regulatory rulesets, it can do so internally.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-18 17:18:48 +02:00
John W. Linville 290eddc4b3 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-10-17 16:23:33 -04:00
Sasha Levin 212ba90696 SUNRPC: Prevent kernel stack corruption on long values of flush
The buffer size in read_flush() is too small for the longest possible values
for it. This can lead to a kernel stack corruption:

[   43.047329] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff833e64b4
[   43.047329]
[   43.049030] Pid: 6015, comm: trinity-child18 Tainted: G        W    3.5.0-rc7-next-20120716-sasha #221
[   43.050038] Call Trace:
[   43.050435]  [<ffffffff836c60c2>] panic+0xcd/0x1f4
[   43.050931]  [<ffffffff833e64b4>] ? read_flush.isra.7+0xe4/0x100
[   43.051602]  [<ffffffff810e94e6>] __stack_chk_fail+0x16/0x20
[   43.052206]  [<ffffffff833e64b4>] read_flush.isra.7+0xe4/0x100
[   43.052951]  [<ffffffff833e6500>] ? read_flush_pipefs+0x30/0x30
[   43.053594]  [<ffffffff833e652c>] read_flush_procfs+0x2c/0x30
[   43.053596]  [<ffffffff812b9a8c>] proc_reg_read+0x9c/0xd0
[   43.053596]  [<ffffffff812b99f0>] ? proc_reg_write+0xd0/0xd0
[   43.053596]  [<ffffffff81250d5b>] do_loop_readv_writev+0x4b/0x90
[   43.053596]  [<ffffffff81250fd6>] do_readv_writev+0xf6/0x1d0
[   43.053596]  [<ffffffff812510ee>] vfs_readv+0x3e/0x60
[   43.053596]  [<ffffffff812511b8>] sys_readv+0x48/0xb0
[   43.053596]  [<ffffffff8378167d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-17 14:59:10 -04:00
Johannes Berg 3a40414f82 mac80211: connect with HT20 if HT40 is not permitted
Some changes to fix issues with HT40 APs in Korea
and follow-up changes to allow using HT40 even if
the local regulatory database disallows it caused
issues with iwlwifi (and could cause issues with
other devices); iwlwifi firmware would assert if
you tried to connect to an AP that has an invalid
configuration (e.g. using HT40- on channel 140.)

Fix this, while avoiding the "Korean AP" issue by
disabling HT40 and advertising HT20 to the AP
when connecting.

Cc: stable@vger.kernel.org [3.6]
Reported-by: Florian Reitmeir <florian@reitmeir.org>
Tested-by: Florian Reitmeir <florian@reitmeir.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-17 13:46:38 +02:00
Eric Dumazet 2ad5b9e4bd netfilter: xt_TEE: don't use destination address found in header
Torsten Luettgert bisected TEE regression starting with commit
f8126f1d51 (ipv4: Adjust semantics of rt->rt_gateway.)

The problem is that it tries to ARP-lookup the original destination
address of the forwarded packet, not the address of the gateway.

Fix this using FLOWI_FLAG_KNOWN_NH Julian added in commit
c92b96553a (ipv4: Add FLOWI_FLAG_KNOWN_NH), so that known
nexthop (info->gw.ip) has preference on resolving.

Reported-by: Torsten Luettgert <ml-netfilter@enda.eu>
Bisected-by: Torsten Luettgert <ml-netfilter@enda.eu>
Tested-by: Torsten Luettgert <ml-netfilter@enda.eu>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-17 11:00:31 +02:00
Pablo Neira Ayuso 0b4f5b1d63 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
To obtain new flag FLOWI_FLAG_KNOWN_NH to fix netfilter's xt_TEE target.
2012-10-17 10:59:20 +02:00
Eric Dumazet 9f0d3c2781 ipv6: addrconf: fix /proc/net/if_inet6
Commit 1d5783030a (ipv6/addrconf: speedup /proc/net/if_inet6 filling)
added bugs hiding some devices from if_inet6 and breaking applications.

"ip -6 addr" could still display all IPv6 addresses, while "ifconfig -a"
couldnt.

One way to reproduce the bug is by starting in a shell :

unshare -n /bin/bash
ifconfig lo up

And in original net namespace, lo device disappeared from if_inet6

Reported-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com>
Tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mihai Maruseac <mihai.maruseac@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:47 -04:00
Zijie Pan f6e80abeab sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()
Bug introduced by commit edfee0339e
(sctp: check src addr when processing SACK to update transport state)

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:46 -04:00
Jiri Pirko 55462cf30a vlan: fix bond/team enslave of vlan challenged slave/port
In vlan_uses_dev() check for number of vlan devs rather than existence
of vlan_info. The reason is that vlan id 0 is there without appropriate
vlan dev on it by default which prevented from enslaving vlan challenged
dev.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:46 -04:00
Felix Fietkau d4fa14cd62 mac80211: use ieee80211_free_txskb in a few more places
Free tx status skbs when draining power save buffers, pending frames, or
when tearing down a vif.
Fixes remaining conditions that can lead to hostapd/wpa_supplicant hangs when
running out of socket write memory.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-10-15 14:45:50 -04:00
Stanislaw Gruszka 4045f72bcf mac80211: check if key has TKIP type before updating IV
This patch fix corruption which can manifest itself by following crash
when switching on rfkill switch with rt2x00 driver:
https://bugzilla.redhat.com/attachment.cgi?id=615362

Pointer key->u.ccmp.tfm of group key get corrupted in:

ieee80211_rx_h_michael_mic_verify():

        /* update IV in key information to be able to detect replays */
        rx->key->u.tkip.rx[rx->security_idx].iv32 = rx->tkip_iv32;
        rx->key->u.tkip.rx[rx->security_idx].iv16 = rx->tkip_iv16;

because rt2x00 always set RX_FLAG_MMIC_STRIPPED, even if key is not TKIP.

We already check type of the key in different path in
ieee80211_rx_h_michael_mic_verify() function, so adding additional
check here is reasonable.

Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-10-15 14:42:53 -04:00
John W. Linville 3d02a9265c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-10-15 14:34:23 -04:00
Stanislaw Gruszka 6863255bd0 cfg80211/mac80211: avoid state mishmash on deauth
Avoid situation when we are on associate state in mac80211 and
on disassociate state in cfg80211. This can results on crash
during modules unload (like showed on this thread:
http://marc.info/?t=134373976300001&r=1&w=2) and possibly other
problems.

Reported-by: Pedro Francisco <pedrogfrancisco@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-10-15 17:21:34 +02:00
Johannes Berg df9b42963f Merge remote-tracking branch 'wireless/master' into mac80211 2012-10-15 17:20:54 +02:00
Elison Niven 939ccba437 netfilter: xt_nat: fix incorrect hooks for SNAT and DNAT targets
In (c7232c9 netfilter: add protocol independent NAT core), the
hooks were accidentally modified:

SNAT hooks are POST_ROUTING and LOCAL_IN (before it was LOCAL_OUT).
DNAT hooks are PRE_ROUTING and LOCAL_OUT (before it was LOCAL_IN).

Signed-off-by: Elison Niven <elison.niven@cyberoam.com>
Signed-off-by: Sanket Shah <sanket.shah@cyberoam.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-15 13:39:12 +02:00
Pablo Neira Ayuso 0153d5a810 netfilter: xt_CT: fix timeout setting with IPv6
This patch fixes ip6tables and the CT target if it is used to set
some custom conntrack timeout policy for IPv6.

Use xt_ct_find_proto which already handles the ip6tables case for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-15 13:38:58 +02:00
Greg Kroah-Hartman c362495586 Merge 3.7-rc1 into tty-linus
This syncs up the tty-linus branch to the latest in Linus's tree to get all of
the UAPI stuff needed for the next set of patches to merge.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-14 22:41:27 -07:00
Linus Torvalds d25282d1c9 Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module signing support from Rusty Russell:
 "module signing is the highlight, but it's an all-over David Howells frenzy..."

Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
  X.509: Fix indefinite length element skip error handling
  X.509: Convert some printk calls to pr_devel
  asymmetric keys: fix printk format warning
  MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
  MODSIGN: Make mrproper should remove generated files.
  MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
  MODSIGN: Use the same digest for the autogen key sig as for the module sig
  MODSIGN: Sign modules during the build process
  MODSIGN: Provide a script for generating a key ID from an X.509 cert
  MODSIGN: Implement module signature checking
  MODSIGN: Provide module signing public keys to the kernel
  MODSIGN: Automatically generate module signing keys if missing
  MODSIGN: Provide Kconfig options
  MODSIGN: Provide gitignore and make clean rules for extra files
  MODSIGN: Add FIPS policy
  module: signature checking hook
  X.509: Add a crypto key parser for binary (DER) X.509 certificates
  MPILIB: Provide a function to read raw data into an MPI
  X.509: Add an ASN.1 decoder
  X.509: Add simple ASN.1 grammar compiler
  ...
2012-10-14 13:39:34 -07:00
Linus Torvalds 09a9ad6a1f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace compile fixes from Eric W Biederman:
 "This tree contains three trivial fixes.  One compiler warning, one
  thinko fix, and one build fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  btrfs: Fix compilation with user namespace support enabled
  userns: Fix posix_acl_file_xattr_userns gid conversion
  userns: Properly print bluetooth socket uids
2012-10-13 13:23:39 -07:00
Linus Torvalds bd81ccea85 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from J Bruce Fields:
 "Another relatively quiet cycle.  There was some progress on my
  remaining 4.1 todo's, but a couple of them were just of the form
  "check that we do X correctly", so didn't have much affect on the
  code.

  Other than that, a bunch of cleanup and some bugfixes (including an
  annoying NFSv4.0 state leak and a busy-loop in the server that could
  cause it to peg the CPU without making progress)."

* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
  UAPI: (Scripted) Disintegrate include/linux/sunrpc
  UAPI: (Scripted) Disintegrate include/linux/nfsd
  nfsd4: don't allow reclaims of expired clients
  nfsd4: remove redundant callback probe
  nfsd4: expire old client earlier
  nfsd4: separate session allocation and initialization
  nfsd4: clean up session allocation
  nfsd4: minor free_session cleanup
  nfsd4: new_conn_from_crses should only allocate
  nfsd4: separate connection allocation and initialization
  nfsd4: reject bad forechannel attrs earlier
  nfsd4: enforce per-client sessions/no-sessions distinction
  nfsd4: set cl_minorversion at create time
  nfsd4: don't pin clientids to pseudoflavors
  nfsd4: fix bind_conn_to_session xdr comment
  nfsd4: cast readlink() bug argument
  NFSD: pass null terminated buf to kstrtouint()
  nfsd: remove duplicate init in nfsd4_cb_recall
  nfsd4: eliminate redundant nfs4_free_stateid
  fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
  ...
2012-10-13 10:53:54 +09:00
Linus Torvalds 98260daa18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Alexey Kuznetsov noticed we routed TCP resets improperly in the
    assymetric routing case, fix this by reverting a change that made us
    use the incoming interface in the outgoing route key when we didn't
    have a socket context to work with.

 2) TCP sysctl kernel memory leakage to userspace fix from Alan Cox.

 3) Move UAPI bits from David Howells, WIMAX and CAN this time.

 4) Fix TX stalls in e1000e wrt.  Byte Queue Limits, from Hiroaki
    SHIMODA, Denys Fedoryshchenko, and Jesse Brandeburg.

 5) Fix IPV6 crashes in packet generator module, from Amerigo Wang.

 6) Tidies and fixes in the new VXLAN driver from Stephen Hemminger.

 7) Bridge IP options parse doesn't check first if SKB header has at
    least an IP header's worth of content present.  Fix from Sarveshwar
    Bandi.

 8) The kernel now generates compound pages on transmit and the Xen
    netback drivers needs some adjustments in order to handle this.  Fix
    from Ian Campbell.

 9) Turn off ASPM in JME driver, from Kevin Bardon and Matthew Garrett.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  mcs7830: Fix link state detection
  net: add doc for in4_pton()
  net: add doc for in6_pton()
  vti: fix sparse bit endian warnings
  tcp: resets are misrouted
  usbnet: Support devices reporting idleness
  Add CDC-ACM support for the CX93010-2x UCMxx USB Modem
  net/ethernet/jme: disable ASPM
  tcp: sysctl interface leaks 16 bytes of kernel memory
  kaweth: print correct debug ptr
  e1000e: Change wthresh to 1 to avoid possible Tx stalls
  ipv4: fix route mark sparse warning
  xen: netback: handle compound page fragments on transmit.
  bridge: Pull ip header into skb->data before looking into ip header.
  isdn: fix a wrapping bug in isdn_ppp_ioctl()
  vxlan: fix oops when give unknown ifindex
  vxlan: fix receive checksum handling
  vxlan: add additional headroom
  vxlan: allow configuring port range
  vxlan: associate with tunnel socket on transmit
  ...
2012-10-13 10:51:48 +09:00
Eric W. Biederman 1bbb3095a5 userns: Properly print bluetooth socket uids
With user namespace support enabled building bluetooth generated the warning.
net/bluetooth/af_bluetooth.c: In function ‘bt_seq_show’:
net/bluetooth/af_bluetooth.c:598:7: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 7 has type ‘kuid_t’ [-Wformat]

Convert sock_i_uid from a kuid_t to a uid_t before printing, to avoid
this problem.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12 13:16:47 -07:00
Amerigo Wang 93ac0ef016 net: add doc for in4_pton()
It is not easy to use in4_pton() correctly without reading
its definition, so add some doc for it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
Amerigo Wang 28194fcdc1 net: add doc for in6_pton()
It is not easy to use in6_pton() correctly without reading
its definition, so add some doc for it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
stephen hemminger 8437e7610c vti: fix sparse bit endian warnings
Use be32_to_cpu instead of htonl to keep sparse happy.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
Alexey Kuznetsov 4c67525849 tcp: resets are misrouted
After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here.  What's died that's died and correct
handling normal unbound connections is obviously a priority.

Comment to comment:
> This has few benefits:
>   1. tcp_v6_send_reset already did that.

It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.

Actually, the problem appears to be even more serious than guaranteed
loss of resets.  As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
2012-10-12 13:52:40 -04:00
Johan Hedberg 065a13e2cc Bluetooth: SMP: Fix setting unknown auth_req bits
When sending a pairing request or response we should not just blindly
copy the value that the remote device sent. Instead we should at least
make sure to mask out any unknown bits. This is particularly critical
from the upcoming LE Secure Connections feature perspective as
incorrectly indicating support for it (by copying the remote value)
would cause a failure to pair with devices that support it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@kernel.org
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-10-12 17:55:20 +08:00
Linus Torvalds 940e3a8dd6 The following changes since commit 4cbe5a555fa58a79b6ecbb6c531b8bab0650778d:
Linux 3.6-rc4 (2012-09-01 10:39:58 -0700)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-next
 
 for you to fetch changes up to 552aad02a283ee88406b102b4d6455eef7127196:
 
   9P: Fix race between p9_write_work() and p9_fd_request() (2012-09-17 14:54:11 -0500)
 
 ----------------------------------------------------------------
 Jeff Layton (1):
       9p: don't use __getname/__putname for uname/aname
 
 Jim Meyering (1):
       fs/9p: avoid debug OOPS when reading a long symlink
 
 Simon Derr (5):
       net/9p: Check errno validity
       9P: Fix race in p9_read_work()
       9P: fix test at the end of p9_write_work()
       9P: Fix race in p9_write_work()
       9P: Fix race between p9_write_work() and p9_fd_request()
 
  fs/9p/v9fs.c      |   30 +++++++++++++++++++-----------
  fs/9p/vfs_inode.c |    8 ++++----
  net/9p/client.c   |   18 ++++++++++++++++--
  net/9p/trans_fd.c |   38 ++++++++++++++++++++------------------
  4 files changed, 59 insertions(+), 35 deletions(-)
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJQdvxdAAoJEDZk62b0Tg6xru8QAL1I6YH+O6c+sHONQQnifkl/
 WciZDUSYx7Pd4Ffy48m6y5J6M2VNFUIzqpIKnm4xAiHUwct/E/+yuyfK2zAe1Jxc
 nqPxoU2iyFWXc1Hu5HQhrjMXMlqePPuF1kwTYd0vCXXbVgWfbhwYfjoRr3PGuVTD
 3SpQrBxIvQj1aWRMyyQTcnqnmTLPFr1kX0TRBgvipSfQETVFR5gCXK8sJUDvU+0S
 4kywmb3y31/EpcKdDs7CE1m5kCi6T2mguP5NR4dHtN8YT76IW4urIqyAw6069wQV
 AMmoqhJP2cJ6kyyh93ltZSgcMIUgfrDj2pIsGT3hILusTh9vBT10Db8iNT2ledy8
 W+TxjK0/H0h5rfitHYqD+XnCF4pKFRm5aOOYL8jg02Uh8jU9MzkAIw1/fmXUOZ7O
 rht+HttJht2QCFniV1C442hbzL0J5mYsGPwpWZ5j4dN7PBIi8SYh+Ik0la4rRa8I
 m9C04HHvPsc0gRXPAp1+Ptby4FnPS846a9Ffm4xrkNhFl3z916ef67MnoCGu3roM
 GU9FEOdWhSWJ+52qLcXZwqkrPvlUMOehwnSjlab3BCThPRVK0D7gdTzBN4NDQZWo
 AzhK5sNRFwEidnYo7gy0g2UsRWRgPP7fiUe/xtlWaBlm0DU1+jZc/uzjEn6/h77R
 fQfniKFcMRFIeksGts5e
 =hADE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull v9fs update from Eric Van Hensbergen.

* tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9P: Fix race between p9_write_work() and p9_fd_request()
  9P: Fix race in p9_write_work()
  9P: fix test at the end of p9_write_work()
  9P: Fix race in p9_read_work()
  9p: don't use __getname/__putname for uname/aname
  net/9p: Check errno validity
  fs/9p: avoid debug OOPS when reading a long symlink
2012-10-12 09:59:23 +09:00
Alan Cox 0e24c4fc52 tcp: sysctl interface leaks 16 bytes of kernel memory
If the rc_dereference of tcp_fastopen_ctx ever fails then we copy 16 bytes
of kernel stack into the proc result.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-11 15:12:33 -04:00
Simon Derr 759f42987f 9P: Fix race between p9_write_work() and p9_fd_request()
Race scenario:

thread A			thread B

p9_write_work()                p9_fd_request()

if (list_empty
  (&m->unsent_req_list))
  ...

                               spin_lock(&client->lock);
                               req->status = REQ_STATUS_UNSENT;
                               list_add_tail(..., &m->unsent_req_list);
                               spin_unlock(&client->lock);
                               ....
                               if (n & POLLOUT &&
                               !test_and_set_bit(Wworksched, &m->wsched)
                               schedule_work(&m->wq);
                               --> not done because Wworksched is set

  clear_bit(Wworksched, &m->wsched);
  return;

--> nobody will take care of sending the new request.

This is not very likely to happen though, because p9_write_work()
being called with an empty unsent_req_list is not frequent.
But this also means that taking the lock earlier will not be costly.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-10-11 12:03:31 -05:00
J. Bruce Fields a9ca4043d0 Merge Trond's bugfixes
Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
for-3.7-incoming.  Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
backchannel tcp ops", without which the 4.1 server oopses.
2012-10-11 12:41:05 -04:00
stephen hemminger 68aaed54e7 ipv4: fix route mark sparse warning
Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46:    expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46:    got unsigned int [unsigned] [usertype] flowic_mark

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:54:59 -04:00
Sarveshwar Bandi 6caab7b054 bridge: Pull ip header into skb->data before looking into ip header.
If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Amerigo Wang c468fb1375 pktgen: replace scan_ip6() with in6_pton()
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 4c139b8cce pktgen: enable automatic IPv6 address setting
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 0373a94671 pktgen: display IPv4 address in human-readable format
It is weird to display IPv4 address in %x format, what's more,
IPv6 address is disaplayed in human-readable format too. So,
make it human-readable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 68bf9f0b91 pktgen: set different default min_pkt_size for different protocols
ETH_ZLEN is too small for IPv6, so this default value is not
suitable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 5aa8b57200 pktgen: fix crash when generating IPv6 packets
For IPv6, sizeof(struct ipv6hdr) = 40, thus the following
expression will result negative:

        datalen = pkt_dev->cur_pkt_size - 14 -
                  sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
                  pkt_dev->pkt_overhead;

And,  the check "if (datalen < sizeof(struct pktgen_hdr))" will be
passed as "datalen" is promoted to unsigned, therefore will cause
a crash later.

This is a quick fix by checking if "datalen" is negative. The following
patch will increase the default value of 'min_pkt_size' for IPv6.

This bug should exist for a long time, so Cc -stable too.

Cc: <stable@vger.kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
David S. Miller 85457685e0 Merge tag 'master-2012-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Here is a batch of fixes intended for 3.7...

Amitkumar Karwar provides a couple of mwifiex fixes to correctly
report some reason codes for certain connection failures.  He also
provides a fix to cleanup after a scanning failure.  Bing Zhao rounds
that out with another mwifiex scanning fix.

Daniel Golle gives us a fix for a copy/paste error in rt2x00.

Felix Fietkau brings a couple of ath9k fixes related to suspend/resume,
and a couple of fixes to prevent memory leaks in ath9k and mac80211.

Ronald Wahl sends a carl9170 fix for a sleep in softirq context.

Thomas Pedersen reorders some code to prevent drv_get_tsf from being
called while holding a spinlock, now that it can sleep.

Finally, Wei Yongjun prevents a NULL pointer dereference in the
ath5k driver.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 11:59:54 -04:00
Linus Torvalds df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
Alex Elder 588377d619 rbd: reset BACKOFF if unable to re-queue
If ceph_fault() is unable to queue work after a delay, it sets the
BACKOFF connection flag so con_work() will attempt to do so.

In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
result in newly-queued work, it simply ignores this condition and
proceeds as if no backoff delay were desired.  There are two
problems with this--one of which is a bug.

The first problem is simply that the intended behavior is to back
off, and if we aren't able queue the work item to run after a delay
we're not doing that.

The only reason queue_delayed_work() won't queue work is if the
provided work item is already queued.  In the messenger, this
means that con_work() is already scheduled to be run again.  So
if we simply set the BACKOFF flag again when this occurs, we know
the next con_work() call will again attempt to hold off activity
on the connection until after the delay.

The second problem--the bug--is a leak of a reference count.  If
queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
the connection reference held on entry to con_work().  However,
processing is (was) allowed to continue, and at the end of the
function a second con->ops->put() is called.

This patch fixes both problems.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-10-09 21:59:52 -07:00
J. Bruce Fields f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
jeff.liu 5175a5e76b RDS: fix rds-ping spinlock recursion
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:57:23 -04:00
David S. Miller 8dd9117cc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Pulled mainline in order to get the UAPI infrastructure already
merged before I pull in David Howells's UAPI trees for networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:14:32 -04:00
Michel Lespinasse 4c199a93a2 rbtree: empty nodes have no color
Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf379 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in commit 10fd48f237
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.

[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Arnd Bergmann b61a602ee6 ipvs: initialize returned data in do_ip_vs_get_ctl
As reported by a gcc warning, the do_ip_vs_get_ctl does not initalize
all the members of the ip_vs_timeout_user structure it returns if
at least one of the TCP or UDP protocols is disabled for ipvs.

This makes sure that the data is always initialized, before it is
returned as a response to IPVS_CMD_GET_CONFIG or printed as a
debug message in IPVS_CMD_SET_CONFIG.

Without this patch, building ARM ixp4xx_defconfig results in:

net/netfilter/ipvs/ip_vs_ctl.c: In function 'ip_vs_genl_set_cmd':
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.udp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.udp_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_fin_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_fin_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_timeout' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-10-09 13:04:34 +09:00
Julian Anastasov ad4d3ef8b7 ipvs: fix ARP resolving for direct routing mode
After the change "Make neigh lookups directly in output packet path"
(commit a263b30936) IPVS can not reach the real server for DR mode
because we resolve the destination address from IP header, not from
route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
output routes with known nexthop, so that it has preference
on resolving.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Julian Anastasov c92b96553a ipv4: Add FLOWI_FLAG_KNOWN_NH
Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

	The returned route can be cached as follows:

- in NH exception: because the cached routes are not shared
	with other destinations
- in FIB NH: when using gateway because all destinations for
	NH share same gateway

	As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Julian Anastasov 155e8336c3 ipv4: introduce rt_uses_gateway
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Julian Anastasov f8a17175c6 ipv4: make sure nh_pcpu_rth_output is always allocated
Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:35 -04:00
Julian Anastasov e0adef0f74 ipv4: fix forwarding for strict source routes
After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) rt_gateway can be 0 but ip_forward() compares
it directly with nexthop. What we want here is to check if traffic
is to directly connected nexthop and to fail if using gateway.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:35 -04:00
Julian Anastasov e81da0e113 ipv4: fix sending of redirects
After "Cache input routes in fib_info nexthops" (commit
d2d68ba9fe) and "Elide fib_validate_source() completely when possible"
(commit 7a9bc9b81a) we can not send ICMP redirects. It seems we
should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
the same fib_info can be used for traffic that is not redirected,
eg. from other input devices or from sources that are not in same subnet.

	As result, we have to disable the caching of RTCF_DOREDIRECT
flag and to force source validation for the case when forwarding
traffic to the input device. If traffic comes from directly connected
source we allow redirection as it was done before both changes.

	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
is disabled, this can avoid source address validation and to
help caching the routes.

	After the change "Adjust semantics of rt->rt_gateway"
(commit f8126f1d51) we should make sure our ICMP_REDIR_HOST messages
contain daddr instead of 0.0.0.0 when target is directly connected.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:35 -04:00
Eric Dumazet 863472454c ipv6: gro: fix PV6_GRO_CB(skb)->proto problem
It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
if a new skb is allocated (to serve as an anchor for frag_list)

We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :

*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);

So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
of IPPROTO_TCP (6)

ipv6_gro_complete() isnt able to call ops->gro_complete()
[ tcp6_gro_complete() ]

Fix this by moving proto in NAPI_GRO_CB() and getting rid of
IPV6_GRO_CB

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 15:40:43 -04:00
Florian Zumbiehl 48cc32d38a vlan: don't deliver frames for unknown vlans to protocols
6a32e4f9dd made the vlan code skip marking
vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
there was an rx_handler, as the rx_handler could cause the frame to be received
on a different (virtual) vlan-capable interface where that vlan might be
configured.

As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
frames for unknown vlans to be delivered to the protocol stack as if they had
been received untagged.

For example, if an ipv6 router advertisement that's tagged for a locally not
configured vlan is received on an interface with macvlan interfaces attached,
macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
macvlan interfaces, which caused it to be passed to the protocol stack, leading
to ipv6 addresses for the announced prefix being configured even though those
are completely unusable on the underlying interface.

The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
rx_handler, if there is one, sees the frame unchanged, but afterwards,
before the frame is delivered to the protocol stack, it gets marked whether
there is an rx_handler or not.

Signed-off-by: Florian Zumbiehl <florz@florz.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 15:21:55 -04:00
Felix Fietkau c3e7724b6b mac80211: use ieee80211_free_txskb to fix possible skb leaks
A few places free skbs using dev_kfree_skb even though they're called
after ieee80211_subif_start_xmit might have cloned it for tracking tx
status. Use ieee80211_free_txskb here to prevent skb leaks.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-10-08 15:06:05 -04:00
Thomas Pedersen 55fabefe36 mac80211: call drv_get_tsf() in sleepable context
The call to drv_get/set_tsf() was put on the workqueue to perform tsf
adjustments since that function might sleep. However it ended up inside
a spinlock, whose critical section must be atomic. Do tsf adjustment
outside the spinlock instead, and get rid of a warning.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-10-08 15:06:02 -04:00
Eric Dumazet 2e71a6f808 net: gro: selective flush of packets
Current GRO can hold packets in gro_list for almost unlimited
time, in case napi->poll() handler consumes its budget over and over.

In this case, napi_complete()/napi_gro_flush() are not called.

Another problem is that gro_list is flushed in non friendly way :
We scan the list and complete packets in the reverse order.
(youngest packets first, oldest packets last)
This defeats priorities that sender could have cooked.

Since GRO currently only store TCP packets, we dont really notice the
bug because of retransmits, but this behavior can add unexpected
latencies, particularly on mice flows clamped by elephant flows.

This patch makes sure no packet can stay more than 1 ms in queue, and
only in stress situations.

It also complete packets in the right order to minimize latencies.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 14:51:51 -04:00
Steffen Klassert ee9a8f7ab2 ipv4: Don't report stale pmtu values to userspace
We report cached pmtu values even if they are already expired.
Change this to not report these values after they are expired
and fix a race in the expire time calculation, as suggested by
Eric Dumazet.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 14:46:35 -04:00
Steffen Klassert 7f92d334ba ipv4: Don't create nh exeption when the device mtu is smaller than the reported pmtu
When a local tool like tracepath tries to send packets bigger than
the device mtu, we create a nh exeption and set the pmtu to device
mtu. The device mtu does not expire, so check if the device mtu is
smaller than the reported pmtu and don't crerate a nh exeption in
that case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 14:46:35 -04:00
Steffen Klassert d851c12b60 ipv4: Always invalidate or update the route on pmtu events
Some protocols, like IPsec still cache routes. So we need to invalidate
the old route on pmtu events to avoid the reuse of stale routes.
We also need to update the mtu and expire time of the route if we already
use a nh exception route, otherwise we ignore newly learned pmtu values
after the first expiration.

With this patch we always invalidate or update the route on pmtu events.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 14:46:34 -04:00
David Howells cf7f601c06 KEYS: Add payload preparsing opportunity prior to key instantiate or update
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called.  This is done with the
provision of two new key type operations:

	int (*preparse)(struct key_preparsed_payload *prep);
	void (*free_preparse)(struct key_preparsed_payload *prep);

If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases).  The second operation is called to clean up if the first
was called.

preparse() is given the opportunity to fill in the following structure:

	struct key_preparsed_payload {
		char		*description;
		void		*type_data[2];
		void		*payload;
		const void	*data;
		size_t		datalen;
		size_t		quotalen;
	};

Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.

The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.

The preparser may also propose a description for the key by attaching it as a
string to the description field.  This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function.  This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.

This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.

The instantiate() and update() operations are then modified to look like this:

	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
	int (*update)(struct key *key, struct key_preparsed_payload *prep);

and the new payload data is passed in *prep, whether or not it was preparsed.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08 13:49:48 +10:30
Linus Torvalds 7035cdf36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The bulk of this pull is a series from Alex that refactors and cleans
  up the RBD code to lay the groundwork for supporting the new image
  format and evolving feature set.  There are also some cleanups in
  libceph, and for ceph there's fixed validation of file striping
  layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: avoid 32-bit page index overflow
  ceph: return EIO on invalid layout on GET_DATALOC ioctl
  rbd: BUG on invalid layout
  ceph: propagate layout error on osd request creation
  libceph: check for invalid mapping
  ceph: convert to use le32_add_cpu()
  ceph: Fix oops when handling mdsmap that decreases max_mds
  rbd: update remaining header fields for v2
  rbd: get snapshot name for a v2 image
  rbd: get the snapshot context for a v2 image
  rbd: get image features for a v2 image
  rbd: get the object prefix for a v2 rbd image
  rbd: add code to get the size of a v2 rbd image
  rbd: lay out header probe infrastructure
  rbd: encapsulate code that gets snapshot info
  rbd: add an rbd features field
  rbd: don't use index in __rbd_add_snap_dev()
  rbd: kill create_snap sysfs entry
  rbd: define rbd_dev_image_id()
  rbd: define some new format constants
  ...
2012-10-08 06:38:18 +09:00
Eric Dumazet ca07e43e28 net: gro: fix a potential crash in skb_gro_reset_offset
Before accessing skb first fragment, better make sure there
is one.

This is probably not needed for old kernels, since an ethernet frame
cannot contain only an ethernet header, but the recent GRO addition
to tunnels makes this patch needed.

Also skb_gro_reset_offset() can be static, it actually allows
compiler to inline it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 14:49:17 -04:00
Eric Dumazet 51ec04038c ipv6: GRO should be ECN friendly
IPv4 side of the problem was addressed in commit a9e050f4e7
(net: tcp: GRO should be ECN friendly)

This patch does the same, but for IPv6 : A Traffic Class mismatch
doesnt mean flows are different, but instead should force a flush
of previous packets.

This patch removes artificial packet reordering problem.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 14:44:36 -04:00
ramesh.nagappa@gmail.com e1f165032c net: Fix skb_under_panic oops in neigh_resolve_output
The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.

Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 14:42:39 -04:00
Eric Dumazet acb600def2 net: remove skb recycling
Over time, skb recycling infrastructure got litle interest and
many bugs. Generic rx path skb allocation is now using page
fragments for efficient GRO / TCP coalescing, and recyling
a tx skb for rx path is not worth the pain.

Last identified bug is that fat skbs can be recycled
and it can endup using high order pages after few iterations.

With help from Maxime Bizon, who pointed out that commit
87151b8689 (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.

Instead of fixing this bug, lets remove skb recycling.

Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate sk_buff right before netif_receive_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 00:40:54 -04:00
Gao feng 6dc878a8ca netlink: add reference of module in netlink_dump_start
I get a panic when I use ss -a and rmmod inet_diag at the
same time.

It's because netlink_dump uses inet_diag_dump which belongs to module
inet_diag.

I search the codes and find many modules have the same problem.  We
need to add a reference to the module which the cb->dump belongs to.

Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.

Change From v3:
change netlink_dump_start to inline,suggestion from Pablo and
Eric.

Change From v2:
delete netlink_dump_done,and call module_put in netlink_dump
and netlink_sock_destruct.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 00:30:56 -04:00
Linus Torvalds 283dbd8205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:
 "The most important bit in here is the fix for input route caching from
  Eric Dumazet, it's a shame we couldn't fully analyze this in time for
  3.6 as it's a 3.6 regression introduced by the routing cache removal.

  Anyways, will send quickly to -stable after you pull this in.

  Other changes of note:

   1) Fix lockdep splats in team and bonding, from Eric Dumazet.

   2) IPV6 adds link local route even when there is no link local
      address, from Nicolas Dichtel.

   3) Fix ixgbe PTP implementation, from Jacob Keller.

   4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.

   5) MAC length computed improperly in VLAN demux, from Antonio
      Quartulli."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
  Remove noisy printks from llcp_sock_connect
  tipc: prevent dropped connections due to rcvbuf overflow
  silence some noisy printks in irda
  team: set qdisc_tx_busylock to avoid LOCKDEP splat
  bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
  sctp: check src addr when processing SACK to update transport state
  sctp: fix a typo in prototype of __sctp_rcv_lookup()
  ipv4: add a fib_type to fib_info
  can: mpc5xxx_can: fix section type conflict
  can: peak_pcmcia: fix error return code
  can: peak_pci: fix error return code
  cxgb4: Fix build error due to missing linux/vmalloc.h include.
  bnx2x: fix ring size for 10G functions
  cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
  ixgbe: add support for X540-AT1
  ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
  ixgbe: fix PTP ethtool timestamping function
  ixgbe: (PTP) Fix PPS interrupt code
  ixgbe: Fix PTP X540 SDP alignment code for PPS signal
  ...
2012-10-06 03:11:59 +09:00
Andi Kleen 04a6f82cf0 sections: fix section conflicts in net
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:45 +09:00
Andi Kleen 6299b669b1 sections: fix section conflicts in net/can
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:45 +09:00
Sasha Levin 6f5601251d net, TTY: initialize tty->driver_data before usage
Commit 9c650ffc ("TTY: ircomm_tty, add tty install") split _open() to
_install() and _open(). It also moved the initialization of driver_data
out of open(), but never added it to install() - causing a NULL ptr
deref whenever the driver was used.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-05 09:26:01 -07:00
Gao feng 6825a26c2d ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
as we hold dst_entry before we call __ip6_del_rt,
so we should alse call dst_release not only return
-ENOENT when the rt6_info is ip6_null_entry.

and we already hold the dst entry, so I think it's
safe to call dst_release out of the write-read lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 16:00:07 -04:00
Dave Jones 32418cfe49 Remove noisy printks from llcp_sock_connect
Validation of userspace input shouldn't trigger dmesg spamming.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:58:47 -04:00
Erik Hugne e57edf6b6d tipc: prevent dropped connections due to rcvbuf overflow
When large buffers are sent over connected TIPC sockets, it
is likely that the sk_backlog will be filled up on the
receiver side, but the TIPC flow control mechanism is happily
unaware of this since that is based on message count.

The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
and drop it's side of the connection, leaving it stale on
the receiver end.

By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
overload caused by a full backlog queue and the flow control
will work properly.

This worst case value is the max TIPC message size times
the flow control window, multiplied by two because a sender
will transmit up to double the window size before a port is marked
congested.
We multiply this by 2 to account for the sk_buff and other overheads.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Dave Jones 096895818c silence some noisy printks in irda
Fuzzing causes these printks to spew constantly.
Changing them to DEBUG statements is consistent with other usage in the file,
and makes them disappear when CONFIG_IRDA_DEBUG is disabled.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Nicolas Dichtel edfee0339e sctp: check src addr when processing SACK to update transport state
Suppose we have an SCTP connection with two paths. After connection is
established, path1 is not available, thus this path is marked as inactive. Then
traffic goes through path2, but for some reasons packets are delayed (after
rto.max). Because packets are delayed, the retransmit mechanism will switch
again to path1. At this time, we receive a delayed SACK from path2. When we
update the state of the path in sctp_check_transmitted(), we do not take into
account the source address of the SACK, hence we update the wrong path.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Nicolas Dichtel 575659936f sctp: fix a typo in prototype of __sctp_rcv_lookup()
Just to avoid confusion when people only reads this prototype.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Eric Dumazet f4ef85bbda ipv4: add a fib_type to fib_info
commit d2d68ba9fe (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 13:58:26 -04:00
Linus Torvalds aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00
Antonio Quartulli 5316cf9a51 8021q: fix mac_len recomputation in vlan_untag()
skb_reset_mac_len() relies on the value of the skb->network_header pointer,
therefore we must wait for such pointer to be recalculated before computing
the new mac_len value.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-02 22:45:57 -04:00
Nicolas Dichtel 62b54dd915 ipv6: don't add link local route when there is no link local address
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route
to fe80::/64 is added in the main table:
  unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101

This route does not match any prefix (no fe80:: address on lo). In fact,
addrconf_dev_config() will not add link local address because this function
filters interfaces by type. If the link local address is added manually, the
route to the link local prefix will be automatically added by
addrconf_add_linklocal().
Note also, that this route is not deleted when the address is removed.

After looking at the code, it seems that addrconf_add_lroute() is redundant with
addrconf_add_linklocal(), because this function will add the link local route
when the link local address is configured.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-02 22:36:23 -04:00
Linus Torvalds 916082b073 workqueue: avoid using deprecated functions
The network merge brought in a few users of functions that got
deprecated by the workqueue cleanups: the 'system_nrt_wq' is now the
same as the regular system_wq, since all workqueues are now non-
reentrant.

Similarly, remove one use of flush_work_sync() - the regular
flush_work() has become synchronous, and the "_sync()" version is thus
deprecated as being superfluous.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-02 16:01:31 -07:00
Linus Torvalds aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Linus Torvalds 437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds 68d47a137c Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup hierarchy update from Tejun Heo:
 "Currently, different cgroup subsystems handle nested cgroups
  completely differently.  There's no consistency among subsystems and
  the behaviors often are outright broken.

  People at least seem to agree that the broken hierarhcy behaviors need
  to be weeded out if any progress is gonna be made on this front and
  that the fallouts from deprecating the broken behaviors should be
  acceptable especially given that the current behaviors don't make much
  sense when nested.

  This patch makes cgroup emit warning messages if cgroups for
  subsystems with broken hierarchy behavior are nested to prepare for
  fixing them in the future.  This was put in a separate branch because
  more related changes were expected (didn't make it this round) and the
  memory cgroup wanted to pull in this and make changes on top."

* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
2012-10-02 10:52:28 -07:00