Commit Graph

25266 Commits

Author SHA1 Message Date
Nicolas Dichtel 51ebd31815 ipv6: add support of equal cost multipath (ECMP)
Each nexthop is added like a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considering as ECMP routes. They are linked together, through a list called
rt6i_siblings.

ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after
the other (in both case, the flag NLM_F_EXCL should not be set).

The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-23 02:38:32 -04:00
Eric Dumazet d94ce9b283 ipv4: 16 slots in initial fib_info hash table
A small host typically needs ~10 fib_info structures, so create initial
hash table with 16 slots instead of only one. This removes potential
false sharing and reallocs/rehashes (1->2->4->8->16)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-22 14:29:06 -04:00
Eric Dumazet 0e71c55c9e tcp: speedup SIOCINQ ioctl
SIOCINQ can use the lock_sock_fast() version to avoid double acquisition
of socket lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-22 14:29:06 -04:00
Eric Dumazet 354e4aa391 tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation
RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]

  All TCP stacks MAY implement the following mitigation.  TCP stacks
  that implement this mitigation MUST add an additional input check to
  any incoming segment.  The ACK value is considered acceptable only if
  it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
  SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
  above condition MUST be discarded and an ACK sent back.

Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-22 14:29:06 -04:00
Eric Dumazet 46baac38ef pkt_sched: use ns_to_ktime() helper
ns_to_ktime() seems better than ktime_set() + ktime_add_ns()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-21 22:21:27 -04:00
Rami Rosen 47b70db555 net:dev: remove double indentical assignment in dev_change_net_namespace().
This patch removes double assignment of err to -EINVAL in dev_change_net_namespace().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-21 20:44:39 -04:00
Pavel Emelyanov f7b86bfe8d sockopt: Make SO_BINDTODEVICE readable
The SO_BINDTODEVICE option is the only SOL_SOCKET one that can be set, but
cannot be get via sockopt API. The only way we can find the device id a
socket is bound to is via sock-diag interface. But the diag works only on
hashed sockets, while the opt in question can be set for yet unhashed one.

That said, in order to know what device a socket is bound to (we do want
to know this in checkpoint-restore project) I propose to make this option
getsockopt-able and report the respective device index.

Another solution to the problem might be to teach the sock-diag reporting
info on unhashed sockets. Should I go this way instead?

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-21 20:42:18 -04:00
Joe Perches 06e3041164 pktgen: Use ipv6_addr_any
Use the standard test for a non-zero ipv6 address.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-21 20:37:05 -04:00
David S. Miller db0fe0b2f6 Included fixes:
- Fix broadcast packet CRC calculation which can lead to ~80% broadcast packet
   loss
 - Fix a race condition in duplicate broadcast packet check
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlCANjIACgkQpGgxIkP9cwfeUwCgkGfv2Y8evZYX6d1bW2kFg7di
 LJ8AoISKwbTCJUvKrUuQHAvTss7bcX4k
 =/dxF
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included fixes:
- Fix broadcast packet CRC calculation which can lead to ~80% broadcast packet
  loss
- Fix a race condition in duplicate broadcast packet check

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:36:59 -04:00
Eric Dumazet a3374c42aa tcp: fix FIONREAD/SIOCINQ
tcp_ioctl() tries to take into account if tcp socket received a FIN
to report correct number bytes in receive queue.

But its flaky because if the application ate the last skb,
we return 1 instead of 0.

Correct way to detect that FIN was received is to test SOCK_DONE.

Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:31 -04:00
Eric Dumazet 6d772ac557 netlink: use kfree_rcu() in netlink_release()
On some suspend/resume operations involving wimax device, we have
noticed some intermittent memory corruptions in netlink code.

Stéphane Marchesin tracked this corruption in netlink_update_listeners()
and suggested a patch.

It appears netlink_release() should use kfree_rcu() instead of kfree()
for the listeners structure as it may be used by other cpus using RCU
protection.

netlink_release() must set to NULL the listeners pointer when
it is about to be freed.

Also have to protect netlink_update_listeners() and
netlink_has_listeners() if listeners is NULL.

Add a nl_deref_protected() lockdep helper to properly document which
locks protects us.

Reported-by: Jonathan Kliegman <kliegs@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stéphane Marchesin <marcheu@google.com>
Cc: Sam Leffler <sleffler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Steffen Klassert 13d82bf50d ipv4: Fix flushing of cached routing informations
Currently we can not flush cached pmtu/redirect informations via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exeption if the old route is
expired when we bind a new route to a nh exeption.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Jiri Pirko 18c22a03a2 vlan: allow to change type when no vlan device is hooked on netdev
vlan_info might be present but still no vlan devices might be there.
That is in case of vlan0 automatically added.

So in that case, allow to change netdev type.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18 15:34:30 -04:00
Linus Lüssing 7dac7b76b8 batman-adv: Fix potential broadcast BLA-duplicate-check race condition
Threads in the bottom half of batadv_bla_check_bcast_duplist() might
otherwise for instance overwrite variables which other threads might
be using/reading at the same time in the top half, potentially
leading to messing up the bcast_duplist, possibly resulting in false
bridge loop avoidance duplicate check decisions.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-10-18 18:17:31 +02:00
Linus Lüssing 7f112af40f batman-adv: Fix broadcast packet CRC calculation
So far the crc16 checksum for a batman-adv broadcast data packet, received
on a batman-adv hard interface, was calculated over zero bytes of its
content leading to many incoming broadcast data packets wrongly being
dropped (60-80% packet loss).

This patch fixes this issue by calculating the crc16 over the actual,
complete broadcast payload.

The issue is a regression introduced by
("batman-adv: add broadcast duplicate check").

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-10-18 18:14:57 +02:00
Eric Dumazet 2ad5b9e4bd netfilter: xt_TEE: don't use destination address found in header
Torsten Luettgert bisected TEE regression starting with commit
f8126f1d51 (ipv4: Adjust semantics of rt->rt_gateway.)

The problem is that it tries to ARP-lookup the original destination
address of the forwarded packet, not the address of the gateway.

Fix this using FLOWI_FLAG_KNOWN_NH Julian added in commit
c92b96553a (ipv4: Add FLOWI_FLAG_KNOWN_NH), so that known
nexthop (info->gw.ip) has preference on resolving.

Reported-by: Torsten Luettgert <ml-netfilter@enda.eu>
Bisected-by: Torsten Luettgert <ml-netfilter@enda.eu>
Tested-by: Torsten Luettgert <ml-netfilter@enda.eu>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-17 11:00:31 +02:00
Pablo Neira Ayuso 0b4f5b1d63 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
To obtain new flag FLOWI_FLAG_KNOWN_NH to fix netfilter's xt_TEE target.
2012-10-17 10:59:20 +02:00
Eric Dumazet 9f0d3c2781 ipv6: addrconf: fix /proc/net/if_inet6
Commit 1d5783030a (ipv6/addrconf: speedup /proc/net/if_inet6 filling)
added bugs hiding some devices from if_inet6 and breaking applications.

"ip -6 addr" could still display all IPv6 addresses, while "ifconfig -a"
couldnt.

One way to reproduce the bug is by starting in a shell :

unshare -n /bin/bash
ifconfig lo up

And in original net namespace, lo device disappeared from if_inet6

Reported-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com>
Tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mihai Maruseac <mihai.maruseac@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:47 -04:00
Zijie Pan f6e80abeab sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()
Bug introduced by commit edfee0339e
(sctp: check src addr when processing SACK to update transport state)

Signed-off-by: Zijie Pan <zijie.pan@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:46 -04:00
Jiri Pirko 55462cf30a vlan: fix bond/team enslave of vlan challenged slave/port
In vlan_uses_dev() check for number of vlan devs rather than existence
of vlan_info. The reason is that vlan id 0 is there without appropriate
vlan dev on it by default which prevented from enslaving vlan challenged
dev.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:46 -04:00
Elison Niven 939ccba437 netfilter: xt_nat: fix incorrect hooks for SNAT and DNAT targets
In (c7232c9 netfilter: add protocol independent NAT core), the
hooks were accidentally modified:

SNAT hooks are POST_ROUTING and LOCAL_IN (before it was LOCAL_OUT).
DNAT hooks are PRE_ROUTING and LOCAL_OUT (before it was LOCAL_IN).

Signed-off-by: Elison Niven <elison.niven@cyberoam.com>
Signed-off-by: Sanket Shah <sanket.shah@cyberoam.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-15 13:39:12 +02:00
Pablo Neira Ayuso 0153d5a810 netfilter: xt_CT: fix timeout setting with IPv6
This patch fixes ip6tables and the CT target if it is used to set
some custom conntrack timeout policy for IPv6.

Use xt_ct_find_proto which already handles the ip6tables case for us.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-10-15 13:38:58 +02:00
Linus Torvalds d25282d1c9 Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module signing support from Rusty Russell:
 "module signing is the highlight, but it's an all-over David Howells frenzy..."

Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG.

* 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits)
  X.509: Fix indefinite length element skip error handling
  X.509: Convert some printk calls to pr_devel
  asymmetric keys: fix printk format warning
  MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking
  MODSIGN: Make mrproper should remove generated files.
  MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs
  MODSIGN: Use the same digest for the autogen key sig as for the module sig
  MODSIGN: Sign modules during the build process
  MODSIGN: Provide a script for generating a key ID from an X.509 cert
  MODSIGN: Implement module signature checking
  MODSIGN: Provide module signing public keys to the kernel
  MODSIGN: Automatically generate module signing keys if missing
  MODSIGN: Provide Kconfig options
  MODSIGN: Provide gitignore and make clean rules for extra files
  MODSIGN: Add FIPS policy
  module: signature checking hook
  X.509: Add a crypto key parser for binary (DER) X.509 certificates
  MPILIB: Provide a function to read raw data into an MPI
  X.509: Add an ASN.1 decoder
  X.509: Add simple ASN.1 grammar compiler
  ...
2012-10-14 13:39:34 -07:00
Linus Torvalds 09a9ad6a1f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace compile fixes from Eric W Biederman:
 "This tree contains three trivial fixes.  One compiler warning, one
  thinko fix, and one build fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  btrfs: Fix compilation with user namespace support enabled
  userns: Fix posix_acl_file_xattr_userns gid conversion
  userns: Properly print bluetooth socket uids
2012-10-13 13:23:39 -07:00
Linus Torvalds bd81ccea85 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from J Bruce Fields:
 "Another relatively quiet cycle.  There was some progress on my
  remaining 4.1 todo's, but a couple of them were just of the form
  "check that we do X correctly", so didn't have much affect on the
  code.

  Other than that, a bunch of cleanup and some bugfixes (including an
  annoying NFSv4.0 state leak and a busy-loop in the server that could
  cause it to peg the CPU without making progress)."

* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
  UAPI: (Scripted) Disintegrate include/linux/sunrpc
  UAPI: (Scripted) Disintegrate include/linux/nfsd
  nfsd4: don't allow reclaims of expired clients
  nfsd4: remove redundant callback probe
  nfsd4: expire old client earlier
  nfsd4: separate session allocation and initialization
  nfsd4: clean up session allocation
  nfsd4: minor free_session cleanup
  nfsd4: new_conn_from_crses should only allocate
  nfsd4: separate connection allocation and initialization
  nfsd4: reject bad forechannel attrs earlier
  nfsd4: enforce per-client sessions/no-sessions distinction
  nfsd4: set cl_minorversion at create time
  nfsd4: don't pin clientids to pseudoflavors
  nfsd4: fix bind_conn_to_session xdr comment
  nfsd4: cast readlink() bug argument
  NFSD: pass null terminated buf to kstrtouint()
  nfsd: remove duplicate init in nfsd4_cb_recall
  nfsd4: eliminate redundant nfs4_free_stateid
  fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
  ...
2012-10-13 10:53:54 +09:00
Linus Torvalds 98260daa18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Alexey Kuznetsov noticed we routed TCP resets improperly in the
    assymetric routing case, fix this by reverting a change that made us
    use the incoming interface in the outgoing route key when we didn't
    have a socket context to work with.

 2) TCP sysctl kernel memory leakage to userspace fix from Alan Cox.

 3) Move UAPI bits from David Howells, WIMAX and CAN this time.

 4) Fix TX stalls in e1000e wrt.  Byte Queue Limits, from Hiroaki
    SHIMODA, Denys Fedoryshchenko, and Jesse Brandeburg.

 5) Fix IPV6 crashes in packet generator module, from Amerigo Wang.

 6) Tidies and fixes in the new VXLAN driver from Stephen Hemminger.

 7) Bridge IP options parse doesn't check first if SKB header has at
    least an IP header's worth of content present.  Fix from Sarveshwar
    Bandi.

 8) The kernel now generates compound pages on transmit and the Xen
    netback drivers needs some adjustments in order to handle this.  Fix
    from Ian Campbell.

 9) Turn off ASPM in JME driver, from Kevin Bardon and Matthew Garrett.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  mcs7830: Fix link state detection
  net: add doc for in4_pton()
  net: add doc for in6_pton()
  vti: fix sparse bit endian warnings
  tcp: resets are misrouted
  usbnet: Support devices reporting idleness
  Add CDC-ACM support for the CX93010-2x UCMxx USB Modem
  net/ethernet/jme: disable ASPM
  tcp: sysctl interface leaks 16 bytes of kernel memory
  kaweth: print correct debug ptr
  e1000e: Change wthresh to 1 to avoid possible Tx stalls
  ipv4: fix route mark sparse warning
  xen: netback: handle compound page fragments on transmit.
  bridge: Pull ip header into skb->data before looking into ip header.
  isdn: fix a wrapping bug in isdn_ppp_ioctl()
  vxlan: fix oops when give unknown ifindex
  vxlan: fix receive checksum handling
  vxlan: add additional headroom
  vxlan: allow configuring port range
  vxlan: associate with tunnel socket on transmit
  ...
2012-10-13 10:51:48 +09:00
Eric W. Biederman 1bbb3095a5 userns: Properly print bluetooth socket uids
With user namespace support enabled building bluetooth generated the warning.
net/bluetooth/af_bluetooth.c: In function ‘bt_seq_show’:
net/bluetooth/af_bluetooth.c:598:7: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 7 has type ‘kuid_t’ [-Wformat]

Convert sock_i_uid from a kuid_t to a uid_t before printing, to avoid
this problem.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12 13:16:47 -07:00
Amerigo Wang 93ac0ef016 net: add doc for in4_pton()
It is not easy to use in4_pton() correctly without reading
its definition, so add some doc for it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
Amerigo Wang 28194fcdc1 net: add doc for in6_pton()
It is not easy to use in6_pton() correctly without reading
its definition, so add some doc for it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
stephen hemminger 8437e7610c vti: fix sparse bit endian warnings
Use be32_to_cpu instead of htonl to keep sparse happy.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-12 13:56:52 -04:00
Alexey Kuznetsov 4c67525849 tcp: resets are misrouted
After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here.  What's died that's died and correct
handling normal unbound connections is obviously a priority.

Comment to comment:
> This has few benefits:
>   1. tcp_v6_send_reset already did that.

It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.

Actually, the problem appears to be even more serious than guaranteed
loss of resets.  As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
2012-10-12 13:52:40 -04:00
Linus Torvalds 940e3a8dd6 The following changes since commit 4cbe5a555fa58a79b6ecbb6c531b8bab0650778d:
Linux 3.6-rc4 (2012-09-01 10:39:58 -0700)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-next
 
 for you to fetch changes up to 552aad02a283ee88406b102b4d6455eef7127196:
 
   9P: Fix race between p9_write_work() and p9_fd_request() (2012-09-17 14:54:11 -0500)
 
 ----------------------------------------------------------------
 Jeff Layton (1):
       9p: don't use __getname/__putname for uname/aname
 
 Jim Meyering (1):
       fs/9p: avoid debug OOPS when reading a long symlink
 
 Simon Derr (5):
       net/9p: Check errno validity
       9P: Fix race in p9_read_work()
       9P: fix test at the end of p9_write_work()
       9P: Fix race in p9_write_work()
       9P: Fix race between p9_write_work() and p9_fd_request()
 
  fs/9p/v9fs.c      |   30 +++++++++++++++++++-----------
  fs/9p/vfs_inode.c |    8 ++++----
  net/9p/client.c   |   18 ++++++++++++++++--
  net/9p/trans_fd.c |   38 ++++++++++++++++++++------------------
  4 files changed, 59 insertions(+), 35 deletions(-)
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJQdvxdAAoJEDZk62b0Tg6xru8QAL1I6YH+O6c+sHONQQnifkl/
 WciZDUSYx7Pd4Ffy48m6y5J6M2VNFUIzqpIKnm4xAiHUwct/E/+yuyfK2zAe1Jxc
 nqPxoU2iyFWXc1Hu5HQhrjMXMlqePPuF1kwTYd0vCXXbVgWfbhwYfjoRr3PGuVTD
 3SpQrBxIvQj1aWRMyyQTcnqnmTLPFr1kX0TRBgvipSfQETVFR5gCXK8sJUDvU+0S
 4kywmb3y31/EpcKdDs7CE1m5kCi6T2mguP5NR4dHtN8YT76IW4urIqyAw6069wQV
 AMmoqhJP2cJ6kyyh93ltZSgcMIUgfrDj2pIsGT3hILusTh9vBT10Db8iNT2ledy8
 W+TxjK0/H0h5rfitHYqD+XnCF4pKFRm5aOOYL8jg02Uh8jU9MzkAIw1/fmXUOZ7O
 rht+HttJht2QCFniV1C442hbzL0J5mYsGPwpWZ5j4dN7PBIi8SYh+Ik0la4rRa8I
 m9C04HHvPsc0gRXPAp1+Ptby4FnPS846a9Ffm4xrkNhFl3z916ef67MnoCGu3roM
 GU9FEOdWhSWJ+52qLcXZwqkrPvlUMOehwnSjlab3BCThPRVK0D7gdTzBN4NDQZWo
 AzhK5sNRFwEidnYo7gy0g2UsRWRgPP7fiUe/xtlWaBlm0DU1+jZc/uzjEn6/h77R
 fQfniKFcMRFIeksGts5e
 =hADE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull v9fs update from Eric Van Hensbergen.

* tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9P: Fix race between p9_write_work() and p9_fd_request()
  9P: Fix race in p9_write_work()
  9P: fix test at the end of p9_write_work()
  9P: Fix race in p9_read_work()
  9p: don't use __getname/__putname for uname/aname
  net/9p: Check errno validity
  fs/9p: avoid debug OOPS when reading a long symlink
2012-10-12 09:59:23 +09:00
Alan Cox 0e24c4fc52 tcp: sysctl interface leaks 16 bytes of kernel memory
If the rc_dereference of tcp_fastopen_ctx ever fails then we copy 16 bytes
of kernel stack into the proc result.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-11 15:12:33 -04:00
Simon Derr 759f42987f 9P: Fix race between p9_write_work() and p9_fd_request()
Race scenario:

thread A			thread B

p9_write_work()                p9_fd_request()

if (list_empty
  (&m->unsent_req_list))
  ...

                               spin_lock(&client->lock);
                               req->status = REQ_STATUS_UNSENT;
                               list_add_tail(..., &m->unsent_req_list);
                               spin_unlock(&client->lock);
                               ....
                               if (n & POLLOUT &&
                               !test_and_set_bit(Wworksched, &m->wsched)
                               schedule_work(&m->wq);
                               --> not done because Wworksched is set

  clear_bit(Wworksched, &m->wsched);
  return;

--> nobody will take care of sending the new request.

This is not very likely to happen though, because p9_write_work()
being called with an empty unsent_req_list is not frequent.
But this also means that taking the lock earlier will not be costly.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2012-10-11 12:03:31 -05:00
J. Bruce Fields a9ca4043d0 Merge Trond's bugfixes
Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
for-3.7-incoming.  Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
backchannel tcp ops", without which the 4.1 server oopses.
2012-10-11 12:41:05 -04:00
stephen hemminger 68aaed54e7 ipv4: fix route mark sparse warning
Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46:    expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46:    got unsigned int [unsigned] [usertype] flowic_mark

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:54:59 -04:00
Sarveshwar Bandi 6caab7b054 bridge: Pull ip header into skb->data before looking into ip header.
If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:50:45 -04:00
Amerigo Wang c468fb1375 pktgen: replace scan_ip6() with in6_pton()
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 4c139b8cce pktgen: enable automatic IPv6 address setting
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 0373a94671 pktgen: display IPv4 address in human-readable format
It is weird to display IPv4 address in %x format, what's more,
IPv6 address is disaplayed in human-readable format too. So,
make it human-readable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 68bf9f0b91 pktgen: set different default min_pkt_size for different protocols
ETH_ZLEN is too small for IPv6, so this default value is not
suitable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
Amerigo Wang 5aa8b57200 pktgen: fix crash when generating IPv6 packets
For IPv6, sizeof(struct ipv6hdr) = 40, thus the following
expression will result negative:

        datalen = pkt_dev->cur_pkt_size - 14 -
                  sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
                  pkt_dev->pkt_overhead;

And,  the check "if (datalen < sizeof(struct pktgen_hdr))" will be
passed as "datalen" is promoted to unsigned, therefore will cause
a crash later.

This is a quick fix by checking if "datalen" is negative. The following
patch will increase the default value of 'min_pkt_size' for IPv6.

This bug should exist for a long time, so Cc -stable too.

Cc: <stable@vger.kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 22:33:30 -04:00
David S. Miller 85457685e0 Merge tag 'master-2012-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Here is a batch of fixes intended for 3.7...

Amitkumar Karwar provides a couple of mwifiex fixes to correctly
report some reason codes for certain connection failures.  He also
provides a fix to cleanup after a scanning failure.  Bing Zhao rounds
that out with another mwifiex scanning fix.

Daniel Golle gives us a fix for a copy/paste error in rt2x00.

Felix Fietkau brings a couple of ath9k fixes related to suspend/resume,
and a couple of fixes to prevent memory leaks in ath9k and mac80211.

Ronald Wahl sends a carl9170 fix for a sleep in softirq context.

Thomas Pedersen reorders some code to prevent drv_get_tsf from being
called while holding a spinlock, now that it can sleep.

Finally, Wei Yongjun prevents a NULL pointer dereference in the
ath5k driver.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 11:59:54 -04:00
Linus Torvalds df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
J. Bruce Fields f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
jeff.liu 5175a5e76b RDS: fix rds-ping spinlock recursion
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.

RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.

Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:57:23 -04:00
David S. Miller 8dd9117cc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Pulled mainline in order to get the UAPI infrastructure already
merged before I pull in David Howells's UAPI trees for networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:14:32 -04:00
Michel Lespinasse 4c199a93a2 rbtree: empty nodes have no color
Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf379 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in commit 10fd48f237
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.

[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Arnd Bergmann b61a602ee6 ipvs: initialize returned data in do_ip_vs_get_ctl
As reported by a gcc warning, the do_ip_vs_get_ctl does not initalize
all the members of the ip_vs_timeout_user structure it returns if
at least one of the TCP or UDP protocols is disabled for ipvs.

This makes sure that the data is always initialized, before it is
returned as a response to IPVS_CMD_GET_CONFIG or printed as a
debug message in IPVS_CMD_SET_CONFIG.

Without this patch, building ARM ixp4xx_defconfig results in:

net/netfilter/ipvs/ip_vs_ctl.c: In function 'ip_vs_genl_set_cmd':
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.udp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.udp_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_fin_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_fin_timeout' was declared here
net/netfilter/ipvs/ip_vs_ctl.c:2238:47: warning: 't.tcp_timeout' may be used uninitialized in this function [-Wuninitialized]
net/netfilter/ipvs/ip_vs_ctl.c:3322:28: note: 't.tcp_timeout' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-10-09 13:04:34 +09:00
Julian Anastasov ad4d3ef8b7 ipvs: fix ARP resolving for direct routing mode
After the change "Make neigh lookups directly in output packet path"
(commit a263b30936) IPVS can not reach the real server for DR mode
because we resolve the destination address from IP header, not from
route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
output routes with known nexthop, so that it has preference
on resolving.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00