Commit Graph

30407 Commits

Author SHA1 Message Date
Johannes Berg 2ecf7536b2 quota/genetlink: use proper genetlink multicast APIs
The quota code is abusing the genetlink API and is using
its family ID as the multicast group ID, which is invalid
and may belong to somebody else (and likely will.)

Make the quota code use the correct API, but since this
is already used as-is by userspace, reserve a family ID
for this code and also reserve that group ID to not break
userspace assumptions.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Johannes Berg e5dcecba01 drop_monitor/genetlink: use proper genetlink multicast APIs
The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)

Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Johannes Berg c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Andrey Vagin dbde497966 tcp: don't update snd_nxt, when a socket is switched from repair mode
snd_nxt must be updated synchronously with sk_send_head.  Otherwise
tp->packets_out may be updated incorrectly, what may bring a kernel panic.

Here is a kernel panic from my host.
[  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
...
[  146.301158] Call Trace:
[  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0

Before this panic a tcp socket was restored. This socket had sent and
unsent data in the write queue. Sent data was restored in repair mode,
then the socket was switched from reapair mode and unsent data was
restored. After that the socket was switched back into repair mode.

In that moment we had a socket where write queue looks like this:
snd_una    snd_nxt   write_seq
   |_________|________|
             |
	  sk_send_head

After a second switching from repair mode the state of socket was
changed:

snd_una          snd_nxt, write_seq
   |_________ ________|
             |
	  sk_send_head

This state is inconsistent, because snd_nxt and sk_send_head are not
synchronized.

Bellow you can find a call trace, how packets_out can be incremented
twice for one skb, if snd_nxt and sk_send_head are not synchronized.
In this case packets_out will be always positive, even when
sk_write_queue is empty.

tcp_write_wakeup
	skb = tcp_send_head(sk);
	tcp_fragment
		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
			tcp_adjust_pcount(sk, skb, diff);
	tcp_event_new_data_sent
		tp->packets_out += tcp_skb_pcount(skb);

I think update of snd_nxt isn't required, when a socket is switched from
repair mode.  Because it's initialized in tcp_connect_init. Then when a
write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
so it's always is in consistent state.

I have checked, that the bug is not reproduced with this patch and
all tests about restoring tcp connections work fine.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:14:20 -05:00
fan.du 236c9f8486 xfrm: Release dst if this dst is improper for vti tunnel
After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 15:50:57 -05:00
Johannes Berg 840e93f2ee netlink: fix documentation typo in netlink_set_err()
The parameter is just 'group', not 'groups', fix the documentation typo.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 15:07:01 -05:00
Luís Fernando Cornachioni Estrozi acab78b996 netfilter: ebt_ip6: fix source and destination matching
This bug was introduced on commit 0898f99a2. This just recovers two
checks that existed before as suggested by Bart De Schuymer.

Signed-off-by: Luís Fernando Cornachioni Estrozi <lestrozi@uolinc.com>
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-19 15:33:29 +01:00
Hannes Frederic Sowa cf970c002d ping: prevent NULL pointer dereference on write to msg_name
A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 16:02:03 -05:00
Vlad Yasevich eca42aaf89 ipv6: Fix inet6_init() cleanup order
Commit 6d0bfe2261
	net: ipv6: Add IPv6 support to the ping socket

introduced a change in the cleanup logic of inet6_init and
has a bug in that ipv6_packet_cleanup() may not be called.
Fix the cleanup ordering.

CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Lorenzo Colitti <lorenzo@google.com>
CC: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:38:46 -05:00
Johannes Berg 029b234fb3 genetlink: rename shadowed variable
Sparse pointed out that the new flags variable I had added
shadowed an existing one, rename the new one to avoid that,
making the code clearer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:34:00 -05:00
Hannes Frederic Sowa bceaa90240 inet: prevent leakage of uninitialized memory to user in recv syscalls
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.

Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:12:03 -05:00
Fabio Estevam bcd081a3ae net: ipv6: ndisc: Fix warning when CONFIG_SYSCTL=n
When CONFIG_SYSCTL=n the following build warning happens:

net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label]

The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the
'ifdef CONFIG_SYSCTL' block.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 14:49:16 -05:00
Pablo Neira Ayuso 0c3c6c00c6 netfilter: nf_conntrack: decrement global counter after object release
nf_conntrack_free() decrements our counter (net->ct.count)
before releasing the conntrack object. That counter is used in the
nf_conntrack_cleanup_net_list path to check if it's time to
kmem_cache_destroy our cache of conntrack objects. I think we have
a race there that should be easier to trigger (although still hard)
with CONFIG_DEBUG_OBJECTS_FREE as object releases become slowier
according to the following splat:

[ 1136.321305] WARNING: CPU: 2 PID: 2483 at lib/debugobjects.c:260
debug_print_object+0x83/0xa0()
[ 1136.321311] ODEBUG: free active (active state 0) object type:
timer_list hint: delayed_work_timer_fn+0x0/0x20
...
[ 1136.321390] Call Trace:
[ 1136.321398]  [<ffffffff8160d4a2>] dump_stack+0x45/0x56
[ 1136.321405]  [<ffffffff810514e8>] warn_slowpath_common+0x78/0xa0
[ 1136.321410]  [<ffffffff81051557>] warn_slowpath_fmt+0x47/0x50
[ 1136.321414]  [<ffffffff812f8883>] debug_print_object+0x83/0xa0
[ 1136.321420]  [<ffffffff8106aa90>] ? execute_in_process_context+0x90/0x90
[ 1136.321424]  [<ffffffff812f99fb>] debug_check_no_obj_freed+0x20b/0x250
[ 1136.321429]  [<ffffffff8112e7f2>] ? kmem_cache_destroy+0x92/0x100
[ 1136.321433]  [<ffffffff8115d945>] kmem_cache_free+0x125/0x210
[ 1136.321436]  [<ffffffff8112e7f2>] kmem_cache_destroy+0x92/0x100
[ 1136.321443]  [<ffffffffa046b806>] nf_conntrack_cleanup_net_list+0x126/0x160 [nf_conntrack]
[ 1136.321449]  [<ffffffffa046c43d>] nf_conntrack_pernet_exit+0x6d/0x80 [nf_conntrack]
[ 1136.321453]  [<ffffffff81511cc3>] ops_exit_list.isra.3+0x53/0x60
[ 1136.321457]  [<ffffffff815124f0>] cleanup_net+0x100/0x1b0
[ 1136.321460]  [<ffffffff8106b31e>] process_one_work+0x18e/0x430
[ 1136.321463]  [<ffffffff8106bf49>] worker_thread+0x119/0x390
[ 1136.321467]  [<ffffffff8106be30>] ? manage_workers.isra.23+0x2a0/0x2a0
[ 1136.321470]  [<ffffffff8107210b>] kthread+0xbb/0xc0
[ 1136.321472]  [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321477]  [<ffffffff8161b8fc>] ret_from_fork+0x7c/0xb0
[ 1136.321479]  [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321481] ---[ end trace 25f53c192da70825 ]---

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 14:07:19 +01:00
Pablo Neira Ayuso 8691a9a338 netfilter: nft_compat: fix error path in nft_parse_compat()
The patch 0ca743a55991: "netfilter: nf_tables: add compatibility
layer for x_tables", leads to the following Smatch

 warning: "net/netfilter/nft_compat.c:140 nft_parse_compat()
          warn: signedness bug returning '(-34)'"

This nft_parse_compat function returns error codes but the return
type is u8 so the error codes are transformed into small positive
values. The callers don't check the return.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 12:53:41 +01:00
Phil Oester 23dfe136e2 netfilter: fix wrong byte order in nf_ct_seqadj_set internal information
In commit 41d73ec053, sequence number adjustments were moved to a
separate file. Unfortunately, the sequence numbers that are stored
in the nf_ct_seqadj structure are expressed in host byte order. The
necessary ntohl call was removed when the call to adjust_tcp_sequence
was collapsed into nf_ct_seqadj_set. This broke the FTP NAT helper.
Fix it by adding back the byte order conversions.

Reported-by: Dawid Stawiarski <dawid.stawiarski@netart.pl>
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 12:53:40 +01:00
Martin Topholm c1898c4c29 netfilter: synproxy: correct wscale option passing
Timestamp are used to store additional syncookie parameters such as sack,
ecn, and wscale. The wscale value we need to encode is the client's
wscale, since we can't recover that later in the session. Next overwrite
the wscale option so the later synproxy_send_client_synack will send
the backend's wscale to the client.

Signed-off-by: Martin Topholm <mph@one.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 12:53:38 +01:00
Martin Topholm a6441b7a39 netfilter: synproxy: send mss option to backend
When the synproxy_parse_options is called on the client ack the mss
option will not be present. Consequently mss wont be included in the
backend syn packet, which falls back to 536 bytes mss.

Therefore XT_SYNPROXY_OPT_MSS is explicitly flagged when recovering mss
value from cookie.

Signed-off-by: Martin Topholm <mph@one.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-18 12:53:36 +01:00
Linus Torvalds 673fdfe3f0 NFS client bugfixes:
- Stable fix for data corruption when retransmitting O_DIRECT writes
 - Stable fix for a deep recursion/stack overflow bug in rpc_release_client
 - Stable fix for infinite looping when mounting a NFSv4.x volume
 - Fix a typo in the nfs mount option parser
 - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSh95hAAoJEGcL54qWCgDy1wgP/1zc4C7sMBQFWpIo676MHT4n
 m5v4bWgYhRBC0dne5GG8dC4+Q2cPkua4H7cWHCJKQmMuDmbzgOB33RVyQdwU/YNp
 ItLIZLz2EySCKo8OOKvbf4l5jDFeoBYEbheB2bmcE42BgixaTbiHKXpgCtoHr5pT
 qOX0JI29QtstAY3heiLW52bA3OqNJGwfE595KKEHXZwcD0n8izjqOU7Vrqj0E8/Q
 S+Xw9a613fo7chzbdcugR+iW6kkr7qtjxXiI5OXvplGyHycbBJRfvAqHkg01Z69k
 At9Y43cTEFiEx/zfKflmiFkn+IF9xFhABYNCKvpTtLFvQkwJDfYHa6h2jrFac/87
 mTRZHIzJ0nghhE1VxOEjA2zvIE3Hd5Xk4By+2BKJaB/Tp0RPbSsHs7t0s8t7RdHi
 ZwP/bNDynZY3S+HlbMor3A3900bUXLQBpCpRt/0+Hvc5bGLRszA5/Jinv+EqwOT9
 LHXTE/CsQGJCOz72SjDZT4Gsa0t11UKdRpznk4XCEvH9tflK78nS32XUktZEC9u/
 bCycLbvX+LrquxjQ9WN2TCmwnwyEiv45tSK2b8gf8JS1zJmePDKdnQ1dpHbiZAIO
 uhEhAqDwAY64+T2+AGncITh8ZfthZhU6wkfGoepqYvC1/5AaeSWrFidDvE1NJUGh
 xjcsGH6Ym8NnnT3rt/qp
 =uOIM
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes:
 - Stable fix for data corruption when retransmitting O_DIRECT writes
 - Stable fix for a deep recursion/stack overflow bug in rpc_release_client
 - Stable fix for infinite looping when mounting a NFSv4.x volume
 - Fix a typo in the nfs mount option parser
 - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is

* tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix pnfs Kconfig defaults
  NFS: correctly report misuse of "migration" mount option.
  nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once
  SUNRPC: Avoid deep recursion in rpc_release_client
  SUNRPC: Fix a data corruption issue when retransmitting RPC calls
2013-11-16 13:14:56 -08:00
Linus Torvalds 449bf8d03c Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:
 "This includes miscellaneous bugfixes and cleanup and a performance fix
  for write-heavy NFSv4 workloads.

  (The most significant nfsd-relevant change this time is actually in
  the delegation patches that went through Viro, fixing a long-standing
  bug that can cause NFSv4 clients to miss updates made by non-nfs users
  of the filesystem.  Those enable some followup nfsd patches which I
  have queued locally, but those can wait till 3.14)"

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (24 commits)
  nfsd: export proper maximum file size to the client
  nfsd4: improve write performance with better sendspace reservations
  svcrpc: remove an unnecessary assignment
  sunrpc: comment typo fix
  Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"
  nfsd4: fix discarded security labels on setattr
  NFSD: Add support for NFS v4.2 operation checking
  nfsd4: nfsd_shutdown_net needs state lock
  NFSD: Combine decode operations for v4 and v4.1
  nfsd: -EINVAL on invalid anonuid/gid instead of silent failure
  nfsd: return better errors to exportfs
  nfsd: fh_update should error out in unexpected cases
  nfsd4: need to destroy revoked delegations in destroy_client
  nfsd: no need to unhash_stid before free
  nfsd: remove_stid can be incorporated into nfs4_put_delegation
  nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid
  nfsd: nfs4_free_stid
  nfsd: fix Kconfig syntax
  sunrpc: trim off EC bytes in GSSAPI v2 unwrap
  gss_krb5: document that we ignore sequence number
  ...
2013-11-16 12:04:02 -08:00
Al Viro b26d4cd385 consolidate simple ->d_delete() instances
Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-15 22:04:17 -05:00
Eric Dumazet f52ed89971 pkt_sched: fq: fix pacing for small frames
For performance reasons, sch_fq tried hard to not setup timers for every
sent packet, using a quantum based heuristic : A delay is setup only if
the flow exhausted its credit.

Problem is that application limited flows can refill their credit
for every queued packet, and they can evade pacing.

This problem can also be triggered when TCP flows use small MSS values,
as TSO auto sizing builds packets that are smaller than the default fq
quantum (3028 bytes)

This patch adds a 40 ms delay to guard flow credit refill.

Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 21:01:52 -05:00
Eric Dumazet 65c5189a2b pkt_sched: fq: warn users using defrate
Commit 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing")
obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users.

Suggested by David Miller

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 21:01:52 -05:00
Johannes Berg 568508aa07 genetlink: unify registration functions
Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 20:50:23 -05:00
Linus Torvalds 9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Michal Kubeček 529d048954 macvlan: disable LRO on lower device instead of macvlan
A macvlan device has always LRO disabled so that calling
dev_disable_lro() on it does nothing. If we need to disable LRO
e.g. because

  - the macvlan device is inserted into a bridge
  - IPv6 forwarding is enabled for it
  - it is in a different namespace than lowerdev and IPv4
    forwarding is enabled in it

we need to disable LRO on its underlying device instead (as we
do for 802.1q VLAN devices).

v2: use newly introduced netif_is_macvlan()

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 17:55:48 -05:00
John W. Linville 32019c739c Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2013-11-15 14:18:45 -05:00
Jukka Rissanen 1188f05497 6lowpan: Uncompression of traffic class field was incorrect
If priority/traffic class field in IPv6 header is set (seen when
using ssh), the uncompression sets the TC and Flow fields incorrectly.

Example:

This is IPv6 header of a sent packet. Note the priority/TC (=1) in
the first byte.

00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: 02 1e ab ff fe 4c 52 57

This gets compressed like this in the sending side

00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
00000010: aa 2d fe 92 86 4e be c6 ....

In the receiving end, the packet gets uncompressed to this
IPv6 header

00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: ab ff fe 4c 52 57 ec c2

First four bytes are set incorrectly and we have also lost
two bytes from destination address.

The fix is to switch the case values in switch statement
when checking the TC field.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 03:11:06 -05:00
Erik Hugne 3db0a197ed tipc: fix dereference before check warning
This fixes the following Smatch warning:
net/tipc/link.c:2364 tipc_link_recv_fragment()
    warn: variable dereferenced before check '*head' (see line 2361)

A null pointer might be passed to skb_try_coalesce if
a malicious sender injects orphan fragments on a link.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 03:11:06 -05:00
Linus Torvalds b746f9c794 Nothing really exciting: some groundwork for changing virtio endian, and
some robustness fixes for broken virtio devices, plus minor tweaks.
 
 [vs last pull request: added the virtio-scsi broken vq escape patch, which
 I somehow lost.]
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF
 PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv
 57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO
 dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ
 tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey
 p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC
 Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T
 k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C
 gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t
 OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax
 DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG
 +UNbl/5Gr64jnUX3YCzm
 =vY2L
 -----END PGP SIGNATURE-----

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio updates from Rusty Russell:
 "Nothing really exciting: some groundwork for changing virtio endian,
  and some robustness fixes for broken virtio devices, plus minor
  tweaks"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()
  x86, asmlinkage, lguest: Pass in globals into assembler statement
  virtio: mmio: fix signature checking for BE guests
  virtio_ring: adapt to notify() returning bool
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_ring: add new function virtqueue_is_broken()
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_ring: change host notification API
  virtio_config: remove virtio_config_val
  virtio: use size-based config accessors.
  virtio_config: introduce size-based accessors.
  virtio_ring: plug kmemleak false positive.
  virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
2013-11-15 13:28:47 +09:00
Tetsuo Handa 652586df95 seq_file: remove "%n" usage from seq_file users
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:20 +09:00
Eric Dumazet c9e9042994 ipv4: fix possible seqlock deadlock
ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.

Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:31:14 -05:00
Geyslan G. Bem 84a035f694 net/hsr: Fix possible leak in 'hsr_get_node_status()'
If 'hsr_get_node_data()' returns error, going directly to 'fail' label
doesn't free the memory pointed by 'skb_out'.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:26:21 -05:00
Maciej Żenczykowski 2abc2f070e pkt_sched: fq: change classification of control packets
Initial sch_fq implementation copied code from pfifo_fast to classify
a packet as a high prio packet.

This clashes with setups using PRIO with say 7 bands, as one of the
band could be incorrectly (mis)classified by FQ.

Packets would be queued in the 'internal' queue, and no pacing ever
happen for this special queue.

Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:16:07 -05:00
Johannes Berg 4534de8305 genetlink: make all genl_ops users const
Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
 struct genl_ops ops[] = {
 ...
 };

(except the struct thing in net/openvswitch/datapath.c)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg f84f771d94 genetlink: allow making ops const
Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.

No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg d91824c08f genetlink: register family ops as array
Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.

Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg 3686ec5e84 genetlink: remove genl_register_ops/genl_unregister_ops
genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Johannes Berg b61a5eea59 wimax: use genl_register_family_with_ops()
This simplifies the code since there's no longer a need to
have error handling in the registration.

Unfortunately it means more extern function declarations are
needed, but the overall goal would seem to justify this.

Due to the removal of duplication in the netlink policies,
this reduces the size of wimax by almost 1k.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Johannes Berg 1c582d915d ieee802154: use genl_register_family_with_ops()
This simplifies the code since there's no longer a need to
have error handling in the registration.

Unfortunately it means more extern function declarations are
needed, but the overall goal would seem to justify this.

While at it, also fix the registration error path - if the
family registration failed then it shouldn't be unregistered.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Johannes Berg 9504b3ee1c hsr: use genl_register_family_with_ops()
This simplifies the code since there's no longer a
need to have error handling in the registration.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Nicolas Dichtel 1e9f3d6f1c ip6tnl: fix use after free of fb_tnl_dev
Bug has been introduced by commit bb8140947a ("ip6tnl: allow to use rtnl ops
on fb tunnel").

When ip6_tunnel.ko is unloaded, FB device is delete by rtnl_link_unregister()
and then we try to use the pointer in ip6_tnl_destroy_tunnels().

Let's add an handler for dellink, which will never remove the FB tunnel. With
this patch it will no more be possible to remove it via 'ip link del ip6tnl0',
but it's safer.

The same fix was already proposed by Willem de Bruijn <willemb@google.com> for
sit interfaces.

CC: Willem de Bruijn <willemb@google.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:04:38 -05:00
Nicolas Dichtel f7cb888633 sit/gre6: don't try to add the same route two times
addrconf_add_linklocal() already adds the link local route, so there is no
reason to add it before calling this function.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:59:16 -05:00
Nicolas Dichtel f0e2acfa30 sit: link local routes are missing
When a link local address was added to a sit interface, the corresponding route
was not configured. This breaks routing protocols that use the link local
address, like OSPFv3.

To ease the code reading, I remove sit_route_add(), which only adds v4 mapped
routes, and add this kind of route directly in sit_add_v4_addrs(). Thus link
local and v4 mapped routes are configured in the same place.

Reported-by: Li Hongjun <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:59:16 -05:00
Nicolas Dichtel 929c9cf310 sit: fix prefix length of ll and v4mapped addresses
When the local IPv4 endpoint is wilcard (0.0.0.0), the prefix length is
correctly set, ie 64 if the address is a link local one or 96 if the address is
a v4 mapped one.
But when the local endpoint is specified, the prefix length is set to 128 for
both kind of address. This patch fix this wrong prefix length.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:59:16 -05:00
Willem de Bruijn 9434266f2c sit: fix use after free of fb_tunnel_dev
Bug: The fallback device is created in sit_init_net and assumed to be
freed in sit_exit_net. First, it is dereferenced in that function, in
sit_destroy_tunnels:

        struct net *net = dev_net(sitn->fb_tunnel_dev);

Prior to this, rtnl_unlink_register has removed all devices that match
rtnl_link_ops == sit_link_ops.

Commit 205983c437 added the line

+       sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;

which cases the fallback device to match here and be freed before it
is last dereferenced.

Fix: This commit adds an explicit .delllink callback to sit_link_ops
that skips deallocation at rtnl_unlink_register for the fallback
device. This mechanism is comparable to the one in ip_tunnel.

It also modifies sit_destroy_tunnels and its only caller sit_exit_net
to avoid the offending dereference in the first place. That double
lookup is more complicated than required.

Test: The bug is only triggered when CONFIG_NET_NS is enabled. It
causes a GPF only when CONFIG_DEBUG_SLAB is enabled. Verified that
this bug exists at the mentioned commit, at davem-net HEAD and at
3.11.y HEAD. Verified that it went away after applying this patch.

Fixes: 205983c437 ("sit: allow to use rtnl ops on fb tunnel")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:42:17 -05:00
Chang Xiangzhong d30a58ba2e net: sctp: bug-fixing: retran_path not set properly after transports recovering (v3)
When a transport recovers due to the new coming sack, SCTP should
iterate all of its transport_list to locate the __two__ most recently used
transport and set to active_path and retran_path respectively. The exising
code does not find the two properly - In case of the following list:

[most-recent] -> [2nd-most-recent] -> ...

Both active_path and retran_path would be set to the 1st element.

The bug happens when:
1) multi-homing
2) failure/partial_failure transport recovers
Both active_path and retran_path would be set to the same most-recent one, in
other words, retran_path would not take its role - an end user might not even
notice this issue.

Signed-off-by: Chang Xiangzhong <changxiangzhong@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:35:09 -05:00
Eric Dumazet dccf76ca6b net-tcp: fix panic in tcp_fastopen_cache_set()
We had some reports of crashes using TCP fastopen, and Dave Jones
gave a nice stack trace pointing to the error.

Issue is that tcp_get_metrics() should not be called with a NULL dst

Fixes: 1fe4c481ba ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:33:18 -05:00
Eric Dumazet 98e09386c0 tcp: tsq: restore minimal amount of queueing
After commit c9eeec26e3 ("tcp: TSQ can use a dynamic limit"), several
users reported throughput regressions, notably on mvneta and wifi
adapters.

802.11 AMPDU requires a fair amount of queueing to be effective.

This patch partially reverts the change done in tcp_write_xmit()
so that the minimal amount is sysctl_tcp_limit_output_bytes.

It also remove the use of this sysctl while building skb stored
in write queue, as TSO autosizing does the right thing anyway.

Users with well behaving NICS and correct qdisc (like sch_fq),
can then lower the default sysctl_tcp_limit_output_bytes value from
128KB to 8KB.

This new usage of sysctl_tcp_limit_output_bytes permits each driver
authors to check how their driver performs when/if the value is set
to a minimum of 4KB.

Normally, line rate for a single TCP flow should be possible,
but some drivers rely on timers to perform TX completion and
too long TX completion delays prevent reaching full throughput.

Fixes: c9eeec26e3 ("tcp: TSQ can use a dynamic limit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sujith Manoharan <sujith@msujith.org>
Reported-by: Arnaud Ebalard <arno@natisbad.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:25:14 -05:00
Toshiaki Makita b4e09b29c7 bridge: Fix memory leak when deleting bridge with vlan filtering enabled
We currently don't call br_vlan_flush() when deleting a bridge, which
leads to memory leak if br->vlan_info is allocated.

Steps to reproduce:
  while :
  do
    brctl addbr br0
    bridge vlan add dev br0 vid 10 self
    brctl delbr br0
  done
We can observe the cache size of corresponding slab entry
(as kmalloc-2048 in SLUB) is increased.

kmemleak output:
unreferenced object 0xffff8800b68a7000 (size 2048):
  comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff  .........H.6....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220
    [<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge]
    [<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge]
    [<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200
    [<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260
    [<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0
    [<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30
    [<ffffffff81527ccd>] netlink_unicast+0xdd/0x190
    [<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740
    [<ffffffff814e8368>] sock_sendmsg+0x88/0xc0
    [<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0
    [<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90
    [<ffffffff814e922e>] SyS_sendmsg+0xe/0x10
    [<ffffffff81601669>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:16:34 -05:00
Toshiaki Makita dbbaf949bc bridge: Call vlan_vid_del for all vids at nbp_vlan_flush
We should call vlan_vid_del for all vids at nbp_vlan_flush to prevent
vid_info->refcount from being leaked when detaching a bridge port.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:16:34 -05:00
Toshiaki Makita 192368372d bridge: Use vlan_vid_[add/del] instead of direct ndo_vlan_rx_[add/kill]_vid calls
We should use wrapper functions vlan_vid_[add/del] instead of
ndo_vlan_rx_[add/kill]_vid. Otherwise, we might be not able to communicate
using vlan interface in a certain situation.

Example of problematic case:
  vconfig add eth0 10
  brctl addif br0 eth0
  bridge vlan add dev eth0 vid 10
  bridge vlan del dev eth0 vid 10
  brctl delif br0 eth0
In this case, we cannot communicate via eth0.10 because vlan 10 is
filtered by NIC that has the vlan filtering feature.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:16:34 -05:00
Alexei Starovoitov 81b9eab5eb core/dev: do not ignore dmac in dev_forward_skb()
commit 06a23fe31c
("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
and refactoring 64261f230a
("dev: move skb_scrub_packet() after eth_type_trans()")

are forcing pkt_type to be PACKET_HOST when skb traverses veth.

which means that ip forwarding will kick in inside netns
even if skb->eth->h_dest != dev->dev_addr

Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb()
and in ip_tunnel_rcv()

Fixes: 06a23fe31c ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
CC: Isaku Yamahata <yamahatanetdev@gmail.com>
CC: Maciej Zenczykowski <zenczykowski@gmail.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 02:39:53 -05:00
Linus Torvalds 5e30025a31 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest changes:

   - add lockdep support for seqcount/seqlocks structures, this
     unearthed both bugs and required extra annotation.

   - move the various kernel locking primitives to the new
     kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  block: Use u64_stats_init() to initialize seqcounts
  locking/lockdep: Mark __lockdep_count_forward_deps() as static
  lockdep/proc: Fix lock-time avg computation
  locking/doc: Update references to kernel/mutex.c
  ipv6: Fix possible ipv6 seqlock deadlock
  cpuset: Fix potential deadlock w/ set_mems_allowed
  seqcount: Add lockdep functionality to seqcount/seqlock structures
  net: Explicitly initialize u64_stats_sync structures for lockdep
  locking: Move the percpu-rwsem code to kernel/locking/
  locking: Move the lglocks code to kernel/locking/
  locking: Move the rwsem code to kernel/locking/
  locking: Move the rtmutex code to kernel/locking/
  locking: Move the semaphore core to kernel/locking/
  locking: Move the spinlock code to kernel/locking/
  locking: Move the lockdep code to kernel/locking/
  locking: Move the mutex code to kernel/locking/
  hung_task debugging: Add tracepoint to report the hang
  x86/locking/kconfig: Update paravirt spinlock Kconfig description
  lockstat: Report avg wait and hold times
  lockdep, x86/alternatives: Drop ancient lockdep fixup message
  ...
2013-11-14 16:30:30 +09:00
Randy Dunlap 4819224853 netfilter: fix connlimit Kconfig prompt string
Under Core Netfilter Configuration, connlimit match support has
an extra double quote at the end of it.

Fixes a portion of kernel bugzilla #52671:
  https://bugzilla.kernel.org/show_bug.cgi?id=52671

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: lailavrazda1979@gmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-11-13 23:31:11 +01:00
Weng Meiling 587ac5ee6f svcrpc: remove an unnecessary assignment
Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-13 15:31:22 -05:00
Johan Hedberg 86ca9eac31 Bluetooth: Fix rejecting SMP security request in slave role
The SMP security request is for a slave role device to request the
master role device to initiate a pairing request. If we receive this
command while we're in the slave role we should reject it appropriately.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-11-13 11:36:54 -02:00
Seung-Woo Kim 31e8ce80ba Bluetooth: Fix crash in l2cap_chan_send after l2cap_chan_del
Removing a bond and disconnecting from a specific remote device
can cause l2cap_chan_send() is called after l2cap_chan_del() is
called. This causes following crash.

[ 1384.972086] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1384.972090] pgd = c0004000
[ 1384.972125] [00000008] *pgd=00000000
[ 1384.972137] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[ 1384.972144] Modules linked in:
[ 1384.972156] CPU: 0 PID: 841 Comm: krfcommd Not tainted 3.10.14-gdf22a71-dirty #435
[ 1384.972162] task: df29a100 ti: df178000 task.ti: df178000
[ 1384.972182] PC is at l2cap_create_basic_pdu+0x30/0x1ac
[ 1384.972191] LR is at l2cap_chan_send+0x100/0x1d4
[ 1384.972198] pc : [<c051d250>]    lr : [<c0521c78>]    psr: 40000113
[ 1384.972198] sp : df179d40  ip : c083a010  fp : 00000008
[ 1384.972202] r10: 00000004  r9 : 0000065a  r8 : 000003f5
[ 1384.972206] r7 : 00000000  r6 : 00000000  r5 : df179e84  r4 : da557000
[ 1384.972210] r3 : 00000000  r2 : 00000004  r1 : df179e84  r0 : 00000000
[ 1384.972215] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 1384.972220] Control: 10c53c7d  Table: 5c8b004a  DAC: 00000015
[ 1384.972224] Process krfcommd (pid: 841, stack limit = 0xdf178238)
[ 1384.972229] Stack: (0xdf179d40 to 0xdf17a000)
[ 1384.972238] 9d40: 00000000 da557000 00000004 df179e84 00000004 000003f5 0000065a 00000000
[ 1384.972245] 9d60: 00000008 c0521c78 df179e84 da557000 00000004 da557204 de0c6800 df179e84
[ 1384.972253] 9d80: da557000 00000004 da557204 c0526b7c 00000004 df724000 df179e84 00000004
[ 1384.972260] 9da0: df179db0 df29a100 c083bc48 c045481c 00000001 00000000 00000000 00000000
[ 1384.972267] 9dc0: 00000000 df29a100 00000000 00000000 00000000 00000000 df179e10 00000000
[ 1384.972274] 9de0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1384.972281] 9e00: 00000000 00000000 00000000 00000000 df179e4c c000ec80 c0b538c0 00000004
[ 1384.972288] 9e20: df724000 df178000 00000000 df179e84 c0b538c0 00000000 df178000 c07f4570
[ 1384.972295] 9e40: dcad9c00 df179e74 c07f4394 df179e60 df178000 00000000 df179e84 de247010
[ 1384.972303] 9e60: 00000043 c0454dec 00000001 00000004 df315c00 c0530598 00000004 df315c0c
[ 1384.972310] 9e80: ffffc32c 00000000 00000000 df179ea0 00000001 00000000 00000000 00000000
[ 1384.972317] 9ea0: df179ebc 00000004 df315c00 c05df838 00000000 c0530810 c07d08c0 d7017303
[ 1384.972325] 9ec0: 6ec245b9 00000000 df315c00 c0531b04 c07f3fe0 c07f4018 da67a300 df315c00
[ 1384.972332] 9ee0: 00000000 c05334e0 df315c00 df315b80 df315c00 de0c6800 da67a300 00000000
[ 1384.972339] 9f00: de0c684c c0533674 df204100 df315c00 df315c00 df204100 df315c00 c082b138
[ 1384.972347] 9f20: c053385c c0533754 a0000113 df178000 00000001 c083bc48 00000000 c053385c
[ 1384.972354] 9f40: 00000000 00000000 00000000 c05338c4 00000000 df9f0000 df9f5ee4 df179f6c
[ 1384.972360] 9f60: df178000 c0049db4 00000000 00000000 c07f3ff8 00000000 00000000 00000000
[ 1384.972368] 9f80: df179f80 df179f80 00000000 00000000 df179f90 df179f90 df9f5ee4 c0049cfc
[ 1384.972374] 9fa0: 00000000 00000000 00000000 c000f168 00000000 00000000 00000000 00000000
[ 1384.972381] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1384.972388] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00010000 00000600
[ 1384.972411] [<c051d250>] (l2cap_create_basic_pdu+0x30/0x1ac) from [<c0521c78>] (l2cap_chan_send+0x100/0x1d4)
[ 1384.972425] [<c0521c78>] (l2cap_chan_send+0x100/0x1d4) from [<c0526b7c>] (l2cap_sock_sendmsg+0xa8/0x104)
[ 1384.972440] [<c0526b7c>] (l2cap_sock_sendmsg+0xa8/0x104) from [<c045481c>] (sock_sendmsg+0xac/0xcc)
[ 1384.972453] [<c045481c>] (sock_sendmsg+0xac/0xcc) from [<c0454dec>] (kernel_sendmsg+0x2c/0x34)
[ 1384.972469] [<c0454dec>] (kernel_sendmsg+0x2c/0x34) from [<c0530598>] (rfcomm_send_frame+0x58/0x7c)
[ 1384.972481] [<c0530598>] (rfcomm_send_frame+0x58/0x7c) from [<c0530810>] (rfcomm_send_ua+0x98/0xbc)
[ 1384.972494] [<c0530810>] (rfcomm_send_ua+0x98/0xbc) from [<c0531b04>] (rfcomm_recv_disc+0xac/0x100)
[ 1384.972506] [<c0531b04>] (rfcomm_recv_disc+0xac/0x100) from [<c05334e0>] (rfcomm_recv_frame+0x144/0x264)
[ 1384.972519] [<c05334e0>] (rfcomm_recv_frame+0x144/0x264) from [<c0533674>] (rfcomm_process_rx+0x74/0xfc)
[ 1384.972531] [<c0533674>] (rfcomm_process_rx+0x74/0xfc) from [<c0533754>] (rfcomm_process_sessions+0x58/0x160)
[ 1384.972543] [<c0533754>] (rfcomm_process_sessions+0x58/0x160) from [<c05338c4>] (rfcomm_run+0x68/0x110)
[ 1384.972558] [<c05338c4>] (rfcomm_run+0x68/0x110) from [<c0049db4>] (kthread+0xb8/0xbc)
[ 1384.972576] [<c0049db4>] (kthread+0xb8/0xbc) from [<c000f168>] (ret_from_fork+0x14/0x2c)
[ 1384.972586] Code: e3100004 e1a07003 e5946000 1a000057 (e5969008)
[ 1384.972614] ---[ end trace 6170b7ce00144e8c ]---

Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-11-13 11:36:54 -02:00
Seung-Woo Kim 8992da0993 Bluetooth: Fix to set proper bdaddr_type for RFCOMM connect
L2CAP socket validates proper bdaddr_type for connect, so this
patch fixes to set explictly bdaddr_type for RFCOMM connect.

Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-11-13 11:36:53 -02:00
Seung-Woo Kim c507f138fc Bluetooth: Fix RFCOMM bind fail for L2CAP sock
L2CAP socket bind checks its bdaddr type but RFCOMM kernel thread
does not assign proper bdaddr type for L2CAP sock. This can cause
that RFCOMM failure.

Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-11-13 11:36:53 -02:00
Marcel Holtmann 60c7a3c9c7 Bluetooth: Fix issue with RFCOMM getsockopt operation
The commit 94a86df010 seem to have
uncovered a long standing bug that did not trigger so far.

BUG: unable to handle kernel paging request at 00000009dd503502
IP: [<ffffffff815b1868>] rfcomm_sock_getsockopt+0x128/0x200
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: ath5k ath mac80211 cfg80211
CPU: 2 PID: 1459 Comm: bluetoothd Not tainted 3.11.0-133163-gcebd830 #2
Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS
1202    12/22/2010
task: ffff8803304106a0 ti: ffff88033046a000 task.ti: ffff88033046a000
RIP: 0010:[<ffffffff815b1868>]  [<ffffffff815b1868>]
rfcomm_sock_getsockopt+0x128/0x200
RSP: 0018:ffff88033046bed8  EFLAGS: 00010246
RAX: 00000009dd503502 RBX: 0000000000000003 RCX: 00007fffa2ed5548
RDX: 0000000000000003 RSI: 0000000000000012 RDI: ffff88032fd37480
RBP: ffff88033046bf28 R08: 00007fffa2ed554c R09: ffff88032f5707d8
R10: 00007fffa2ed5548 R11: 0000000000000202 R12: ffff880330bbd000
R13: 00007fffa2ed5548 R14: 0000000000000003 R15: 00007fffa2ed554c
FS:  00007fc44cfac700(0000) GS:ffff88033fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000009dd503502 CR3: 00000003304c2000 CR4: 00000000000007e0
Stack:
ffff88033046bf28 ffffffff815b0f2f ffff88033046bf18 0002ffff81105ef6
0000000600000000 ffff88032fd37480 0000000000000012 00007fffa2ed5548
0000000000000003 00007fffa2ed554c ffff88033046bf78 ffffffff814c0380
Call Trace:
[<ffffffff815b0f2f>] ? rfcomm_sock_setsockopt+0x5f/0x190
[<ffffffff814c0380>] SyS_getsockopt+0x60/0xb0
[<ffffffff815e0852>] system_call_fastpath+0x16/0x1b
Code: 02 00 00 00 0f 47 d0 4c 89 ef e8 74 13 cd ff 83 f8 01 19 c9 f7 d1 83 e1
f2 e9 4b ff ff ff 0f 1f 44 00 00 49 8b 84 24 70 02 00 00 <4c> 8b 30 4c 89 c0 e8
2d 19 cd ff 85 c0 49 89 d7 b9 f2 ff ff ff
RIP  [<ffffffff815b1868>] rfcomm_sock_getsockopt+0x128/0x200
RSP <ffff88033046bed8>
CR2: 00000009dd503502

It triggers in the following segment of the code:

0x1313 is in rfcomm_sock_getsockopt (net/bluetooth/rfcomm/sock.c:743).
738
739	static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __user *optval, int __user *optlen)
740	{
741		struct sock *sk = sock->sk;
742		struct rfcomm_conninfo cinfo;
743		struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;
744		int len, err = 0;
745		u32 opt;
746
747		BT_DBG("sk %p", sk);

The l2cap_pi(sk) is wrong here since it should have been rfcomm_pi(sk),
but that socket of course does not contain the low-level connection
details requested here.

Tracking down the actual offending commit, it seems that this has been
introduced when doing some L2CAP refactoring:

commit 8c1d787be4
Author: Gustavo F. Padovan <padovan@profusion.mobi>
Date:   Wed Apr 13 20:23:55 2011 -0300

@@ -743,6 +743,7 @@ static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __u
        struct sock *sk = sock->sk;
        struct sock *l2cap_sk;
        struct rfcomm_conninfo cinfo;
+       struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;
        int len, err = 0;
        u32 opt;

@@ -787,8 +788,8 @@ static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __u

                l2cap_sk = rfcomm_pi(sk)->dlc->session->sock->sk;

-               cinfo.hci_handle = l2cap_pi(l2cap_sk)->conn->hcon->handle;
-               memcpy(cinfo.dev_class, l2cap_pi(l2cap_sk)->conn->hcon->dev_class, 3);
+               cinfo.hci_handle = conn->hcon->handle;
+               memcpy(cinfo.dev_class, conn->hcon->dev_class, 3);

The l2cap_sk got accidentally mixed into the sk (which is RFCOMM) and
now causing a problem within getsocketopt() system call. To fix this,
just re-introduce l2cap_sk and make sure the right socket is used for
the low-level connection details.

Reported-by: Fabio Rossi <rossi.f@inwind.it>
Reported-by: Janusz Dziedzic <janusz.dziedzic@gmail.com>
Tested-by: Janusz Dziedzic <janusz.dziedzic@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-11-13 11:36:52 -02:00
Linus Torvalds 42a2d923cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) The addition of nftables.  No longer will we need protocol aware
    firewall filtering modules, it can all live in userspace.

    At the core of nftables is a, for lack of a better term, virtual
    machine that executes byte codes to inspect packet or metadata
    (arriving interface index, etc.) and make verdict decisions.

    Besides support for loading packet contents and comparing them, the
    interpreter supports lookups in various datastructures as
    fundamental operations.  For example sets are supports, and
    therefore one could create a set of whitelist IP address entries
    which have ACCEPT verdicts attached to them, and use the appropriate
    byte codes to do such lookups.

    Since the interpreted code is composed in userspace, userspace can
    do things like optimize things before giving it to the kernel.

    Another major improvement is the capability of atomically updating
    portions of the ruleset.  In the existing netfilter implementation,
    one has to update the entire rule set in order to make a change and
    this is very expensive.

    Userspace tools exist to create nftables rules using existing
    netfilter rule sets, but both kernel implementations will need to
    co-exist for quite some time as we transition from the old to the
    new stuff.

    Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
    worked so hard on this.

 2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
    to our pseudo-random number generator, mostly used for things like
    UDP port randomization and netfitler, amongst other things.

    In particular the taus88 generater is updated to taus113, and test
    cases are added.

 3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
    and Yang Yingliang.

 4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
    Sujir.

 5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
    Neal Cardwell, and Yuchung Cheng.

 6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
    control message data, much like other socket option attributes.
    From Francesco Fusco.

 7) Allow applications to specify a cap on the rate computed
    automatically by the kernel for pacing flows, via a new
    SO_MAX_PACING_RATE socket option.  From Eric Dumazet.

 8) Make the initial autotuned send buffer sizing in TCP more closely
    reflect actual needs, from Eric Dumazet.

 9) Currently early socket demux only happens for TCP sockets, but we
    can do it for connected UDP sockets too.  Implementation from Shawn
    Bohrer.

10) Refactor inet socket demux with the goal of improving hash demux
    performance for listening sockets.  With the main goals being able
    to use RCU lookups on even request sockets, and eliminating the
    listening lock contention.  From Eric Dumazet.

11) The bonding layer has many demuxes in it's fast path, and an RCU
    conversion was started back in 3.11, several changes here extend the
    RCU usage to even more locations.  From Ding Tianhong and Wang
    Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
    Falico.

12) Allow stackability of segmentation offloads to, in particular, allow
    segmentation offloading over tunnels.  From Eric Dumazet.

13) Significantly improve the handling of secret keys we input into the
    various hash functions in the inet hashtables, TCP fast open, as
    well as syncookies.  From Hannes Frederic Sowa.  The key fundamental
    operation is "net_get_random_once()" which uses static keys.

    Hannes even extended this to ipv4/ipv6 fragmentation handling and
    our generic flow dissector.

14) The generic driver layer takes care now to set the driver data to
    NULL on device removal, so it's no longer necessary for drivers to
    explicitly set it to NULL any more.  Many drivers have been cleaned
    up in this way, from Jingoo Han.

15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.

16) Improve CRC32 interfaces and generic SKB checksum iterators so that
    SCTP's checksumming can more cleanly be handled.  Also from Daniel
    Borkmann.

17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
    using the interface MTU value.  This helps avoid PMTU attacks,
    particularly on DNS servers.  From Hannes Frederic Sowa.

18) Use generic XPS for transmit queue steering rather than internal
    (re-)implementation in virtio-net.  From Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  random32: add test cases for taus113 implementation
  random32: upgrade taus88 generator to taus113 from errata paper
  random32: move rnd_state to linux/random.h
  random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
  random32: add periodic reseeding
  random32: fix off-by-one in seeding requirement
  PHY: Add RTL8201CP phy_driver to realtek
  xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
  macmace: add missing platform_set_drvdata() in mace_probe()
  ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
  ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
  vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
  ixgbe: add warning when max_vfs is out of range.
  igb: Update link modes display in ethtool
  netfilter: push reasm skb through instead of original frag skbs
  ip6_output: fragment outgoing reassembled skb properly
  MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
  net_sched: tbf: support of 64bit rates
  ixgbe: deleting dfwd stations out of order can cause null ptr deref
  ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
  ...
2013-11-13 17:40:34 +09:00
Linus Torvalds 9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Trond Myklebust d07ba8422f SUNRPC: Avoid deep recursion in rpc_release_client
In cases where an rpc client has a parent hierarchy, then
rpc_free_client may end up calling rpc_release_client() on the
parent, thus recursing back into rpc_free_client. If the hierarchy
is deep enough, then we can get into situations where the stack
simply overflows.

The fix is to have rpc_release_client() loop so that it can take
care of the parent rpc client hierarchy without needing to
recurse.

Reported-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Weston Andros Adamson <dros@netapp.com>
Reported-by: Bruce Fields <bfields@fieldses.org>
Link: http://lkml.kernel.org/r/2C73011F-0939-434C-9E4D-13A1EB1403D7@netapp.com
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-12 18:56:57 -05:00
J. Bruce Fields f06c3d2bb8 sunrpc: comment typo fix
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-12 11:42:10 -05:00
Linus Torvalds 39cf275a1a Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
2013-11-12 10:20:12 +09:00
Hannes Frederic Sowa f8c31c8f80 ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
Fixes a suspicious rcu derference warning.

Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 01:25:28 -05:00
David S. Miller e267cb960a vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
This is to avoid very silly Kconfig dependencies for modules
using this routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:42:07 -05:00
Jiri Pirko 6aafeef03b netfilter: push reasm skb through instead of original frag skbs
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:19:35 -05:00
Jiri Pirko 9037c3579a ip6_output: fragment outgoing reassembled skb properly
If reassembled packet would fit into outdev MTU, it is not fragmented
according the original frag size and it is send as single big packet.

The second case is if skb is gso. In that case fragmentation does not happen
according to the original frag size.

This patch fixes these.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:19:35 -05:00
Yang Yingliang a33c4a2663 net_sched: tbf: support of 64bit rates
With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
Add two new attributes so that tc can use them to break the 32bit
limit.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-09 14:53:37 -05:00
Trond Myklebust a6b31d18b0 SUNRPC: Fix a data corruption issue when retransmitting RPC calls
The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.

1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
   and so the RPC client times out.

3) The client issues a second sendpage of the page data as part of
   an RPC call retransmission.

4) The reply to the first transmission arrives from the server
   _before_ the client hardware has emptied the TCP socket send
   buffer.
5) After processing the reply, the RPC state machine rules that
   the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
   pages to store something else (e.g. a new write).

7) The client NIC drains the TCP socket send buffer. Since the
   page data has now changed, it reads a corrupted version of the
   initial RPC call, and puts it on the wire.

This patch fixes the problem in the following manner:

The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a time out occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy), We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-11-08 17:19:15 -05:00
Duan Jiong f104a567e6 ipv6: use rt6_get_dflt_router to get default router in rt6_route_rcv
As the rfc 4191 said, the Router Preference and Lifetime values in a
::/0 Route Information Option should override the preference and lifetime
values in the Router Advertisement header. But when the kernel deals with
a ::/0 Route Information Option, the rt6_get_route_info() always return
NULL, that means that overriding will not happen, because those default
routers were added without flag RTF_ROUTEINFO in rt6_add_dflt_router().

In order to deal with that condition, we should call rt6_get_dflt_router
when the prefix length is 0.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 15:16:04 -05:00
Jiri Benc cdbe7c2d6d nfnetlink: do not ack malformed messages
Commit 0628b123c9 ("netfilter: nfnetlink: add batch support and use it
from nf_tables") introduced a bug leading to various crashes in netlink_ack
when netlink message with invalid nlmsg_len was sent by an unprivileged
user.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 15:12:11 -05:00
Andreas Henriksson 13eb2ab2d3 net: Fix "ip rule delete table 256"
When trying to delete a table >= 256 using iproute2 the local table
will be deleted.
The table id is specified as a netlink attribute when it needs more then
8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0).
Preconditions to matching the table id in the rule delete code
doesn't seem to take the "table id in netlink attribute" into condition
so the frh_get_table helper function never gets to do its job when
matching against current rule.
Use the helper function twice instead of peaking at the table value directly.

Originally reported at: http://bugs.debian.org/724783

Reported-by: Nicolas HICHER <nhicher@avencall.com>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 14:53:10 -05:00
Florent Fourcot 394055f6fa ipv6: protect flow label renew against GC
Take ip6_fl_lock before to read and update
a label.

v2: protect only the relevant code

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:44:15 -05:00
Florent Fourcot 53b47106c0 ipv6: increase maximum lifetime of flow labels
If the last RFC 6437 does not give any constraints
for lifetime of flow labels, the previous RFC 3697
spoke of a minimum of 120 seconds between
reattribution of a flow label.

The maximum linger is currently set to 60 seconds
and does not allow this configuration without
CAP_NET_ADMIN right.

This patch increase the maximum linger to 150
seconds, allowing more flexibility to standard
users.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:43:52 -05:00
Florent Fourcot 3fdfa5ff50 ipv6: enable IPV6_FLOWLABEL_MGR for getsockopt
It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).

It helps application to take decision for a renew
or a release of the label.

v2:
 * Add spin_lock to prevent race condition
 * return -ENOENT if no result found
 * check if flr_action is GET

v3:
 * move the spin_lock to protect only the
   relevant code

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:42:57 -05:00
Eric Dumazet 3797d3e846 net: flow_dissector: small optimizations in IPv4 dissect
By moving code around, we avoid :

1) A reload of iph->ihl (bit field, so needs a mask)

2) A conditional test (replaced by a conditional mov on x86)
   Fast path loads iph->protocol anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:30:02 -05:00
John W. Linville c1f3bb6bd3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
Eric Dumazet dcd6077183 inet: fix a UFO regression
While testing virtio_net and skb_segment() changes, Hannes reported
that UFO was sending wrong frames.

It appears this was introduced by a recent commit :
8c3a897bfa ("inet: restore gso for vxlan")

The old condition to perform IP frag was :

tunnel = !!skb->encapsulation;
...
        if (!tunnel && proto == IPPROTO_UDP) {

So the new one should be :

udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
...
        if (udpfrag) {

Initialization of udpfrag must be done before call
to ops->callbacks.gso_segment(skb, features), as
skb_udp_tunnel_segment() clears skb->encapsulation

(We want udpfrag to be true for UFO, false for VXLAN)

With help from Alexei Starovoitov

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 02:07:59 -05:00
Mathias Krause bc32383cd6 net: skbuff - kernel-doc fixes
Use "@" to refer to parameters in the kernel-doc description. According
to Documentation/kernel-doc-nano-HOWTO.txt "&" shall be used to refer to
structures only.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:59 -05:00
Mathias Krause 253c6daa34 caif: use pskb_put() instead of reimplementing its functionality
Also remove the warning for fragmented packets -- skb_cow_data() will
linearize the buffer, removing all fragments.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:59 -05:00
Mathias Krause 0c7ddf36c2 net: move pskb_put() to core code
This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:58 -05:00
John Fastabend a6cc0cfa72 net: Add layer 2 hardware acceleration operations for macvlan devices
Add a operations structure that allows a network interface to export
the fact that it supports package forwarding in hardware between
physical interfaces and other mac layer devices assigned to it (such
as macvlans). This operaions structure can be used by virtual mac
devices to bypass software switching so that forwarding can be done
in hardware more efficiently.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:11:41 -05:00
Dan Carpenter 78032f9b3e 6lowpan: release device on error path
We recently added a new error path and it needs a dev_put().

Fixes: 7adac1ec81 ('6lowpan: Only make 6lowpan links to IEEE802154 devices')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:11:13 -05:00
Eyal Perry d324353919 net/vlan: Provide read access to the vlan egress map
Provide a method for read-only access to the vlan device egress mapping.

Do this by refactoring vlan_dev_get_egress_qos_mask() such that now it
receives as an argument the skb priority instead of pointer to the skb.

Such an access is needed for the IBoE stack where the control plane
goes through the network stack. This is an add-on step on top of commit
d4a968658c "net/route: export symbol ip_tos2prio" which allowed the RDMA-CM
to use ip_tos2prio.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:09:44 -05:00
Erik Hugne a715b49e79 tipc: reassembly failures should cause link reset
If appending a received fragment to the pending fragment chain
in a unicast link fails, the current code tries to force a retransmission
of the fragment by decrementing the 'next received sequence number'
field in the link. This is done under the assumption that the failure
is caused by an out-of-memory situation, an assumption that does
not hold true after the previous patch in this series.

A failure to append a fragment can now only be caused by a protocol
violation by the sending peer, and it must hence be assumed that it
is either malicious or buggy.  Either way, the correct behavior is now
to reset the link instead of trying to revert its sequence number.
So, this is what we do in this commit.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Erik Hugne 40ba3cdf54 tipc: message reassembly using fragment chain
When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.

Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.

To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Erik Hugne 528f6f4bf3 tipc: don't reroute message fragments
When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.

Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 18:30:11 -05:00
Linus Torvalds c224b76b56 NFS client updates for Linux 3.13
Highlights include:
 
 - Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
   - Detect TCP connection breakage through the "keepalive" mechanism
 - Add client side support for NFSv4.x migration (Chuck Lever)
 - Add support for multiple security flavour arguments to the "sec=" mount
   option (Dros Adamson)
 - fs-cache bugfixes from David Howells:
   - Fix an issue whereby caching can be enabled on a file that is open for
     writing
 - More NFSv4 open code stable bugfixes
 - Various Labeled NFS (selinux) bugfixes, including one stable fix
 - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSe8TEAAoJEGcL54qWCgDydu0QAJVtVhfwlUKm/HZ4oAy0Q5T8
 rJOWupqGnwyqTNLIRTlNegFSwMY+bABbkihXzSoj641o5zRb200KePlNxknzzlu1
 Q715035LDeEC1jrrHHeztTa9uWxAZ9B6gstMzilJYbV72VRYuWA6Q5LstXwQy/jN
 ViSldrGJ4sRZUe6wpNLPBRDBfOMWOtZdyRqqqjm71ZHJJnaqQWLBvThTG4MsLlpg
 j/khi5189MxJWePTKI9zGZdnXZAZ0ar1tAi1QWDNv044EwsS3LZZIko+YdBh6LZx
 9IBwk6TqOXFY0jxPDsIZtTfWPf4pjewRrPINMkjlZl3TJEf97sIlavZ7gWqvVIz5
 eXzFGy7D2XBgub8TGcmZM/7keHY/sqghz7lXZ8FulXlVem52r/95NiQ9tu8l8hq3
 Ab0FUnjtXeuaDFPBCHlKb3zmCMGFF89VqtpCj2plCPvfcGgJvXJqddWBRisQw9St
 UgD1PQWRFGtkrHv5EcQkd5boVdRNjAVAC9PaCWNpOpSVDjJyuUE+v/k75+ZwDcG8
 afAFMJSbCwRxW+cFlLAsQTfQztzuWTTOOVQvJDxfyYulcWshyIruhiYItRDfJqRp
 RynuVzrBERzUs5wsefnBbC218C/WSlOrodPbsZvdhKolvRx1RNtWT29ilZ6+p2tH
 4378ZRLtQvm9RXBnAkRc
 =gflJ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Changes to the RPC socket code to allow NFSv4 to turn off
     timeout+retry:
      * Detect TCP connection breakage through the "keepalive" mechanism
   - Add client side support for NFSv4.x migration (Chuck Lever)
   - Add support for multiple security flavour arguments to the "sec="
     mount option (Dros Adamson)
   - fs-cache bugfixes from David Howells:
     * Fix an issue whereby caching can be enabled on a file that is
       open for writing
   - More NFSv4 open code stable bugfixes
   - Various Labeled NFS (selinux) bugfixes, including one stable fix
   - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding"

* tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
  NFSv4: Sanity check the server reply in _nfs4_server_capabilities
  NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
  nfs: set security label when revalidating inode
  NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
  NFS: Fix a missing initialisation when reading the SELinux label
  nfs: fix oops when trying to set SELinux label
  nfs: fix inverted test for delegation in nfs4_reclaim_open_state
  SUNRPC: Cleanup xs_destroy()
  SUNRPC: close a rare race in xs_tcp_setup_socket.
  SUNRPC: remove duplicated include from clnt.c
  nfs: use IS_ROOT not DCACHE_DISCONNECTED
  SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
  SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
  SUNRPC: remove an unnecessary if statement
  nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
  nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
  nfs: Remove useless 'error' assignment
  sunrpc: comment typo fix
  SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
  ...
2013-11-08 05:57:46 +09:00
Linus Torvalds 0324e74534 Driver Core / sysfs patches for 3.13-rc1
Here's the big driver core / sysfs update for 3.13-rc1.
 
 There's lots of dev_groups updates for different subsystems, as they all
 get slowly migrated over to the safe versions of the attribute groups
 (removing userspace races with the creation of the sysfs files.)  Also
 in here are some kobject updates, devres expansions, and the first round
 of Tejun's sysfs reworking to enable it to be used by other subsystems
 as a backend for an in-kernel filesystem.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJ6xAMACgkQMUfUDdst+yk1kQCfcHXhfnrvFZ5J/mDP509IzhNS
 ddEAoLEWoivtBppNsgrWqXpD1vi4UMsE
 =JmVW
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core / sysfs update for 3.13-rc1.

  There's lots of dev_groups updates for different subsystems, as they
  all get slowly migrated over to the safe versions of the attribute
  groups (removing userspace races with the creation of the sysfs
  files.) Also in here are some kobject updates, devres expansions, and
  the first round of Tejun's sysfs reworking to enable it to be used by
  other subsystems as a backend for an in-kernel filesystem.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits)
  sysfs: rename sysfs_assoc_lock and explain what it's about
  sysfs: use generic_file_llseek() for sysfs_file_operations
  sysfs: return correct error code on unimplemented mmap()
  mdio_bus: convert bus code to use dev_groups
  device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name
  sysfs: separate out dup filename warning into a separate function
  sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
  sysfs: remove unused sysfs_get_dentry() prototype
  sysfs: honor bin_attr.attr.ignore_lockdep
  sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
  devres: restore zeroing behavior of devres_alloc()
  sysfs: fix sysfs_write_file for bin file
  input: gameport: convert bus code to use dev_groups
  input: serio: remove bus usage of dev_attrs
  input: serio: use DEVICE_ATTR_RO()
  i2o: convert bus code to use dev_groups
  memstick: convert bus code to use dev_groups
  tifm: convert bus code to use dev_groups
  virtio: convert bus code to use dev_groups
  ipack: convert bus code to use dev_groups
  ...
2013-11-07 11:42:15 +09:00
John Stultz 5ac68e7c34 ipv6: Fix possible ipv6 seqlock deadlock
While enabling lockdep on seqlocks, I ran across the warning below
caused by the ipv6 stats being updated in both irq and non-irq context.

This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested
by Eric Dumazet) to resolve this problem.

[   11.120383] =================================
[   11.121024] [ INFO: inconsistent lock state ]
[   11.121663] 3.12.0-rc1+ #68 Not tainted
[   11.122229] ---------------------------------
[   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
[   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.125736] {SOFTIRQ-ON-W} state was registered at:
[   11.126447]   [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0
[   11.127222]   [<c10e2996>] lock_acquire+0x96/0xd0
[   11.127925]   [<c1a9a2c3>] write_seqcount_begin+0x33/0x40
[   11.128766]   [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460
[   11.129582]   [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80
[   11.130014]   [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0
[   11.130014]   [<c1a4d0b5>] inet_dgram_connect+0x25/0x70
[   11.130014]   [<c198dd61>] SYSC_connect+0xa1/0xc0
[   11.130014]   [<c198f571>] SyS_connect+0x11/0x20
[   11.130014]   [<c198fe6b>] SyS_socketcall+0x12b/0x300
[   11.130014]   [<c1bbf880>] syscall_call+0x7/0xb
[   11.130014] irq event stamp: 1184
[   11.130014] hardirqs last  enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110
[   11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110
[   11.130014] softirqs last  enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0
[   11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]
[   11.130014] other info that might help us debug this:
[   11.130014]  Possible unsafe locking scenario:
[   11.130014]
[   11.130014]        CPU0
[   11.130014]        ----
[   11.130014]   lock(&stats->syncp.seq#6);
[   11.130014]   <Interrupt>
[   11.130014]     lock(&stats->syncp.seq#6);
[   11.130014]
[   11.130014]  *** DEADLOCK ***
[   11.130014]
[   11.130014] 3 locks held by init/4483:
[   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620
[   11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0
[   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0
[   11.130014]
[   11.130014] stack backtrace:
[   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
[   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   11.130014]  00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
[   11.130014]  c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000
[   11.130014]  c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2
[   11.130014] Call Trace:
[   11.130014]  [<c1bb0e71>] dump_stack+0x4b/0x66
[   11.130014]  [<c1badf79>] print_usage_bug+0x1d3/0x1dd
[   11.130014]  [<c10de492>] mark_lock+0x282/0x2f0
[   11.130014]  [<c10755f2>] ? kvm_clock_read+0x22/0x30
[   11.130014]  [<c10dd8b0>] ? check_usage_backwards+0x150/0x150
[   11.130014]  [<c10e0e74>] __lock_acquire+0x584/0x1af0
[   11.130014]  [<c10b1baf>] ? sched_clock_cpu+0xef/0x190
[   11.130014]  [<c10de58c>] ? mark_held_locks+0x8c/0xf0
[   11.130014]  [<c10e2996>] lock_acquire+0x96/0xd0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0
[   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
[   11.130014]  [<c108cc32>] ? mod_timer+0xf2/0x160
[   11.130014]  [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150
[   11.130014]  [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c233>] call_timer_fn+0x73/0xf0
[   11.130014]  [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0
[   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
[   11.130014]  [<c108c5b1>] run_timer_softirq+0x141/0x1e0
[   11.130014]  [<c1086b20>] ? __do_softirq+0x70/0x1b0
[   11.130014]  [<c1086b70>] __do_softirq+0xc0/0x1b0
[   11.130014]  [<c1086e05>] irq_exit+0xa5/0xb0
[   11.130014]  [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50
[   11.130014]  [<c1bbfbca>] apic_timer_interrupt+0x32/0x38
[   11.130014]  [<c10936ed>] ? SyS_setpriority+0xfd/0x620
[   11.130014]  [<c10e26c9>] ? lock_release+0x9/0x240
[   11.130014]  [<c10936d7>] ? SyS_setpriority+0xe7/0x620
[   11.130014]  [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30
[   11.130014]  [<c1093701>] SyS_setpriority+0x111/0x620
[   11.130014]  [<c109363c>] ? SyS_setpriority+0x4c/0x620
[   11.130014]  [<c1bbf880>] syscall_call+0x7/0xb

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:28 +01:00
John Stultz 827da44c61 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:25 +01:00
Duan Jiong 249a3630c4 ipv6: drop the judgement in rt6_alloc_cow()
Now rt6_alloc_cow() is only called by ip6_pol_route() when
rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY,
and rt->rt6i_flags hasn't been changed in ip6_rt_copy().
So there is no neccessary to judge whether rt->rt6i_flags contains
RTF_GATEWAY or not.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 22:17:05 -05:00
Hannes Frederic Sowa 0e033e04c2 ipv6: fix headroom calculation in udp6_ufo_fragment
Commit 1e2bd517c1 ("udp6: Fix udp
fragmentation for tunnel traffic.") changed the calculation if
there is enough space to include a fragment header in the skb from a
skb->mac_header dervived one to skb_headroom. Because we already peeled
off the skb to transport_header this is wrong. Change this back to check
if we have enough room before the mac_header.

This fixes a panic Saran Neti reported. He used the tbf scheduler which
skb_gso_segments the skb. The offsets get negative and we panic in memcpy
because the skb was erroneously not expanded at the head.

Reported-by: Saran Neti <Saran.Neti@telus.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 22:09:53 -05:00
Hannes Frederic Sowa 482fc6094a ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 21:52:27 -05:00
John W. Linville 33b443422e Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-11-05 15:50:22 -05:00
John W. Linville b476d3f143 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-11-05 15:49:16 -05:00
John W. Linville 353c78152c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/wireless/reg.c
2013-11-05 15:49:02 -05:00
Florent Fourcot b579035ff7 ipv6: remove old conditions on flow label sharing
The code of flow label in Linux Kernel follows
the rules of RFC 1809 (an informational one) for
conditions on flow label sharing. There rules are
not in the last proposed standard for flow label
(RFC 6437), or in the previous one (RFC 3697).

Since this code does not follow any current or
old standard, we can remove it.

With this removal, the ipv6_opt_cmp function is
now a dead code and it can be removed too.

Changelog to v1:
 * add justification for the change
 * remove the condition on IPv6 options

[ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ]

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 14:40:53 -05:00