Commit Graph

439991 Commits

Author SHA1 Message Date
Heiko Carstens a1977d128a s390: wire up sys_renameat2
Actually this also enable sys_setattr and sys_getattr, since I forgot to
increase NR_syscalls when adding those syscalls.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-11 13:53:29 +02:00
Heiko Carstens 9ea806621d s390: show_registers() should not map user space addresses to kernel symbols
It doesn't make sense to map user space addresses to kernel symbols when
show_registers() prints a user space psw. So just skip the translation part
if a user space psw is handled.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-11 13:53:27 +02:00
Heiko Carstens 3b7df3421f s390/mm: print control registers and page table walk on crash
Print extra debugging information to the console if the kernel or a user
space process crashed (with user space debugging enabled):

- contents of control register 7 and 13
- failing address and translation exception identification
- page table walk for the failing address

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-09 10:19:13 +02:00
Heiko Carstens e7c46c66db s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP.
Otherwise machine_halt() will return and and the machine generates a
panic instread of simply stopping the current cpu:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
[...]
Call Trace:
([<0000000000110db0>] show_trace+0xf8/0x158)
 [<0000000000110e7a>] show_stack+0x6a/0xe8
 [<000000000074dba8>] panic+0xe4/0x268
 [<0000000000140570>] do_exit+0xa88/0xb2c
 [<000000000016e12c>] SyS_reboot+0x1f0/0x234
 [<000000000075da70>] sysc_nr_ok+0x22/0x28
 [<000000007d5a09b4>] 0x7d5a09b4

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-09 10:19:12 +02:00
Martin Schwidefsky a8a934e44f s390: fix control register update
The git commit c63badebfe
"s390: optimize control register update" broke the update for
control register 0. After the update do the lctlg from the correct
value.

Cc: <stable@vger.kernel.org> # 3.14
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-09 10:19:11 +02:00
Linus Torvalds 75ff24fa52 Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:
   - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
   - xdr fixes (a larger xdr rewrite has been posted but I decided it
     would be better to queue it up for 3.16).
   - miscellaneous fixes and cleanup from all over (thanks especially to
     Kinglong Mee)"

* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
  nfsd4: don't create unnecessary mask acl
  nfsd: revert v2 half of "nfsd: don't return high mode bits"
  nfsd4: fix memory leak in nfsd4_encode_fattr()
  nfsd: check passed socket's net matches NFSd superblock's one
  SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
  NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
  SUNRPC: New helper for creating client with rpc_xprt
  NFSD: Free backchannel xprt in bc_destroy
  NFSD: Clear wcc data between compound ops
  nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
  nfsd4: fix nfs4err_resource in 4.1 case
  nfsd4: fix setclientid encode size
  nfsd4: remove redundant check from nfsd4_check_resp_size
  nfsd4: use more generous NFS4_ACL_MAX
  nfsd4: minor nfsd4_replay_cache_entry cleanup
  nfsd4: nfsd4_replay_cache_entry should be static
  nfsd4: update comments with obsolete function name
  rpc: Allow xdr_buf_subsegment to operate in-place
  NFSD: Using free_conn free connection
  SUNRPC: fix memory leak of peer addresses in XPRT
  ...
2014-04-08 18:28:14 -07:00
Linus Torvalds 0f386a7074 Merge branch 'akpm' (incoming from Andrew)
Merge a few more patches from Andrew Morton:
 "A few leftovers"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  fs/ncpfs/dir.c: fix indenting in ncp_lookup()
  ncpfs/inode.c: fix mismatch printk formats and arguments
  ncpfs: remove now unused PRINTK macro
  ncpfs: convert PPRINTK to ncp_vdbg
  ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
  ncpfs: Add pr_fmt and convert printks to pr_<level>
  arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
  lib/percpu_counter.c: fix bad percpu counter state during suspend
  autofs4: check dev ioctl size before allocating
  mm: vmscan: do not swap anon pages just because free+file is low
2014-04-08 16:51:05 -07:00
Dan Carpenter ffddc5fd19 fs/ncpfs/dir.c: fix indenting in ncp_lookup()
My static checker suggests adding curly braces here.  Probably that was
the intent, but actually the code works the same either way.  I've just
changed the indenting and left the code as-is.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:53 -07:00
Joe Perches 15a03ac6f8 ncpfs/inode.c: fix mismatch printk formats and arguments
Conversions to ncp_dbg showed some format/argument mismatches so fix
them.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:53 -07:00
Joe Perches 485b47f68c ncpfs: remove now unused PRINTK macro
Uses are gone, remove the macro.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches e45ca8baa3 ncpfs: convert PPRINTK to ncp_vdbg
Use a more current logging style.

Convert the paranoia debug statement to vdbg.
Remove the embedded function names as dynamic_debug can do that.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches d3b73ca1be ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
Use a more current logging style and enable use of dynamic debugging.

Remove embedded function names, dynamic debug can add this instead.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Joe Perches b41f8b84d0 ncpfs: Add pr_fmt and convert printks to pr_<level>
Convert to a more current logging style.

Add pr_fmt to prefix with "ncpfs: ".
Remove the embedded function names and use "%s: ", __func__

Some previously unprefixed messages now have "ncpfs: "

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
David Rientjes d0057ca4c1 arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
Kmemcheck should use the preferred interface for parsing command line
arguments, kstrto*(), rather than sscanf() itself.  Use it
appropriately.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:52 -07:00
Jens Axboe e39435ce68 lib/percpu_counter.c: fix bad percpu counter state during suspend
I got a bug report yesterday from Laszlo Ersek in which he states that
his kvm instance fails to suspend.  Laszlo bisected it down to this
commit 1cf7e9c68f ("virtio_blk: blk-mq support") where virtio-blk is
converted to use the blk-mq infrastructure.

After digging a bit, it became clear that the issue was with the queue
drain.  blk-mq tracks queue usage in a percpu counter, which is
incremented on request alloc and decremented when the request is freed.
The initial hunt was for an inconsistency in blk-mq, but everything
seemed fine.  In fact, the counter only returned crazy values when
suspend was in progress.

When a CPU is unplugged, the percpu counters merges that CPU state with
the general state.  blk-mq takes care to register a hotcpu notifier with
the appropriate priority, so we know it runs after the percpu counter
notifier.  However, the percpu counter notifier only merges the state
when the CPU is fully gone.  This leaves a state transition where the
CPU going away is no longer in the online mask, yet it still holds
private values.  This means that in this state, percpu_counter_sum()
returns invalid results, and the suspend then hangs waiting for
abs(dead-cpu-value) requests to complete which of course will never
happen.

Fix this by clearing the state earlier, so we never have a case where
the CPU isn't in online mask but still holds private state.  This bug
has been there since forever, I guess we don't have a lot of users where
percpu counters needs to be reliable during the suspend cycle.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:51 -07:00
Sasha Levin e53d77eb8b autofs4: check dev ioctl size before allocating
There wasn't any check of the size passed from userspace before trying
to allocate the memory required.

This meant that userspace might request more space than allowed,
triggering an OOM.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:51 -07:00
Johannes Weiner 0bf1457f0c mm: vmscan: do not swap anon pages just because free+file is low
Page reclaim force-scans / swaps anonymous pages when file cache drops
below the high watermark of a zone in order to prevent what little cache
remains from thrashing.

However, on bigger machines the high watermark value can be quite large
and when the workload is dominated by a static anonymous/shmem set, the
file set might just be a small window of used-once cache.  In such
situations, the VM starts swapping heavily when instead it should be
recycling the no longer used cache.

This is a longer-standing problem, but it's more likely to trigger after
commit 81c0a2bb51 ("mm: page_alloc: fair zone allocator policy")
because file pages can no longer accumulate in a single zone and are
dispersed into smaller fractions among the available zones.

To resolve this, do not force scan anon when file pages are low but
instead rely on the scan/rotation ratios to make the right prediction.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: <stable@kernel.org>		[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 16:48:51 -07:00
Linus Torvalds ce7613db2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull more networking updates from David Miller:

 1) If a VXLAN interface is created with no groups, we can crash on
    reception of packets.  Fix from Mike Rapoport.

 2) Missing includes in CPTS driver, from Alexei Starovoitov.

 3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
    and Dan Carpenter.

 4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers.  From
    Josh Boyer.

 5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
    Borkmann.

 6) Byte-Queue-Limit enabled drivers aren't handled properly in
    AF_PACKET transmit path, also from Daniel Borkmann.

    Same problem exists in pktgen, and Daniel fixed it there too.

 7) Fix resource leaks in driver probe error paths of new sxgbe driver,
    from Francois Romieu.

 8) Truesize of SKBs can gradually get more and more corrupted in NAPI
    packet recycling path, fix from Eric Dumazet.

 9) Fix uniprocessor netfilter build, from Florian Westphal.  In the
    longer term we should perhaps try to find a way for ARRAY_SIZE() to
    work even with zero sized array elements.

10) Fix crash in netfilter conntrack extensions due to mis-estimation of
    required extension space.  From Andrey Vagin.

11) Since we commit table rule updates before trying to copy the
    counters back to userspace (it's the last action we perform), we
    really can't signal the user copy with an error as we are beyond the
    point from which we can unwind everything.  This causes all kinds of
    use after free crashes and other mysterious behavior.

    From Thomas Graf.

12) Restore previous behvaior of div/mod by zero in BPF filter
    processing.  From Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net: sctp: wake up all assocs if sndbuf policy is per socket
  isdnloop: several buffer overflows
  netdev: remove potentially harmful checks
  pktgen: fix xmit test for BQL enabled devices
  net/at91_ether: avoid NULL pointer dereference
  tipc: Let tipc_release() return 0
  at86rf230: fix MAX_CSMA_RETRIES parameter
  mac802154: fix duplicate #include headers
  sxgbe: fix duplicate #include headers
  net: filter: be more defensive on div/mod by X==0
  netfilter: Can't fail and free after table replacement
  xen-netback: Trivial format string fix
  net: bcmgenet: Remove unnecessary version.h inclusion
  net: smc911x: Remove unused local variable
  bonding: Inactive slaves should keep inactive flag's value
  netfilter: nf_tables: fix wrong format in request_module()
  netfilter: nf_tables: set names cannot be larger than 15 bytes
  netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
  netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
  netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
  ...
2014-04-08 12:41:23 -07:00
Linus Torvalds 0afccc4cce More staging patches for 3.15-rc1
Here are some more staging patches for 3.15-rc1.  They include a
 late-submission of a wireless driver that a bunch of people seem to have
 the hardware for now.  As it's stand-alone, it should be fine (now
 passes the 0-day random build bot tests.)  There are also some fixes for
 the unisys drivers, as they were causing havoc on a number of different
 machines.  To resolve all of those issues, we just mark the driver as
 BROKEN now, and we can fix it up "properly" over time.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlNEO1sACgkQMUfUDdst+ymMegCgy/ZpC010cymW2ZYgDxqjEEss
 3xUAoLA0Psv1z59hZv4EwrfStxzTiYAQ
 =waW7
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull more staging patches from Greg KH:
 "Here are some more staging patches for 3.15-rc1.

  They include a late-submission of a wireless driver that a bunch of
  people seem to have the hardware for now.  As it's stand-alone, it
  should be fine (now passes the 0-day random build bot tests).

  There are also some fixes for the unisys drivers, as they were causing
  havoc on a number of different machines.  To resolve all of those
  issues, we just mark the driver as BROKEN now, and we can fix it up
  "properly" over time"

* tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723au: The 8723 only has two paths
  Staging: unisys: mark drivers as BROKEN
  Staging: unisys: verify that a control channel exists
  staging: unisys: Add missing close parentheses in filexfer.c
  staging: r8723au: Fix build problem when RFKILL is not selected
  staging: r8723au: Fix randconfig build errors
  staging: r8723au: Turn on build of new driver
  staging: r8723au: Additional source patches
  staging: r8723au: Add source files for new driver - part 4
  staging: r8723au: Add source files for new driver - part 3
  staging: r8723au: Add source files for new driver - part 2
  staging: r8723au: Add source files for new driver - part 1
2014-04-08 12:37:29 -07:00
Linus Torvalds e4f30545a2 - Documentation clarification on CPU topology and booting requirements
- Additional cache flushing during boot (needed in the presence of
   external caches or under virtualisation)
 - DMA range invalidation fix for non cache line aligned buffers
 - Build failure fix with !COMPAT
 - Kconfig update for STRICT_DEVMEM
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJTRDI8AAoJEGvWsS0AyF7xx2IP/jgBjjeIII7L6T45dG3BeJR1
 ph8CBfnsHK3wr5KXiFrdQ2mhAWFr44SLNOd9xuZZmzpA/QWlyqaH46+oqz2GozUf
 y2HucJoEIO+wXMFscQ/EQDiT4uUSbEXBQ6JeZzrhCgVaXeSs+wkKTtGXkxEh2gWo
 w4OI1/JX7phv4heim51aabzziQ3o9JziIs6hALv6OVZVsuPF/TX+mK8C2ejJWLnv
 ou+6E6iv69wNrgPnM3fcKj1CDisCNdFVvjd2LwzkJS7MUra74SWoXczCbfBYW6Ty
 1GgZ/t3TOluDoaLgXfGyQXxnhOUyHdV16034/k8wISfuXG59V0eT+CaCgAotftKD
 5oH+P4MfyVOvZpILrRtY/4MajlCr4V1RnzSYKnS/h3zzHW7Cx/BtYbbQEOVQnZwc
 Gh4adLqc0f8QtkD4zGI7UWmxPxiI9KX9EEpVDAU3TJw6FjVSp7qZ1ifajsWc201h
 STzQEu8LDBWQY2WKrtZxXvFjZj4eSSXNaDHNVCugODk2FBU6wNv4P5q1S+23xt+G
 rR9UI8a0mpginNvhnwHoIR6X+RW3CDaUzn9k3gaJWDWvGoTeuAIstdyxCn6Dm26c
 XF2B5xf5SdxXeNv513WfULCyLgJmLs9CzVFybgYWKLciLbLn7f3pAiisHuO1DWvu
 Gv+70wfYArAcbkGPw5Qt
 =mDkd
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull second set of arm64 updates from Catalin Marinas:
 "A second pull request for this merging window, mainly with fixes and
  docs clarification:

   - Documentation clarification on CPU topology and booting
     requirements
   - Additional cache flushing during boot (needed in the presence of
     external caches or under virtualisation)
   - DMA range invalidation fix for non cache line aligned buffers
   - Build failure fix with !COMPAT
   - Kconfig update for STRICT_DEVMEM"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Fix DMA range invalidation for cache line unaligned buffers
  arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
  arm64: fix !CONFIG_COMPAT build failures
  Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
  arm64: Relax the kernel cache requirements for boot
  arm64: Update the TCR_EL1 translation granule definitions for 16K pages
  ARM: topology: Make it clear that all CPUs need to be described
2014-04-08 12:06:03 -07:00
Linus Torvalds d586c86d50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull second set of s390 patches from Martin Schwidefsky:
 "The second part of Heikos uaccess rework, the page table walker for
  uaccess is now a thing of the past (yay!)

  The code change to fix the theoretical TLB flush problem allows us to
  add a TLB flush optimization for zEC12, this machine has new
  instructions that allow to do CPU local TLB flushes for single pages
  and for all pages of a specific address space.

  Plus the usual bug fixing and some more cleanup"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uaccess: rework uaccess code - fix locking issues
  s390/mm,tlb: optimize TLB flushing for zEC12
  s390/mm,tlb: safeguard against speculative TLB creation
  s390/irq: Use defines for external interruption codes
  s390/irq: Add defines for external interruption codes
  s390/sclp: add timeout for queued requests
  kvm/s390: also set guest pages back to stable on kexec/kdump
  lcs: Add missing destroy_timer_on_stack()
  s390/tape: Add missing destroy_timer_on_stack()
  s390/tape: Use del_timer_sync()
  s390/3270: fix crash with multiple reset device requests
  s390/bitops,atomic: add missing memory barriers
  s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
2014-04-08 12:02:28 -07:00
Daniel Borkmann 52c35befb6 net: sctp: wake up all assocs if sndbuf policy is per socket
SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the
reverse operation. If a sender runs out of wmem, it needs to
wait via sctp_wait_for_sndbuf(), and gets woken up by a call
to __sctp_write_space() mostly via sctp_wfree().

__sctp_write_space() is being called per association. Although
we assign sk->sk_write_space() to sctp_write_space(), which
is then being done per socket, it is only used if send space
is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
is set and therefore not invoked in sock_wfree().

Commit 4c3a5bdae2 ("sctp: Don't charge for data in sndbuf
again when transmitting packet") fixed an issue where in case
sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again
unless it is interrupted by a signal. However, a still
remaining issue is that if net.sctp.sndbuf_policy=0, that is
accounting per socket, and one-to-many sockets are in use,
the reclaimed write space from sctp_wfree() is 'unfairly'
handed back on the server to the association that is the lucky
one to be woken up again via __sctp_write_space(), while
the remaining associations are never be woken up again
(unless by a signal).

The effect disappears with net.sctp.sndbuf_policy=1, that
is wmem accounting per association, as it guarantees a fair
share of wmem among associations.

Therefore, if we have reclaimed memory in case of per socket
accounting, wake all related associations to a socket in a
fair manner, that is, traverse the socket association list
starting from the current neighbour of the association and
issue a __sctp_write_space() to everyone until we end up
waking ourselves. This guarantees that no association is
preferred over another and even if more associations are
taken into the one-to-many session, all receivers will get
messages from the server and are not stalled forever on
high load. This setting still leaves the advantage of per
socket accounting in touch as an association can still use
up global limits if unused by others.

Fixes: 4eb701dfc6 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-08 13:06:07 -04:00
Linus Torvalds e9f37d3a8d Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Highlights:

   - drm:

     Generic display port aux features, primary plane support, drm
     master management fixes, logging cleanups, enforced locking checks
     (instead of docs), documentation improvements, minor number
     handling cleanup, pseudofs for shared inodes.

   - ttm:

     add ability to allocate from both ends

   - i915:

     broadwell features, power domain and runtime pm, per-process
     address space infrastructure (not enabled)

   - msm:

     power management, hdmi audio support

   - nouveau:

     ongoing GPU fault recovery, initial maxwell support, random fixes

   - exynos:

     refactored driver to clean up a lot of abstraction, DP support
     moved into drm, LVDS bridge support added, parallel panel support

   - gma500:

     SGX MMU support, SGX irq handling, asle irq work fixes

   - radeon:

     video engine bringup, ring handling fixes, use dp aux helpers

   - vmwgfx:

     add rendernode support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
  DRM: armada: fix corruption while loading cursors
  drm/dp_helper: don't return EPROTO for defers (v2)
  drm/bridge: export ptn3460_init function
  drm/exynos: remove MODULE_DEVICE_TABLE definitions
  ARM: dts: exynos4412-trats2: enable exynos/fimd node
  ARM: dts: exynos4210-trats: enable exynos/fimd node
  ARM: dts: exynos4412-trats2: add panel node
  ARM: dts: exynos4210-trats: add panel node
  ARM: dts: exynos4: add MIPI DSI Master node
  drm/panel: add S6E8AA0 driver
  ARM: dts: exynos4210-universal_c210: add proper panel node
  drm/panel: add ld9040 driver
  panel/ld9040: add DT bindings
  panel/s6e8aa0: add DT bindings
  drm/exynos: add DSIM driver
  exynos/dsim: add DT bindings
  drm/exynos: disallow fbdev initialization if no device is connected
  drm/mipi_dsi: create dsi devices only for nodes with reg property
  drm/mipi_dsi: add flags to DSI messages
  Skip intel_crt_init for Dell XPS 8700
  ...
2014-04-08 09:52:16 -07:00
Dan Carpenter 7563487cbf isdnloop: several buffer overflows
There are three buffer overflows addressed in this patch.

1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer.  I have made the destination
buffer 64 characters and I'm changed the sprintf() to a snprintf().

2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
character buffer so we have 54 characters.  The ->eazlist[] is 11
characters long.  I have modified the code to return if the source
buffer is too long.

3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string then can be up to 79 characters.  I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.

Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[].  (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters).  For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-08 12:41:13 -04:00
Heiko Carstens 5fb6b953bb include/linux/syscalls.h: add sys_renameat2() prototype
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08 09:24:25 -07:00
Catalin Marinas ebf81a938d arm64: Fix DMA range invalidation for cache line unaligned buffers
If the buffer needing cache invalidation for inbound DMA does start or
end on a cache line aligned address, we need to use the non-destructive
clean&invalidate operation. This issue was introduced by commit
7363590d2c (arm64: Implement coherent DMA API based on swiotlb).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
2014-04-08 11:45:08 +01:00
Linus Torvalds a7963eb7f4 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
 "various cleanups for ext2, ext3, udf, isofs, a documentation update
  for quota, and a fix of a race in reiserfs readdir implementation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix race in readdir
  ext2: acl: remove unneeded include of linux/capability.h
  ext3: explicitly remove inode from orphan list after failed direct io
  fs/isofs/inode.c add __init to init_inodecache()
  ext3: Speedup WB_SYNC_ALL pass
  fs/quota/Kconfig: Update filesystems
  ext3: Update outdated comment before ext3_ordered_writepage()
  ext3: Update PF_MEMALLOC handling in ext3_write_inode()
  ext2/3: use prandom_u32() instead of get_random_bytes()
  ext3: remove an unneeded check in ext3_new_blocks()
  ext3: remove unneeded check in ext3_ordered_writepage()
  fs: Mark function as static in ext3/xattr_security.c
  fs: Mark function as static in ext3/dir.c
  fs: Mark function as static in ext2/xattr_security.c
  ext3: Add __init macro to init_inodecache
  ext2: Add __init macro to init_inodecache
  udf: Add __init macro to init_inodecache
  fs: udf: parse_options: blocksize check
2014-04-07 17:59:17 -07:00
Linus Torvalds b003d7706a Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild changes from Michal Marek:
 - cleanups in the main Makefiles and Documentation/DocBook/Makefile
 - make O=...  directory is automatically created if needed
 - mrproper/distclean removes the old include/linux/version.h to make
   life easier when bisecting across the commit that moved the version.h
   file

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: docbook: fix the include error when executing "make help"
  kbuild: create a build directory automatically for out-of-tree build
  kbuild: remove redundant '.*.cmd' pattern from make distclean
  kbuild: move "quote" to Kbuild.include to be consistent
  kbuild: docbook: use $(obj) and $(src) rather than specific path
  kbuild: unconditionally clobber include/linux/version.h on distclean
  kbuild: docbook: specify KERNELDOC dependency correctly
  kbuild: docbook: include cmd files more simply
  kbuild: specify build_docproc as a phony target
2014-04-07 17:52:31 -07:00
Linus Torvalds 3573d3869d ARC changes for 3.15
* Support for external initrd from Noam
 * Fix broken serial console in nsimosci Virtual Platform
 * Reuse of ENTRY/END assembler macros across hand asm code
 * Other minor fixes here and there
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQo6NAAoJEGnX8d3iisJeNTkP/jwPKxaDx88xbyUK0+OtmB9o
 v6BLhRI/mzdXf4KX7aq+0B3II1niYvD9e65LnQlx/f+fExyiaeIucxI4rKcFsbYH
 X1MBOKe1FNcFUhu6xSeRs1IVXR9z9bU4+H9K2U38RhZHBQtb9apDFUqTJoNy6PEP
 PO0nkwB8vaeoocVdCRWD574d8eQpUYejo5pNesQG4zLClO0IafYiP0vQWXB4dGjh
 ppfq+UR0GfXYwFgO+JbSoJFg0SmQuw02LoQ6vFZbtvZ3l7nfmv+F4SJKzDQR++vq
 nV9Z++cSdBsgDD+7nFeo1VlDkFsJY3148yFSfqf79E4JIn3VQyFFmzHasDR6GjYn
 HCW7WS9OFYmcnG6WUsH5EyJK/QbuNFHRE0kxbJap2AnYsXyQkdDkAZqQWNeG4TmO
 SI8YKMAWh2CcH0FOKntcGPmlOvyi//C28mAPKKsqO/D7YShGkhhWZG1BpbX4Vb1u
 0YFuignfAk/R6RckBXKymguRA7kloj+cI7KkvzkWmfkzJzh8qSrp6UwR8A/UwiDb
 U+ZSL3WOBaA5Hc4sU+58uUDIxPcob034vgQpxNXWAmGg3QJ62j+EGinGsj9fw3XV
 EyXTIYIDByriMu5winn9Xj7KLyaI7YZb+3HAFO76whlHD+sMAiLzKMKLUKfD8z42
 5e4R9+CsDpwnK3iCi1P5
 =dmgU
 -----END PGP SIGNATURE-----

Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC changes from Vineet Gupta:
 - Support for external initrd from Noam
 - Fix broken serial console in nsimosci Virtual Platform
 - Reuse of ENTRY/END assembler macros across hand asm code
 - Other minor fixes here and there

* tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [nsimosci] Unbork console
  ARC: [nsimosci] Change .dts to use generic 8250 UART
  ARC: [SMP] General Fixes
  ARC: Remove unused DT template file
  ARC: [clockevent] simplify timer ISR
  ARC: [clockevent] can't be SoC specific
  ARC: Remove ARC_HAS_COH_RTSC
  ARC: switch to generic ENTRY/END assembler annotations
  ARC: support external initrd
  ARC: add uImage to .gitignore
  ARC: [arcfpga] Fix __initconst data const-correctness
2014-04-07 17:51:34 -07:00
Russell King c39b06951f DRM: armada: fix corruption while loading cursors
Loading cursors to the LCD controller's SRAM can be corrupted when the
configured pixel clock is relatively slow.  This seems to be caused
when we write back-to-back to the SRAM registers.

There doesn't appear to be any status register we can read to check
when an access has completed.

Inserting a dummy read between the writes appears to fix the problem.

Cc: <stable@vger.kernel.org> # 3.13
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-04-08 10:51:03 +10:00
Linus Torvalds c8d9762aff Fix arm build of drivers/xen/events/
The merge of irq-core-for-linus branch broke it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQEcBAABAgAGBQJTQqWqAAoJEFxbo/MsZsTRCTAH/j9JC0q5OuCPoZlvhHgrOsaI
 Z8gZj9MsU5zBiGrBrp96ULHVBxe/PWSCFhPWvxykhS5d/tgpeg0l2B2n7KrsgtrE
 YSepxlyGzGK5UUYZmCCiE4fjZA2GrtHtz4tET3lu5lyHWCNgHuApmGDsoIJ+PH13
 wCokIJcBpVu+V7nsxSifguud1R9ZEA7gTwp1q5omSerrqNLPPYuxlwSRTDIS4pEl
 5mhP5C9ej+u7Dx7h4uRpk7p2YMQy1pnPtJIq3TBH2B/l/u96b+gyXQlLhf289LSP
 RwiDhurZprLZmaM8W1qrQwqsoiPY/uQQFObp0icvrarK7SfYtiot+guK3oRn1NI=
 =1P/a
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen build fix from David Vrabel:
 "Fix arm build of drivers/xen/events/

  The merge of irq-core-for-linus branch broke it"

* tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Xen: do hv callback accounting only on x86
2014-04-07 17:50:18 -07:00
Linus Torvalds 26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Lukasz Dorau fdc5813fbb MAINTAINERS: update Intel C600 SAS driver maintainers
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Christian Engelmayer fe4487d18f fs/ufs: remove unused ufs_super_block_third pointer
Pointer 'usb3' to struct ufs_super_block_third acquired via
ubh_get_usb_third() is never used in function
ufs_read_cylinder_structures().  Thus remove it.

Detected by Coverity: CID 139939.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Christian Engelmayer 48968a112c fs/ufs: remove unused ufs_super_block_second pointer
Pointer 'usb2' to struct ufs_super_block_second acquired via
ubh_get_usb_second() is never used in function ufs_statfs().  Thus
remove it.

Detected by Coverity: CID 139940.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Christian Engelmayer 6e0bd34c33 fs/ufs: remove unused ufs_super_block_first pointer
Remove occurences of unused pointers to struct ufs_super_block_first
that were acquired via ubh_get_usb_first().

Detected by Coverity: CID 139929 - CID 139936, CID 139940.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Fabian Frederick 76ee473578 fs/ufs/super.c: add __init to init_inodecache()
init_inodecache is only called by __init init_ufs_fs.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:16 -07:00
Mark Salter 56aeeba8c1 doc/kernel-parameters.txt: add early_ioremap_debug
Add description of early_ioremap_debug kernel parameter.

Signed-off-by: Mark Salter <msalter@redhat.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Mark Salter bf4b558eba arm64: add early_ioremap support
Add support for early IO or memory mappings which are needed before the
normal ioremap() is usable.  This also adds fixmap support for permanent
fixed mappings such as that used by the earlyprintk device register
region.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Mark Salter 0bf757c73d arm64: initialize pgprot info earlier in boot
Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.

The new fixmap and early_ioremap support also needs to use these macros
before paging_init() is called.  This patch moves the init_mem_pgprot()
call out of paging_init() and into setup_arch() so that pgprot_default
gets initialized in time for fixmap and early_ioremap.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Mark Salter 5b7c73e009 x86: use generic early_ioremap
Move x86 over to the generic early ioremap implementation.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Mark Salter 9e5c33d7ae mm: create generic early_ioremap() support
This patch creates a generic implementation of early_ioremap() support
based on the existing x86 implementation.  early_ioremp() is useful for
early boot code which needs to temporarily map I/O or memory regions
before normal mapping functions such as ioremap() are available.

Some architectures have optional MMU.  In the no-MMU case, the remap
functions simply return the passed in physical address and the unmap
functions do nothing.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:15 -07:00
Dave Young 6b550f6f20 x86/mm: sparse warning fix for early_memremap
This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures.  The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available.  Typically, this
means before paging_init() has run.

This patch (of 6):

There's a lot of sparse warnings for code like below: void *a =
early_memremap(phys_addr, size);

early_memremap intend to map kernel memory with ioremap facility, the
return pointer should be a kernel ram pointer instead of iomem one.

For making the function clearer and supressing sparse warnings this patch
do below two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap

From Boris:
  "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
   FWIW, all callers of early_memremap use the memory they get remapped
   as normal memory so we should be safe"

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Josh Triplett 64b47e8fdb lglock: map to spinlock when !CONFIG_SMP
When the system has only one CPU, lglock is effectively a spinlock; map
it directly to spinlock to eliminate the indirection and duplicate code.

In addition to removing overhead, this drops 1.6k of code with a
defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter 188a81409f percpu: add preemption checks to __this_cpu ops
We define a check function in order to avoid trouble with the include
files.  Then the higher level __this_cpu macros are modified to invoke
the preemption check.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter 293b6a4c87 vmstat: use raw_cpu_ops to avoid false positives on preemption checks
vm counters are allowed to be racy.  Use raw_cpu_ops to avoid the
local_irq_disable overhead and to avoid preemption checks which will be
added to the __this_cpu operations.

[akpm@linux-foundation.org: Add comment.  Again.]
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter 88da03a676 slub: use raw_cpu_inc for incrementing statistics
Statistics are not critical to the operation of the allocation but
should also not cause too much overhead.

When __this_cpu_inc is altered to check if preemption is disabled this
triggers.  Use raw_cpu_inc to avoid the checks.  Using this_cpu_ops may
cause interrupt disable/enable sequences on various arches which may
significantly impact allocator performance.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter 3ed66e910c net: replace __this_cpu_inc in route.c with raw_cpu_inc
The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.

I do not see any other synchronization that would allow the use of a
__this_cpu operation here however in commit dbd2915ce8 ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races.  In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high.  These counters may be inaccurate
due to race conditions.

The use of __this_cpu op improves the situation already from what commit
dbd2915ce8 did since the single instruction emitted on x86 does not
allow the race to occur anymore.  However, non x86 platforms could still
experience a race here.

Trace:

  __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    __ip_route_output_key+0x575/0x8c0
    ip_route_output_flow+0x27/0x70
    udp_sendmsg+0x825/0xa20
    inet_sendmsg+0x85/0xc0
    sock_sendmsg+0x9c/0xd0
    ___sys_sendmsg+0x37c/0x390
    __sys_sendmsg+0x49/0x90
    SyS_sendmsg+0x12/0x20
    tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter 08f141d3db modules: use raw_cpu_write for initialization of per cpu refcount.
The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.

So simply disable the check through the use of raw_cpu ops.

Trace:

  __this_cpu_write operation in preemptible [00000000] code: modprobe/286
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    dump_stack+0x4e/0x82
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    load_module+0xcfd/0x2650
    SyS_init_module+0xa6/0xd0
    tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:14 -07:00
Christoph Lameter dc322a99d3 mm: use raw_cpu ops for determining current NUMA node
With the preempt checking logic for __this_cpu_ops we will get false
positives from locations in the code that use numa_node_id.

Before the __this_cpu ops where introduced there were no checks for
preemption present either.  smp_raw_processor_id() was used.  See

  http://www.spinics.net/lists/linux-numa/msg00641.html

Therefore we need to use raw_cpu_read here to avoid false postives.

Note that this issue has been discussed in prior years.  If the process
changes nodes after retrieving the current numa node then that is
acceptable since most uses of numa_node etc are for optimization and not
for correctness.

There were suggestions to implement a raw_numa_node_id in order to do
preempt checks for numa_node_id as well.  But I think we better defer
that to another patch since that would mean investigating how
numa_node_id() is used throughout the kernel which would increase the
scope of this patchset significantly.  After all preemption was never
checked before when numa_node_id() was used.

Some sample traces:

__this_cpu_read operation in preemptible [00000000] code: login/1456
caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  get_task_policy+0x1d/0x49
  get_vma_policy+0x14/0x76
  alloc_pages_vma+0x35/0xff
  handle_mm_fault+0x290/0x73b
  __do_page_fault+0x3fe/0x44d
  do_page_fault+0x9/0xc
  page_fault+0x22/0x30
  generic_file_aio_read+0x38e/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

caller is __this_cpu_preempt_check+0x2b/0x2d
CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3-dirty #185
Call Trace:
  dump_stack+0x4e/0x82
  check_preemption_disabled+0xc5/0xe0
  __this_cpu_preempt_check+0x2b/0x2d
  alloc_pages_current+0x8f/0xbc
  __page_cache_alloc+0xb/0xd
  __do_page_cache_readahead+0xf4/0x219
  ra_submit+0x1c/0x20
  ondemand_readahead+0x28c/0x2b4
  page_cache_sync_readahead+0x38/0x3a
  generic_file_aio_read+0x261/0x624
  do_sync_read+0x54/0x73
  vfs_read+0x9d/0x12a
  SyS_read+0x47/0x7e
  cstar_dispatch+0x7/0x23

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:13 -07:00