Commit Graph

534501 Commits

Author SHA1 Message Date
Naoya Horiguchi f4c18e6f7b mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
The race condition addressed in commit add05cecef ("mm: soft-offline:
don't free target page in successful page migration") was not closed
completely, because that can happen not only for soft-offline, but also
for hard-offline.  Consider that a slab page is about to be freed into
buddy pool, and then an uncorrected memory error hits the page just
after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
necessary because the data on the affected page is not consumed.

To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time.  I think it's justified because __PG_HWPOISON
flags is defined to prevent the page from being reused, and setting it
outside the page's alloc-free cycle is a designed behavior (not a bug.)

For recent months, I was annoyed about BUG_ON when soft-offlined page
remains on lru cache list for a while, which is avoided by calling
put_page() instead of putback_lru_page() in page migration's success
path.  This means that this patch reverts a major change from commit
add05cecef about the new refcounting rule of soft-offlined pages, so
"reuse window" revives.  This will be closed by a subsequent patch.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi 98ed2b0052 mm/memory-failure: give up error handling for non-tail-refcounted thp
"non anonymous thp" case is still racy with freeing thp, which causes
panic due to put_page() for refcount-0 page.  It seems that closing up
this race might be hard (and/or not worth doing,) so let's give up the
error handling for this case.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Naoya Horiguchi a209ef09af mm/memory-failure: fix race in counting num_poisoned_pages
When memory_failure() is called on a page which are just freed after
page migration from soft offlining, the counter num_poisoned_pages is
raised twi= ce.  So let's fix it with using TestSetPageHWPoison.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Naoya Horiguchi a09233f3e1 mm/memory-failure: unlock_page before put_page
Recently I addressed a few of hwpoison race problems and the patches are
merged on v4.2-rc1.  It made progress, but unfortunately some problems
still remain due to less coverage of my testing.  So I'm trying to fix
or avoid them in this series.

One point I'm expecting to discuss is that patch 4/5 changes the page
flag set to be checked on free time.  In current behavior, __PG_HWPOISON
is not supposed to be set when the page is freed.  I think that there is
no strong reason for this behavior, and it causes a problem hard to fix
only in error handler side (because __PG_HWPOISON could be set at
arbitrary timing.) So I suggest to change it.

With this patchset, hwpoison stress testing in official mce-test
testsuite (which previously failed) passes.

This patch (of 5):

In "just unpoisoned" path, we do put_page and then unlock_page, which is
a wrong order and causes "freeing locked page" bug.  So let's fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Stephen Smalley e1832f2923 ipc: use private shmem or hugetlbfs inodes for shm segments.
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> #3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> #2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Morten Stevens <mstevens@fedoraproject.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Mel Gorman e298ff75f1 mm: initialize hotplugged pages as reserved
Commit 92923ca3aa ("mm: meminit: only set page reserved in the
memblock region") broke memory hotplug which expects the memmap for
newly added sections to be reserved until onlined by
online_pages_range().  This patch marks hotplugged pages as reserved
when adding new zones.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Joseph Qi 32e5a2a2be ocfs2: fix shift left overflow
When using a large volume, for example 9T volume with 2T already used,
frequent creation of small files with O_DIRECT when the IO is not
cluster aligned may clear sectors in the wrong place.  This will cause
filesystem corruption.

This is because p_cpos is a u32.  When calculating the corresponding
sector it should be converted to u64 first, otherwise it may overflow.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
David Kershner 18896451ea kthread: export kthread functions
The s-Par visornic driver, currently in staging, processes a queue being
serviced by the an s-Par service partition.  We can get a message that
something has happened with the Service Partition, when that happens, we
must not access the channel until we get a message that the service
partition is back again.

The visornic driver has a thread for processing the channel, when we get
the message, we need to be able to park the thread and then resume it
when the problem clears.

We can do this with kthread_park and unpark but they are not exported
from the kernel, this patch exports the needed functions.

Signed-off-by: David Kershner <david.kershner@unisys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Jan Kara 8f2f3eb59d fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list.  This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Sowmini Varadhan 447f6a95a9 lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_mask
Using a 64 bit constant generates "warning: integer constant is too
large for 'long' type" on 32 bit platforms.  Instead use ~0ul and
BITS_PER_LONG.

Detected by Andrew Morton on ARMD.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Konstantin Khlebnikov 3e810ae2db mm/slub: allow merging when SLAB_DEBUG_FREE is set
This patch fixes creation of new kmem-caches after enabling
sanity_checks for existing mergeable kmem-caches in runtime: before that
patch creation fails because unique name in sysfs already taken by
existing kmem-cache.

Unlike other debug options this doesn't change object layout and could
be enabled and disabled at any time.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Amanieu d'Antras 3ead7c52bd signalfd: fix information leak in signalfd_copyinfo
This function may copy the si_addr_lsb field to user mode when it hasn't
been initialized, which can leak kernel stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals.  This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Amanieu d'Antras 26135022f8 signal: fix information leak in copy_siginfo_to_user
This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals.  This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Amanieu d'Antras 3c00cb5e68 signal: fix information leak in copy_siginfo_from_user32
This function can leak kernel stack data when the user siginfo_t has a
positive si_code value.  The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.

copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.

This fixes the following information leaks:
x86:   8 bytes leaked when sending a signal from a 32-bit process to
       itself. This leak grows to 16 bytes if the process uses x32.
       (si_code = __SI_CHLD)
x86:   100 bytes leaked when sending a signal from a 32-bit process to
       a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
       64-bit process. (si_code = any)

parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process.  These bugs are also fixed for consistency.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Joseph Qi 209f7512d0 ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:

ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
processed, and then processes the dentry lockres.  During the dentry
put, it calls iput and then deletes rw, inode and open lockres from
blocked list in ocfs2_mark_lockres_freeing.  And this causes the
variable `processed' to not reflect the number of blocked lockres to be
processed, which triggers the BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Mel Gorman 4248b0da46 fs, file table: reinit files_stat.max_files after deferred memory initialisation
Dave Hansen reported the following;

	My laptop has been behaving strangely with 4.2-rc2.  Once I log
	in to my X session, I start getting all kinds of strange errors
	from applications and see this in my dmesg:

        	VFS: file-max limit 8192 reached

The problem is that the file-max is calculated before memory is fully
initialised and miscalculates how much memory the kernel is using.  This
patch recalculates file-max after deferred memory initialisation.  Note
that using memory hotplug infrastructure would not have avoided this
problem as the value is not recalculated after memory hot-add.

4.1:             files_stat.max_files = 6582781
4.2-rc2:         files_stat.max_files = 8192
4.2-rc2 patched: files_stat.max_files = 6562467

Small differences with the patch applied and 4.1 but not enough to matter.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alex Ng <alexng@microsoft.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Nicolai Stange d3cd131d93 mm, meminit: replace rwsem with completion
Commit 0e1cc95b4c ("mm: meminit: finish initialisation of struct pages
before basic setup") introduced a rwsem to signal completion of the
initialization workers.

Lockdep complains about possible recursive locking:
  =============================================
  [ INFO: possible recursive locking detected ]
  4.1.0-12802-g1dc51b8 #3 Not tainted
  ---------------------------------------------
  swapper/0/1 is trying to acquire lock:
  (pgdat_init_rwsem){++++.+},
    at: [<ffffffff8424c7fb>] page_alloc_init_late+0xc7/0xe6

  but task is already holding lock:
  (pgdat_init_rwsem){++++.+},
    at: [<ffffffff8424c772>] page_alloc_init_late+0x3e/0xe6

Replace the rwsem by a completion together with an atomic
"outstanding work counter".

[peterz@infradead.org: Barrier removal on the grounds of being pointless]
[mgorman@suse.de: Applied review feedback]
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alex Ng <alexng@microsoft.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Mel Gorman 7ace991707 mm, meminit: allow early_pfn_to_nid to be used during runtime
early_pfn_to_nid() historically was inherently not SMP safe but only
used during boot which is inherently single threaded or during hotplug
which is protected by a giant mutex.

With deferred memory initialisation there was a thread-safe version
introduced and the early_pfn_to_nid would trigger a BUG_ON if used
unsafely.  Memory hotplug hit that check.  This patch makes
early_pfn_to_nid introduces a lock to make it safe to use during
hotplug.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Alex Ng <alexng@microsoft.com>
Tested-by: Alex Ng <alexng@microsoft.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Marcus Gelderie de54b9ac25 ipc: modify message queue accounting to not take kernel data structures into account
A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages, in commit
d6629859b3 ("ipc/mqueue: improve performance of send/recv").

That patch introducing the improved kernel handling of message queues
(using btrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue.  Before, this field
reflected the size of the user-data in the queue.  Since, it also takes
kernel data structures into account.  For example, if 13 bytes of user
data are in the queue, on my machine the file reports a size of 61
bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115).  Commenting on a th lkml, Michael
Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

This patch removes the accounting of kernel data structures in the
queue.  Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above).  Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation).  Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.

Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: John Duffy <jb_duffy@btinternet.com>
Cc: Arto Bendiken <arto@bendiken.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:39 +03:00
Linus Torvalds 4469942bbb Just two very small & simple patches.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVwhncAAoJEL/70l94x66Dy7IIAJXfraikJQ9ghhLhjrP+5f5H
 MNBL+e3jKGmGVgItrtOMcLlJJvPkFNBkFMmYRJtdawezu46eFBLnIoTp8ZcG6cvu
 5Gjs1PNfq1nP5IzWsYYbohlaf1xkij+Jm2JZ/fxuEGC6xM91WVGV7YENt87S7O16
 ZdfhhEFHTTe+Fg86QwDGZ2bOhTBwZEAaVFM6siCml/WiqYtecwzEn19OiP6XeVbO
 FczG7CUXumrPnEohYrAVrCtIIb5dGzUCstQGlo3bC7CJ/G6CjaBl4cSd6Y/BHkhD
 KV6M7VJxjJ84HAKy9PMhC2iPC7H7Vfjg1iq6czHWu/Tida0d6dBiVzLVKcz2jj4=
 =SYMM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Just two very small & simple patches"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
  KVM: s390: Fix hang VCPU hang/loop regression
2015-08-05 18:50:38 +03:00
Alex Williamson fc1a8126bf KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
The patch was munged on commit to re-order these tests resulting in
excessive warnings when trying to do device assignment.  Return to
original ordering: https://lkml.org/lkml/2015/7/15/769

Fixes: 3e5d2fdced ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-08-05 11:57:57 +02:00
Linus Torvalds 4e6b6ee253 Three more fixes for md in 4.2
Mostly corner-case stuff.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVvxRBAAoJEDnsnt1WYoG5OsgQAJS0pG3E44L0c+JVzgkoJQbm
 xusoPJrFo6ALCwZ5w4wvEhUFJkvY4qh4A9zG0oaRAAKBibJ4CVZ+NWADQfbPPIUe
 QK6IRbkYRNvZEP19ZcdeU3J51PAA+tFqhaSE4G5JZzoUr6RXgyQ7fYeY/wNmoy0+
 zJEDxkVHJmasRm6/F+8/nOtnQLY8L9eRYZp6n186mykVVb4HBCKXVvGh+0+Wfsr8
 svob+bTWPsSELDm9Rt8JESRXE91HAxz+stuCP20tfrHULkvMavA91Ni16D3o9Okd
 2eMvIAovS2cFa1pwOMw9HsRYcg5sOlylw27v56286mIfZVi2ndS7wU6QCB9n0YUw
 FjkDGGRnFFh24scC2np4SdjrTudELN6OGfWHrYxDxe/ZZcTXw3rTlC5Y6fhkNERO
 QjsBvJbjnlUBoiLAdvlKrEFwyB5GUqux+eSGJTTcXG/2fjDYDnaC1/qwTxDdRoyA
 +wCcvjvK1tOQIUlhFfLGC4po33rCfG904TDXDPiDDLtjM4iFArOlckI4cIFtLB6S
 feBgYiPDgtbwl2FzdXdLu9OT6wgsMGLvN8MdZ0Xy1qrqQ2ouGVSa+SZgKqdvMXpu
 EktXUX1PRCmo5mcAwMA0VXmFY4yPlmNFDX6i4MVT2qV2JSwzkyM6A0P7smfWzD/U
 MrH8jZzQEojqeqYDZYIx
 =umiE
 -----END PGP SIGNATURE-----

Merge tag 'md/4.2-rc5-fixes' of git://neil.brown.name/md

Pull md fixes from Neil Brown:
 "Three more fixes for md in 4.2

  Mostly corner-case stuff.

  One of these patches is for a CVE: CVE-2015-5697

  I'm not convinced it is serious (data leak from CAP_SYS_ADMIN ioctl)
  but as people seem to want to back-port it, I've included a minimal
  version here.  The remainder of that patch from Benjamin is
  code-cleanup and will arrive in the 4.3 merge window"

* tag 'md/4.2-rc5-fixes' of git://neil.brown.name/md:
  md/raid5: don't let shrink_slab shrink too far.
  md: use kzalloc() when bitmap is disabled
  md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
2015-08-05 11:02:42 +02:00
Linus Torvalds 9e91edcd1b Merge branch 'for-4.2' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields.

* 'for-4.2' of git://linux-nfs.org/~bfields/linux:
  nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
  nfsd: Fix a file leak on nfsd4_layout_setlease failure
  nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem
2015-08-05 10:59:59 +02:00
Michal Hocko ecf5fc6e96 mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384e9d ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f44fc ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f44fc ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-05 10:49:38 +02:00
Linus Torvalds 6c84461c0c PCI update for v4.2:
Miscellaneous
     - Restore PCI_MSIX_FLAGS_BIRMASK definition (Michael S. Tsirkin)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVwOPGAAoJEFmIoMA60/r8HrEP/0268Kiy74sbmXJs6FU8fq37
 Bwq9W+EnLAhz2fXK+/XFz2vnPN8Qdpq02Xo1SK5uDZP7KPUkNX24j7WH4GxoSuoH
 F0Dh49IvvbsTWzcC0nDDZk8sV0eYZ0viix1FINZDuWk9G1BHf6uXMv/VBagzkLKj
 Tvqizz/lvIWya6G34POj3KnKtrsIpiJO7wZlTn/A6vmpoxeTGe810yJPosEOUt+D
 K2WVrg5xY+msm86bZUk5dk2KRHPwLRfcCLgv/I/lU7AQ9XznHaZHyeggpOCd3XTV
 BQWOTFuX8dXOlcVpsMuV5f60mhNziAkuu2mG2Xd9uyPJLF4N3EuBDi78BdeqPU/i
 gfx/nlX6yNplpJD0+XyIwmP2GbXQWD2T6Xmo8Ulls2n0WE4aFdj+BmpqxPYUn2rC
 MwmDM4gOwsEUEfHQ4GMa5V84xbIO40OW80ywno1Ug+WaHlrO/8QzGsAcBIORe1X6
 1ljwkgMEMSFH9o5PyUVoBzX7en3xidhFFYWeFi8rfFsB2xTZcB7dxFPaOMiIalqC
 Xbqi9pAvW2XLKBJW1Pjnzt9zZAQI18LmgnED7WV2jN14+Nu4L3AEpfa/UurnAtW1
 gPWYdvcdJZAj/p+ooux1XX+0v5x4E7F7rPoV/HDoPCOZSmtxRgdpSd2wLe+YoN7e
 ycA8omJVXFCb6sNRJEM5
 =apd+
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "This is a trivial fix for a change that broke user program compilation
  (QEMU in this case)"

* tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition
2015-08-04 09:27:19 -07:00
Linus Torvalds 50d091b6bf Merge tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel
Pull drm mst fixes from Daniel Vetter:
 "Special pull request for mst fixes since most of the patches touch
  code outside of i915 proper.  DRM parts have also been reviewed by
  Thierry (nvidia) since Dave's enjoying vacations"

* tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel:
  drm/atomic-helpers: Make encoder picking more robust
  drm/dp-mst: Remove debug WARN_ON
  drm/i915: Fixup dp mst encoder selection
  drm/atomic-helper: Add an atomice best_encoder callback
2015-08-04 08:51:06 -07:00
Linus Torvalds 1ddc6dd855 xen: bug fixes for 4.2-rc5
- Don't lose interrupts when offlining CPUs.
 - Fix gntdev oops during unmap.
 - Drop the balloon lock occasionally to allow domain create/destroy.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVwNFHAAoJEFxbo/MsZsTRZ0oH/1PBpbABz2iK7SZc75paORkm
 BwVlSTn1Z6ftmRjC8r5BS5KHOrQRsDf8cZnDptCIWpm+uWTXAfeZP5HaH1bA6qMy
 d+T7d9QMlC5nsCxebfduXYl4AHl7fupblBi3y8CmrJdVW0aASyL7roAtkSS23rXl
 ND3288juhI6E0Y5kch3b7yip5vjJtKFR7Mw3RkAZO5ihdx30NCYMdqBRBFcKT0Tp
 s901VXo+87ZSutY12BcToqCwr9E0y2oBdkDIQ5Q7rUlqKM1ifLVOYbx4sxXvNQtk
 3WYcAMuBrwfcBP00xYHt18ozEaJxQ2bTOokhZ//2p6LLnhzo0ZiZcxNRX3jda8A=
 =niy/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - don't lose interrupts when offlining CPUs

 - fix gntdev oops during unmap

 - drop the balloon lock occasionally to allow domain create/destroy

* tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events/fifo: Handle linked events when closing a port
  xen: release lock occasionally during ballooning
  xen/gntdevt: Fix race condition in gntdev_release()
2015-08-04 08:49:08 -07:00
Ross Lagerwall fcdf31a7c1 xen/events/fifo: Handle linked events when closing a port
An event channel bound to a CPU that was offlined may still be linked
on that CPU's queue.  If this event channel is closed and reused,
subsequent events will be lost because the event channel is never
unlinked and thus cannot be linked onto the correct queue.

When a channel is closed and the event is still linked into a queue,
ensure that it is unlinked before completing.

If the CPU to which the event channel bound is online, spin until the
event is handled by that CPU. If that CPU is offline, it can't handle
the event, so clear the event queue during the close, dropping the
events.

This fixes the missing interrupts (and subsequent disk stalls etc.)
when offlining a CPU.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-08-04 15:41:59 +01:00
Linus Torvalds ed8bbba0f6 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
 "Two fixes for kbuild:

   - The new ARCH_{CPP,A,C}FLAGS variables are reset before including
     the arch Makefile

   - Fix calling make modules_install twice when module compression is
     enabled"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  Makefile: Force gzip and xz on module install
  kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment
2015-08-04 06:57:32 -07:00
Daniel Vetter 6ea76f3cad drm/atomic-helpers: Make encoder picking more robust
We've had a few issues with atomic where subtle bugs in the encoder
picking logic lead to accidental self-stealing of the encoder,
resulting in a NULL connector_state->crtc in update_connector_routing
and subsequent.

Linus applied some duct-tape for an mst regression in

commit 27667f4744
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jul 29 22:18:16 2015 -0700

    i915: temporary fix for DP MST docking station NULL pointer dereference

But that was incomplete (the code will still oops when debuggin is
enabled) and mangled the state even further. So instead WARN and bail
out as the more future-proof option.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-04 11:10:41 +02:00
Daniel Vetter 42639ba554 drm/dp-mst: Remove debug WARN_ON
Apparently been in there since forever and fairly easy to hit when
hotplugging really fast. I can do that since my mst hub has a manual
button to flick the hpd line for reprobing. The resulting WARNING spam
isn't pretty.

Cc: Dave Airlie <airlied@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-04 11:10:16 +02:00
Daniel Vetter 459485ad35 drm/i915: Fixup dp mst encoder selection
In

commit 8c7b5ccb72
Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Date:   Tue Apr 21 17:13:19 2015 +0300

    drm/i915: Use atomic helpers for computing changed flags

we've switched over to the atomic version to compute the
crtc->encoder->connector routing from the i915 variant. That one
relies upon the ->best_encoder callback, but the i915-private version
relied upon intel_find_encoder. Which didn't matter except for dp mst,
where the encoder depends upon the selected crtc.

Fix this functional bug by implemented a correct atomic-state based
encoder selector for dp mst.

Note that we can't get rid of the legacy best_encoder callback since
the fbdev emulation uses that still. That means it's incorrect there
still, but that's been the case ever since i915 dp mst support was
merged so not a regression. Best to fix that by converting fbdev over
to atomic too.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-04 11:10:02 +02:00
Daniel Vetter 3b8a684bd6 drm/atomic-helper: Add an atomice best_encoder callback
With legacy helpers all the routing was already set up when calling
best_encoder and so could be inspected. But with atomic it's staged,
hence we need a new atomic compliant callback for drivers which need
to inspect the requested state and can't just decided the best encoder
statically.

This is needed to fix up i915 dp mst where we need to pick the right
encoder depending upon the requested CRTC for the connector.

v2: Don't forget to amend the kerneldoc

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-08-04 11:09:25 +02:00
Linus Torvalds c2f3ba745d Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "A refcounting bugfix for the i2c-core, bugfixes for the generic bus
  recovery algorithm and for its omap-user, making binary file
  attributes for EEPROMs behave POSIX compliant, and a small typo fix
  while we are here"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: fix leaked device refcount on of_find_i2c_* error path
  i2c: Fix typo in i2c-bfin-twi.c
  i2c: omap: fix bus recovery setup
  i2c: core: only use set_scl for bus recovery after calling prepare_recovery
  misc: eeprom: at24: clean up at24_bin_write()
  i2c: slave eeprom: clean up sysfs bin attribute read()/write()
2015-08-03 14:51:30 -07:00
Linus Torvalds 7e884479bf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "There are two critical regression fixes for CephFS from Zheng, and an
  RBD completion fix for layered images from Ilya"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix copyup completion race
  ceph: always re-send cap flushes when MDS recovers
  ceph: fix ceph_encode_locks_to_buffer()
2015-08-03 11:09:07 -07:00
Linus Torvalds 665aadc1d6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer fix from James Morris:
 "Yama initialization fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Adding YAMA hooks also when YAMA is not stacked.
2015-08-03 11:00:53 -07:00
Linus Torvalds abeb4f572d Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - a bogus BUG_ON in ixp4xx that can be triggered by a dst buffer that
     is an SG list.

   - the error handling in hwrngd may cause a crash in case of an error.

   - fix a race condition in qat registration when multiple devices are
     present"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: core - correct error check of kthread_run call
  crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
  crypto: qat - Fix invalid synchronization between register/unregister sym algs
2015-08-03 10:53:58 -07:00
Linus Torvalds 5b2a0eeea7 Single overzealous locking assertion fix.
Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVvtamAAoJENkgDmzRrbjxGjcQAIpU+NggGTh8Duangp43e8Qp
 YJq0aO6biyVn6gXynlQ+7nl98nxFzaYkluUvaTbED3OP2uD+XQe1BGFiuAGX1JhR
 O4Trn5+0NU2O0DJdda4dxnIBa+OgB8Z3TSxxEt1YImFbpUvckk81isuwl1x2ORLs
 ypX+JrQo5nQRXkhjV4Jg74CB0spHeMUcGbTVsp0S5iX2r+bZw4H4qgQvvkUChX8d
 zb9ej96b8wpdNYBW5AB7kcf1gcLGSRqvenpAseAzGmUVqr1SGgCsNLUwEgLvCgwB
 JyABdaXz51mopap7Q3CRbU8LWnVHp+Ft0ebJ8cOOsr9hQ0O44vg8dBdCza/jcjyA
 C9s1gP3v0BTBd71dwsuWlw3DfB+gMX5GP3qtdcBe5ULs3hBYnO8xvr+w41+GJcFQ
 vt3tiu6ljfRceGI8mBHy8/lDbmWVYnm4L1w27NNi+CSZzSy5uONyl/9PWz4L1EKM
 o4gV23jnnza254vnpKQ5am8DKIo8I0MxHca7wzO+M3+kJ6Sjfd0CnySuMulwpYD7
 CimVnxKyc+x7hK/nXTA1UrBYJbJ3i0xxsfQmNBY6HG322rXnBXXcz76G905S0FJ+
 /9ru913zuZkNZWXNz9KBk5+724nW0eqm3f0iWebHWxUsjtF/KZ8rMskSvRUvxdEV
 mDw3H05X4viTmyNqH9Sy
 =L1VK
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module fix from Rusty Russell:
 "Single overzealous locking assertion fix"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: weaken locking assertion for oops path.
2015-08-03 10:25:32 -07:00
Salvatore Mesoraca 5413fcdbe9 Adding YAMA hooks also when YAMA is not stacked.
Without this patch YAMA will not work at all if it is chosen
as the primary LSM instead of being "stacked".

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2015-08-04 01:36:18 +10:00
NeilBrown 49895bcc7e md/raid5: don't let shrink_slab shrink too far.
I have a report of drop_one_stripe() called from
raid5_cache_scan() apparently finding ->max_nr_stripes == 0.

This should not be allowed.

So add a test to keep max_nr_stripes above min_nr_stripes.

Also use a 'mask' rather than a 'mod' in drop_one_stripe
to ensure 'hash' is valid even if max_nr_stripes does reach zero.


Fixes: edbe83ab4c ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b66)
Reported-by: Tomas Papan <tomas.papan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 17:10:56 +10:00
Benjamin Randazzo b6878d9e03 md: use kzalloc() when bitmap is disabled
In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

5769         file = kmalloc(sizeof(*file), GFP_NOIO);
5770         if (!file)
5771                 return -ENOMEM;

This structure is copied to user space at the end of the function.

5786         if (err == 0 &&
5787             copy_to_user(arg, file, sizeof(*file)))
5788                 err = -EFAULT

But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

5775         /* bitmap disabled, zero the first byte and copy out */
5776         if (!mddev->bitmap_info.file)
5777                 file->pathname[0] = '\0';

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 14:56:02 +10:00
NeilBrown 423f04d63c md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
  Commit: 34cab6f420 ("md/raid1: fix test for 'was read error from
  last working device'.")
as it addresses the same issue.  It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054c9 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 12:29:42 +10:00
Linus Torvalds 74d33293e4 Linux 4.2-rc5 2015-08-02 18:34:55 -07:00
Linus Torvalds d08c31812e powerpc fixes for 4.2 #2
- TCE table memory calculation fix from Alexey
 - Build fix for ans-lcd from Luis
 - Unbalanced IRQ warning fix from Alistair
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVvq+8AAoJEFHr6jzI4aWAelAP/jjW+N8OQpqlkmj0cFbcdu+8
 U1QRCSbi681A6NSKDse4oHsY65nZdjQmdLXEMUhzx8Re2T13lpz0w1mJ/ZmUe5q2
 RZhFc76vvYw7jjYIEcXVyM80uTx34zNdWkGUkSkXb0u+BcFxajl2288YNp69QZ9F
 wXxUYfXF/Ea3tEsERRjOL4S6SzwHb6VcxO3SA/lhasK2ylhMEKHvuZSSyC6KKH4Q
 1GpD69jeTvddFZI7Tsjk+dzWO3QrPnrDqLVrSxreqJBzqY6sgYguoRN5PJKlWuDA
 KzntexxdcEefAADDCRC7vRmthA3FgAYCXyNtezeYUYLqF+EKaGMZ+9xJFGA3mQLx
 x3/i5By8he3VB67+9+71VfF5ZZXfpJAHmBaPl1eATjQ7oZHXnKFKhskuBRldG0rQ
 4EpVVQVyKf6XZ3QoxF7QHOUg/cYtnqumwEXJ9qh2DXs5mPBMQ5Ci65ao9ijNrKcz
 PTibIlRulkQy+HhxJcvm1iO85dyqUsENscpuiP/ErLFioFXGPVMmtjE/3ZPFOG3R
 B6ZMsxpmt3aXxKr0fjLz8c2u6uAl0TVoWvwtKe1ONWHnVwAnn0DJdCvf0Ll1JuZ9
 XKdbXPqWl+BJn6wPtj3IvU2oHzGimvQ+6EbL1o8H3sLSmx0htHZnTXrjSxZYb5Hl
 VBfNS1N7MgGmEQ/M+mOP
 =XINd
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - TCE table memory calculation fix from Alexey
 - Build fix for ans-lcd from Luis
 - Unbalanced IRQ warning fix from Alistair

* tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/eeh-powernv: Fix unbalanced IRQ warning
  macintosh/ans-lcd: fix build failure after module_init/exit relocation
  powerpc/powernv/ioda2: Fix calculation for memory allocated for TCE table
2015-08-02 18:07:36 -07:00
Linus Torvalds 27667f4744 i915: temporary fix for DP MST docking station NULL pointer dereference
Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if
attached to the docking station.  This is a regression that he was able
to bisect to commit 8c7b5ccb7298: "drm/i915: Use atomic helpers for
computing changed flags:"

The reason seems to be the new call to drm_atomic_helper_check_modeset()
added to intel_modeset_compute_config(), which in turn calls
update_connector_routing(), and somehow ends up picking a NULL crtc for
the connector state, causing the subsequent drm_crtc_index() to OOPS.

Daniel Vetter says that the fundamental issue seems to be confusion in
the encoder selection, and this isn't the right fix, but while he chases
down the proper fix, this at least avoids the NULL pointer dereference
and makes Ted's docking station work again.

Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-02 11:02:16 -07:00
Linus Torvalds d4edea4038 SCSI fixes on 20150802
A set of three fixes for the ipr driver and one fairly major one for memory
 leaks in the mq path of SCSI.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJVvkKdAAoJEDeqqVYsXL0McOkH/3JRKdIqfI1vOWG5kje0f2DJ
 qWwLZ68iG2t31Y8vdrSgT94/gfX3PH6rQ3yxuXDvr+oupTSoAyYR2D/R5zTOvusC
 pKvYZSE74yUD23J6Www5WlkTZpSzbSknm7Hj6UkIkHW0+Ihk53pWfQReLbAWpY7h
 MiMG7hqEnQEW2Wsp3SVD5jVy/tDyt2IeRZxcAqgBo5aU5xXzre0Kj8ya7hBZRCO0
 cJaZjLPSJwapAzrqU2X7dOTCAojiMkDmHA4wGDmCtbdST6191JfOz8aiky7SJqnU
 ii4kZLf2wPDUxmRYIJJnY8IweN+VMu7vAvCo2GbeJebpIgu8ZsjHd/9X/6XF0v4=
 =oAZ+
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "A set of three fixes for the ipr driver and one fairly major one for
  memory leaks in the mq path of SCSI"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: fix memory leak with scsi-mq
  ipr: Fix invalid array indexing for HRRQ
  ipr: Fix incorrect trace indexing
  ipr: Fix locking for unit attention handling
2015-08-02 09:36:21 -07:00
Linus Torvalds 30c7b56d63 ARM: SoC fixes
Things are calming down nicely here w.r.t. fixes. This batch includes two
 week's worth since I missed to send before -rc4.
 
 Nothing particularly scary to point out, smaller fixes here and
 there. Shortlog describes it pretty well.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVvcZ0AAoJEIwa5zzehBx3+dYQAKtl3XvLYKUurbwX5u79qgI5
 66iLTdPkJxmdOArfKERbC4wstb39WcTjP3KRGHGpqWMxQZTZoAHZqx2itoU0CKyI
 IzIu789SzmrwMDbGTOoU4OFeTxA3GNZBPdlxXbcilFqPALnvR9cT9HANc0nOaeOG
 kwunJEKIZoVRDmeAd/u25Z//zRk4BYHcgRMfJRqpGEAIEXT2f+v4whLGjCa1pdPz
 PpL6StHoXQ4raeocDhWAUkz/2HpjOFds1bhvaKmPb1zFissSSYBlS5QpCn110E3k
 kpeu5lPojsVBkLPNqmyyx3vobj6pnDWuz2BdaZa8epqsV00hUnM+kIb+sfXnS24w
 23gEAguT91Vw9hgFdVYc0R4xQwuQWqOmNgS6tkS96Aeie/bFBrPxB86AiA76fIaw
 I/0aDJH2pQc6dMQFpzYK1hK3B4KSwlffKnfgIBUecLiXbWDwcTTwZH8Diwc25hdP
 ozI9k6omUkiMTtyjLuj67/e7yTszxffLExPZlccu//kahhGSGJLhCQoRuRTBA0I6
 bnAXC4hc7damn9Xj4RCM9PBXSWonraGyd6Mlgmr+h4MWZMANHuL4bwNcyQAx/gNq
 muzSSFKak3zbo8zn/8j8l8W+UEPJap4pF01Et3HqeleAUx2j2ap7SKy+7eHn9P4F
 D9EnzPopeZJpXJjf03qV
 =AGL5
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Things are calming down nicely here w.r.t. fixes.  This batch
  includes two week's worth since I missed to send before -rc4.

  Nothing particularly scary to point out, smaller fixes here and there.
  Shortlog describes it pretty well"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: keystone: fix dt bindings to use post div register for mainpll
  ARM: nomadik: disable UART0 on Nomadik boards
  ARM: dts: i.MX35: Fix can support.
  ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc
  ARM: dts: add CPU OPP and regulator supply property for exynos4210
  ARM: dts: Update video-phy node with syscon phandle for exynos3250
  ARM: DRA7: hwmod: fix gpmc hwmod
2015-08-02 09:12:46 -07:00
Linus Torvalds 01183609ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fix from Al Viro:
 "Spurious ENOTDIR fix"

This should fix the problems reported by Dominique Martinet and Hugh
Dickins.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  link_path_walk(): be careful when failing with ENOTDIR
2015-08-01 17:42:14 -07:00
Al Viro 97242f99a0 link_path_walk(): be careful when failing with ENOTDIR
In RCU mode we might end up with dentry evicted just we check
that it's a directory.  In such case we should return ECHILD
rather than ENOTDIR, so that pathwalk would be retries in non-RCU
mode.

Breakage had been introduced in commit b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid.  That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory.  The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.

Note that all branches since 3.12 have that problem...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-01 20:18:38 -04:00
Linus Torvalds 3f6d9e0896 dmaengine fixes for 4.2-rc5
We had a regression due to reuse of descriptor so we have reverted that.
   Rest are driver fixes
      at_hdmac and at_xdmac for residue, trannfer width, and channel config
      pl330 final fix for dma fails and overflow issue
      xgene resouce map fix
      mv_xor big endian op fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVvOweAAoJEHwUBw8lI4NH21QP/0D8rEh/iXVUZOUqp7ANp+NX
 B96LMvxTmc7Vn8C7dLeMvktZy+SSvlSrG2kqN+X02syhttWjXvvwEYUDw6/InLCy
 ZnXzPxFmPPZEIGiUqb0zFbfUSYtV/7qjTGcXdamxWR3dw2ti1114sQ4K4RfMUvgh
 9aU8PmFw3PYMi1w9boxaoU5KHIAc8zogcKHo21mxSzFPOa9ej4Bcaxa1AtKCsawG
 lPBbjKI7/VWtvMReMF2GVK/mummZ03Iro+iXGL78QUud2hlcxbF7OLPuFHazhi7x
 B8PprnvbVk/DDRy9zO3EVVRpEgWa0E4ms24UKt2eg06k8o/ibaqdZsGR6QpqLmZI
 bl26tQiBpoX1PBxgP8w+6v84FXDzE8pA64dt5t0mCnFrcehyCfPek4P5UmbbfAo1
 S4AH4E9vlNQbjyhB6MYSZD0Ck8BmxxrHqzp/xbUzfRl0Qsyqe9zyaSOraqcmveAZ
 XCETHDb82EetOJh8ukWPGw95Pi9rrKX98FZFWKU8+oxePlGPIeVc3s7T06hj+j+Y
 9ShalP9TG56kmIRGvKFmxW5T9VGQWu/GiglN8LtJSN1hrGAxyaK4QCD8nnYBrxvG
 59WwR/XjkQhldxH3IhuU7LqaphOzOcokFX5kD5imyYRMTQsMjL89LYXshw+8DsQw
 mzZsRA6L3777Zq9SlnsF
 =X0jd
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "We had a regression due to reuse of descriptor so we have reverted
  that.

  The rest are driver fixes:

   - at_hdmac and at_xdmac for residue, trannfer width, and channel config
   - pl330 final fix for dma fails and overflow issue
   - xgene resouce map fix
   - mv_xor big endian op fix"

* tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
  dmaengine: mv_xor: fix big endian operation in register mode
  dmaengine: xgene-dma: Fix the resource map to handle overlapping
  dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
  dmaengine: at_hdmac: fix residue computation
  dmaengine: at_xdmac: fix bug about channel configuration
  dmaengine: pl330: Really fix choppy sound because of wrong residue calculation
  dmaengine: pl330: Fix overflow when reporting residue in memcpy
2015-08-01 12:47:04 -07:00