mirror of https://gitee.com/openkylin/linux.git
692892 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Mike Rapoport | e86b298beb |
userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
When process exit races with an outstanding mcopy_atomic, it is better to return an ESRCH error. When such a race occurs, the process and its mm are going away, and returning "no such process" to the uffd monitor is a better fit than ENOSPC. Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Matthias Kaehlcke | f357e345ee |
zram: rework copy of compressor name in comp_algorithm_store()
comp_algorithm_store() passes the size of the source buffer to strlcpy() instead of the destination buffer size. Make it explicit that the two buffers have the same size and use strcpy() instead of strlcpy(). The latter can be done safely since the function ensures that the string in the source buffer is terminated. Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
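To illustrate the pattern (a hedged sketch with invented identifiers, not the actual zram code): bounding a copy by anything other than the destination size is fragile, and when both buffers share a size and the source is known to be NUL-terminated, plain strcpy() states the intent directly.
```c
#include <string.h>

#define ALG_NAME_SZ 64                  /* hypothetical size shared by both buffers */

static char current_alg[ALG_NAME_SZ];   /* destination */

/* The caller guarantees `name` is NUL-terminated and at most ALG_NAME_SZ
 * bytes, mirroring what comp_algorithm_store() ensures for its buffer. */
static void alg_store(const char name[ALG_NAME_SZ])
{
	/* Buggy shape: strlcpy(current_alg, name, <size of the SOURCE>)
	 * only works by accident when the two sizes happen to match. */
	strcpy(current_alg, name);      /* safe: same size, terminated source */
}
```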
|
Kirill A. Shutemov | aac2fea94f |
rmap: do not call mmu_notifier_invalidate_page() under ptl
MMU notifiers can sleep, but in page_mkclean_one() we call
mmu_notifier_invalidate_page() under page table lock.
Let's instead use mmu_notifier_invalidate_range() outside
page_vma_mapped_walk() loop.
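A hedged sketch of the shape of that change (illustrative, not the actual mm/rmap.c code):
```c
#include <linux/mm.h>
#include <linux/rmap.h>
#include <linux/mmu_notifier.h>

/* Sketch: do only non-sleeping PTE work under the page table lock, and
 * issue one ranged, sleepable notifier call after the walk completes. */
static void clean_one_mapping(struct vm_area_struct *vma, struct page *page,
			      unsigned long address)
{
	struct page_vma_mapped_walk pvmw = {
		.page = page,
		.vma = vma,
		.address = address,
	};
	unsigned long start = address, end = address + PAGE_SIZE;

	while (page_vma_mapped_walk(&pvmw)) {
		/* ptl held here: update PTEs, but never call anything
		 * that may sleep, such as an MMU notifier. */
	}

	/* ptl dropped once the walk finishes: now the (possibly sleeping)
	 * notifier can run, once for the whole range. */
	mmu_notifier_invalidate_range(vma->vm_mm, start, end);
}
```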
[jglisse@redhat.com: try_to_unmap_one() do not call mmu_notifier under ptl]
Link: http://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com
Fixes:
|
|
Cong Wang | d041353dc9 |
mm: fix list corruptions on shmem shrinklist
We saw many list corruption warnings on shmem shrinklist:
WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0
list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960
Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1
Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
Call Trace:
dump_stack+0x4d/0x66
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
__list_del_entry+0x9e/0xc0
shmem_unused_huge_shrink+0xfa/0x2e0
shmem_unused_huge_scan+0x20/0x30
super_cache_scan+0x193/0x1a0
shrink_slab.part.41+0x1e3/0x3f0
shrink_slab+0x29/0x30
shrink_node+0xf9/0x2f0
kswapd+0x2d8/0x6c0
kthread+0xd7/0xf0
ret_from_fork+0x22/0x30
WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8).
Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt
CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G W 4.9.34-t3.el7.twitter.x86_64 #1
Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013
Call Trace:
dump_stack+0x4d/0x66
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
__list_add+0x89/0xb0
shmem_setattr+0x204/0x230
notify_change+0x2ef/0x440
do_truncate+0x5d/0x90
path_openat+0x331/0x1190
do_filp_open+0x7e/0xe0
do_sys_open+0x123/0x200
SyS_open+0x1e/0x20
do_syscall_64+0x61/0x170
entry_SYSCALL64_slow_path+0x25/0x25
The problem is that shmem_unused_huge_shrink() moves entries from the
global sbinfo->shrinklist to its local lists and then releases the
spinlock. However, a parallel shmem_setattr() could access one of these
entries directly and, if it has been removed, add it back to the global
shrinklist with the spinlock held.
The logic itself looks solid, since an entry can be on either a local
list or the global list, and is removed from one of them by
list_del_init(). So the likely race is that one CPU is in the middle of
INIT_LIST_HEAD() while the other CPU calls list_empty(), which returns
true too early, and the following list_add_tail() then sees a corrupted
entry.
list_empty_careful() is designed to fix this situation.
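A hedged sketch of why the careful variant matters here (illustrative, not the actual mm/shmem.c code):
```c
#include <linux/list.h>
#include <linux/spinlock.h>

static LIST_HEAD(global_shrinklist);
static DEFINE_SPINLOCK(shrinklist_lock);

static void readd_if_removed(struct list_head *entry)
{
	spin_lock(&shrinklist_lock);
	/*
	 * list_del_init() on another CPU runs INIT_LIST_HEAD(), which
	 * writes entry->next and entry->prev as two separate stores.
	 * Plain list_empty() reads only ->next, so it can report "empty"
	 * while ->prev is still stale, and the list_add_tail() below
	 * would corrupt the list.  list_empty_careful() checks both
	 * pointers before declaring the entry detached.
	 */
	if (list_empty_careful(entry))
		list_add_tail(entry, &global_shrinklist);
	spin_unlock(&shrinklist_lock);
}
```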
[akpm@linux-foundation.org: add comments]
Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com
Fixes:
|
|
Wei Wang | af54aed94b |
mm/balloon_compaction.c: don't zero ballooned pages
Revert commit |
|
Michael S. Tsirkin | c0a6a5ae6b |
MAINTAINERS: copy virtio on balloon_compaction.c
Changes to mm/balloon_compaction.c can easily break virtio, and virtio is the only user of that interface. Add a line to MAINTAINERS so whoever changes that file remembers to copy us. Link: http://lkml.kernel.org/r/1501764010-24456-1-git-send-email-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Minchan Kim | b3a81d0841 |
mm: fix KSM data corruption
Nadav reported that KSM can corrupt user data through a TLB batching race[1], meaning data the user has written can be lost. Quote from Nadav Amit:
"For this race we need 4 CPUs:
CPU0: Caches a writable and dirty PTE entry, and uses the stale value for write later.
CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes.
CPU2: Writes 4 to /proc/PID/clear_refs, clearing the PTEs soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes.
CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page():
if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))
since it will avoid the TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page. Note that all the operations CPU1-3 perform can happen in parallel since they only acquire mmap_sem for read.
We start with two identical pages. Everything below regards the same page/PTE.
CPU0                 CPU1            CPU2               CPU3
----                 ----            ----               ----
Write the same
value on page
[cache PTE as
dirty in TLB]
                     MADV_FREE
                     pte_mkclean()
                                     4 > clear_refs
                                     pte_wrprotect()
                                                        write_protect_page()
                                                        [ success, no flush ]
                                                        pages_identical()
                                                        [ ok ]
Write to page
different value
[Ok, using stale PTE]
                                                        replace_page()
Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost"
In the above scenario, MADV_FREE is fixed by changing the TLB batching API, including [set|clear]_tlb_flush_pending. What remains is the soft-dirty part. This patch makes soft-dirty use the TLB batching API instead of flush_tlb_mm, and makes KSM check for a pending TLB flush with mm_tlb_flush_pending, so that it flushes the TLB and avoids losing data when other parallel threads have TLB flushes pending. [1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Tested-by: Nadav Amit <namit@vmware.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
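The KSM-side idea reduces to widening the "must flush first" condition; a hedged, simplified sketch (not verbatim mm/ksm.c):
```c
#include <linux/mm.h>

/* A clean, write-protected-looking PTE still cannot be trusted if some
 * other thread has a batched-but-unissued TLB flush for this mm: a CPU
 * may hold a stale writable TLB entry and keep writing to the page. */
static bool needs_flush_before_share(struct mm_struct *mm, pte_t pte)
{
	return pte_write(pte) || pte_dirty(pte) || mm_tlb_flush_pending(mm);
}
```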
|
Minchan Kim | 99baac21e4 |
mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Nadav reported that parallel MADV_DONTNEED on the same range has a stale TLB problem; Mel fixed it[1] and found the same problem in MADV_FREE[2]. Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time in the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening."
This patch aims to solve both problems at once and is also ready for the related problem with KSM, MADV_FREE and the soft-dirty story[3]. The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending and mm_tlb_flush_pending so that when tlb_finish_mmu is called, we can detect that parallel threads are going on. In that case we forcefully flush the TLB to prevent userspace from accessing memory via a stale TLB entry, even though this thread failed to gather any page table entries. I confirmed this patch works with the test program [4] Nadav gave, so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in the current mmotm.
NOTE: This patch modifies the arch-specific TLB gathering interface (x86, ia64, s390, sh, um). Most architectures seem straightforward, but s390 needs care because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero, which happens only when a pte entry really is cleared by ptep_get_and_clear and friends. However, this problem never changes the pte entries, yet we still need to flush to prevent memory access through stale TLB entries.
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com [4] https://patchwork.kernel.org/patch/9861621/ [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu] Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Reported-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
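The generic side of the fix can be sketched as follows (close to the 4.13-era mm/memory.c shape, but hedged, not verbatim):
```c
#include <linux/mm.h>
#include <asm/tlb.h>

void tlb_finish_mmu(struct mmu_gather *tlb,
		    unsigned long start, unsigned long end)
{
	/*
	 * If another thread was batching flushes on this mm concurrently
	 * (pending count > 1), it may have cleared PTEs this thread never
	 * saw, so force a flush even if nothing was gathered here.
	 */
	bool force = mm_tlb_flush_nested(tlb->mm);

	arch_tlb_finish_mmu(tlb, start, end, force);
	dec_tlb_flush_pending(tlb->mm);
}
```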
|
Minchan Kim | 0a2dd266dd |
mm: make tlb_flush_pending global
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|COMPACTION], but upcoming patches to solve a subtle TLB flush batching problem will use it regardless of compaction/NUMA, so this patch removes the dependency. [akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement] Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Minchan Kim | 56236a5955 |
mm: refactor TLB gathering API
This patch is a preparatory patch for solving race problems caused by TLB batching. For that, we will increase/decrease the TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. To keep that simple, this patch first separates out the architecture-specific part, renames it to arch_tlb_[gather|finish]_mmu, and has the generic part just call it. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
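The resulting split can be sketched like this (hedged; the pending-count hook noted in the comment lands in a later patch of the series):
```c
#include <linux/mm.h>
#include <asm/tlb.h>

/* Generic wrapper: arch code keeps its gathering details behind
 * arch_tlb_gather_mmu(), and the generic layer gains a single place
 * to hang mm-wide bookkeeping. */
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
		    unsigned long start, unsigned long end)
{
	arch_tlb_gather_mmu(tlb, mm, start, end);
	inc_tlb_flush_pending(tlb->mm);   /* added by a follow-up patch */
}
```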
|
Nadav Amit | a9b802500e |
Revert "mm: numa: defer TLB flush for THP migration as long as possible"
While deferring TLB flushes is a good practice, the reverted patch
caused pending TLB flushes to be checked while the page-table lock is
not taken. As a result, on architectures with a weak memory model (PPC),
Linux may miss a memory barrier, miss the fact that TLB flushes are
pending, and (in theory) cause memory corruption.
Since the alternative of using smp_mb__after_unlock_lock() was
considered a bit open-coded, and the performance impact is expected to
be small, the previous patch is reverted.
This reverts
|
|
Nadav Amit | 0a2c40487f |
mm: migrate: fix barriers around tlb_flush_pending
Reading tlb_flush_pending while the page-table lock is taken does not require a barrier, since the lock/unlock already acts as one; remove the barrier in mm_tlb_flush_pending() accordingly. However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending() after the page-table lock has been released, which may be a problem on architectures with a weak memory model (PPC). To deal with this case, add a new parameter to mm_tlb_flush_pending() indicating whether it is read without the page-table lock taken, and call smp_mb__after_unlock_lock() in that case. Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Nadav Amit | 16af97dc5a |
mm: migrate: prevent racy access to tlb_flush_pending
Patch series "fixes of TLB batching races", v6.
It turns out that the Linux TLB batching mechanism suffers from various
races. Races caused by batching during reclamation were recently
handled by Mel, and this patch-set deals with the others. The more
fundamental issue is that concurrent updates of the page-tables allow
for TLB flushes to be batched on one core, while another core changes
the page-tables. This other core may assume a PTE change does not
require a flush based on the updated PTE value, while it is unaware that
TLB flushes are still pending.
This behavior affects KSM (which may result in memory corruption) and
MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior). A
proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED.
Memory corruption in KSM is harder to produce in practice, but was
observed by hacking the kernel and adding a delay before flushing and
replacing the KSM page.
Finally, there is also one memory barrier missing, which may affect
architectures with a weak memory model.
This patch (of 7):
Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.
This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.
An actual data corruption was not observed, yet the race was
confirmed by adding an assertion to check that tlb_flush_pending is not
set by two threads, adding artificial latency in
change_protection_range(), and using sysctl to reduce
kernel.numa_balancing_scan_delay_ms.
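The shape of the fix is to make the flag a counter; a hedged, simplified sketch (the real helpers in include/linux/mm_types.h also insert memory barriers, omitted here):
```c
#include <linux/atomic.h>
#include <linux/mm_types.h>

/* tlb_flush_pending becomes an atomic_t inside struct mm_struct, so
 * concurrent batchers cannot clear each other's "pending" state. */
static inline void inc_tlb_flush_pending_sketch(struct mm_struct *mm)
{
	atomic_inc(&mm->tlb_flush_pending);
}

static inline void dec_tlb_flush_pending_sketch(struct mm_struct *mm)
{
	atomic_dec(&mm->tlb_flush_pending);
}

static inline bool mm_tlb_flush_pending_sketch(struct mm_struct *mm)
{
	return atomic_read(&mm->tlb_flush_pending) > 0;
}
```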
Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com
Fixes:
|
|
Akinobu Mita | 9eeb52ae71 |
fault-inject: fix wrong should_fail() decision in task context
Commit |
|
Dan Carpenter | 4e98ebe5f4 |
test_kmod: fix small memory leak on filesystem tests
The break was in the wrong place so file system tests don't work as intended, leaking memory at each test switch. [mcgrof@kernel.org: massaged commit subject, noted memory leak issue without the fix] Link: http://lkml.kernel.org/r/20170802211450.27928-6-mcgrof@kernel.org Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reported-by: David Binderman <dcb314@hotmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Dan Carpenter | 9c56771316 |
test_kmod: fix the lock in register_test_dev_kmod()
We accidentally just drop the lock twice instead of taking it and then releasing it. This isn't a big issue unless you are adding more than one device to test on, and the kmod.sh doesn't do that yet, however this obviously is the correct thing to do. [mcgrof@kernel.org: massaged subject, explain what happens] Link: http://lkml.kernel.org/r/20170802211450.27928-5-mcgrof@kernel.org Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Binderman <dcb314@hotmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Luis R. Rodriguez | 434b06ae23 |
test_kmod: fix bug which allows negative values on two config options
Parsing with kstrtol() allows values to be negative, and we failed to check for negative values when parsing with test_dev_config_update_uint_sync() or test_dev_config_update_uint_range(). test_dev_config_update_uint_range() has a minimum check though, so the issue is not present there. test_dev_config_update_uint_sync() is only used for the number of threads to use (config_num_threads_store()), and indeed this would fail with an attempt at a large allocation. Although the issue is only present in practice with the first, fix both by using kstrtoul() instead of kstrtol(). Link: http://lkml.kernel.org/r/20170802211450.27928-4-mcgrof@kernel.org Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Binderman <dcb314@hotmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
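A hedged sketch of the bug class and the fix (invented helper name, not the actual test_kmod.c code):
```c
#include <linux/kernel.h>
#include <linux/errno.h>

static int update_uint_cfg(unsigned int *cfg, const char *buf)
{
	unsigned long val;

	/*
	 * kstrtol() would happily parse "-1", which then wraps into a
	 * huge unsigned value when stored; kstrtoul() rejects negative
	 * input outright.
	 */
	if (kstrtoul(buf, 10, &val) || val > UINT_MAX)
		return -EINVAL;

	*cfg = val;
	return 0;
}
```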
|
Colin Ian King | a4afe8cdec |
test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
Trivial fix to spelling mistake in snprintf text [mcgrof@kernel.org: massaged commit message] Link: http://lkml.kernel.org/r/20170802211450.27928-3-mcgrof@kernel.org Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Andrea Arcangeli | 5af10dfd0a |
userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
huge_add_to_page_cache->add_to_page_cache implicitly unlocks the page before returning in case of errors. The error returned was -EEXIST by running UFFDIO_COPY on a non-hole offset of a VM_SHARED hugetlbfs mapping. It was a userland bug that triggered it, and the kernel must cope with it, returning -EEXIST from ioctl(UFFDIO_COPY) as expected. page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) kernel BUG at mm/filemap.c:964! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1 RIP: unlock_page+0x4a/0x50 Call Trace: hugetlb_mcopy_atomic_pte+0xc0/0x320 mcopy_atomic+0x96f/0xbe0 userfaultfd_ioctl+0x218/0xe90 do_vfs_ioctl+0xa5/0x600 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa9 Link: http://lkml.kernel.org/r/20170802165145.22628-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jonathan Toppins | 75dddef325 |
mm: ratelimit PFNs busy info message
The RDMA subsystem can generate several thousand of these messages per second, eventually leading to a kernel crash. Ratelimit these messages to prevent this crash. Doug said: "I've been carrying a version of this for several kernel versions. I don't remember when they started, but we have one (and only one) class of machines: Dell PE R730xd, that generate these errors. When it happens, without a rate limit, we get rcu timeouts and kernel oopses. With the rate limit, we just get a lot of annoying kernel messages but the machine continues on, recovers, and eventually the memory operations all succeed" And: "> Well... why are all these EBUSY's occurring? It sounds inefficient > (at least) but if it is expected, normal and unavoidable then > perhaps we should just remove that message altogether? I don't have an answer to that question. To be honest, I haven't looked real hard. We never had this at all, then it started out of the blue, but only on our Dell 730xd machines (and it hits all of them), but no other classes or brands of machines. And we have our 730xd machines loaded up with different brands and models of cards (for instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines meant it wasn't tied to any particular brand/model of RDMA hardware. To me, it always smelled of a hardware oddity specific to maybe the CPUs or mainboard chipsets in these machines, so given that I'm not an mm expert anyway, I never chased it down. A few other relevant details: it showed up somewhere around 4.8/4.9 or thereabouts. It never happened before, but the printk has been there since the 3.18 days, so possibly the test to trigger this message was changed, or something else in the allocator changed such that the situation started happening on these machines? And, like I said, it is specific to our 730xd machines (but they are all identical, so that could mean it's something like their specific ram configuration is causing the allocator to hit this on these machines but not on other machines in the cluster, I don't want to say it's necessarily the model of chipset or CPU, there are other bits of identicalness between these machines)" Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Tested-by: Doug Ledford <dledford@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Johannes Weiner | d507e2ebd2 |
mm: fix global NR_SLAB_.*CLAIMABLE counter reads
As Tetsuo points out: "Commit |
|
Linus Torvalds | 4e082e9ba7 |
pci-v4.13-fixes-2
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZjMwWAAoJEFmIoMA60/r8wIAQAK7w2Gp77O6fcy449UpkIrxe 9NnkiFtAgHy4IZgEchwpATYw2+kpLHKnOebmzdQmQcKWH1TiZ4SUW79sn7wMraYG x7+yFSV8myRuznHlarEaxoitH9hDUzR1mE9xXwa1cs4bwPsmEZrrQyQXQyr8M8mR WvkQkQL2/70c0ue42yTCUxK4ZDhi0R4GpbE3Hul6wdMT0eaSw2f7GEw96ruxPGWG 8/ao0R2evueJRaxaY8sd9Nj1jtlfuLXoY/KIEb/i43nSq6FPyWS9It3wJy6zjM5t uQJZvWnH/ON8FuEeut+0Pqq3voeXrmsEtB3oP37Hh5v4qz5Ge3rTje18oc4/tVmN vR4tekwkI992VMvhyOXaogSHPhVAy6Ykc60UbQO4W5F9H+Z8yE1TgKv2+ss/SU9r NHdEBN9XvrNk4kPXbiXfSylt4bPElszRka3mmj4NGA3QOKSKIQeLBburmr2ExSxN SgTIzf9SpTREJinHKxwsCHa3ahceTsBnzznBtgKEsrbHy2rwmg1FWcUBXONDfEeI uKv3ycFf+ErG2MD5akUxT044tl9Zi3nrZXna0viWzxKIWOCyYCr+lOzGxG0yQth9 W2LT/UJpqAE8PQUXYH0ALpLGXY+ZSbHqb/NpYS87KqiTneeSNsjfxZAfjcjgvI31 BEvXbgndYIOWRjsle9SN =ih3D -----END PGP SIGNATURE----- Merge tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Work around Renesas uPD72020x 32-bit DMA issue" * tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue PCI: Add pci_reset_function_locked() |
|
Mika Westerberg | 1cd65d1761 |
thunderbolt: Do not enumerate more ports from DROM than the controller has
Some Alpine Ridge LP DROMs (there might be others) erroneously list more ports than the controller actually has, most probably because the DROM of the full dual/single port Thunderbolt controller was reused for the LP version. The current DROM parser does not check the upper bound, thus leading to a crash when sw->ports[] is accessed out of bounds: BUG: unable to handle kernel NULL pointer dereference at 00000000000002ec IP: tb_drom_read+0x383/0x890 [thunderbolt] PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 3 PID: 12248 Comm: systemd-udevd Not tainted 4.13.0-rc1-next-20170719 #1 Hardware name: LENOVO 20HF000YGE/20HF000YGE, BIOS N1WET32W (1.11 ) 05/23/2017 task: ffff8a293e4bcd80 task.stack: ffffa698027a8000 RIP: 0010:tb_drom_read+0x383/0x890 [thunderbolt] RSP: 0018:ffffa698027ab990 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8a2940af7800 RCX: 0000000000000000 RDX: ffff8a2940ebb400 RSI: 0000000000000000 RDI: ffffa698027ab9a0 RBP: ffffa698027ab9d0 R08: 0000000000000001 R09: 0000000000000002 R10: ffff8a2940ebb5b0 R11: 0000000000000000 R12: ffff8a293bfa968c R13: 000000000000002c R14: 0000000000000056 R15: 0000000000000056 FS: 00007f0a945a38c0(0000) GS:ffff8a2961580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002ec CR3: 000000043e785000 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tb_switch_add+0x9d/0x730 [thunderbolt] ? tb_switch_alloc+0x3cd/0x4d0 [thunderbolt] icm_start+0x5a/0xa0 [thunderbolt] tb_domain_add+0xc3/0xf0 [thunderbolt] nhi_probe+0x19e/0x310 [thunderbolt] local_pci_probe+0x42/0xa0 pci_device_probe+0x18d/0x1a0 driver_probe_device+0x2ff/0x450 __driver_attach+0xa4/0xe0 ? driver_probe_device+0x450/0x450 bus_for_each_dev+0x6e/0xb0 driver_attach+0x1e/0x20 bus_add_driver+0x1d0/0x270 ? 0xffffffffc0bbb000 driver_register+0x60/0xe0 ? 0xffffffffc0bbb000 __pci_register_driver+0x4c/0x50 nhi_init+0x28/0x1000 [thunderbolt] do_one_initcall+0x50/0x190 ? __vunmap+0x81/0xb0 ? _cond_resched+0x1a/0x50 ? kmem_cache_alloc_trace+0x15f/0x1c0 ? do_init_module+0x27/0x1e9 do_init_module+0x5f/0x1e9 load_module+0x24e7/0x2a60 ? vfs_read+0x115/0x130 SYSC_finit_module+0xfc/0x120 ? SYSC_finit_module+0xfc/0x120 SyS_finit_module+0xe/0x10 do_syscall_64+0x67/0x170 entry_SYSCALL64_slow_path+0x25/0x25 Fix this by making sure we only enumerate DROM port entries the hardware actually has. Reported-by: Christian Kellner <ckellner@redhat.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Tested-by: Christian Kellner <ckellner@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Alexander Usyskin | 557909e195 |
mei: exclude device from suspend direct complete optimization
The MEI device performs a link reset during the system suspend sequence. The link reset cannot be performed while the device is in the runtime suspend state. The resume sequence is bypassed by the suspend direct complete optimization, so the optimization should be disabled for mei devices. Fixes: [ 192.940537] Restarting tasks ... [ 192.940610] PGI is not set [ 192.940619] ------------[ cut here ]------------ [ 192.940623] WARNING: CPU: 0 me.c:653 mei_me_pg_exit_sync+0x351/0x360 [ 192.940624] Modules linked in: [ 192.940627] CPU: 0 PID: 1661 Comm: kworker/0:3 Not tainted 4.13.0-rc2+ #2 [ 192.940628] Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 [ 192.940630] Workqueue: pm pm_runtime_work <snip> [ 192.940642] Call Trace: [ 192.940646] ? pci_pme_active+0x1de/0x1f0 [ 192.940649] ? pci_restore_standard_config+0x50/0x50 [ 192.940651] ? kfree+0x172/0x190 [ 192.940653] ? kfree+0x172/0x190 [ 192.940655] ? pci_restore_standard_config+0x50/0x50 [ 192.940663] mei_me_pm_runtime_resume+0x3f/0xc0 [ 192.940665] pci_pm_runtime_resume+0x7a/0xa0 [ 192.940667] __rpm_callback+0xb9/0x1e0 [ 192.940668] ? preempt_count_add+0x6d/0xc0 [ 192.940670] rpm_callback+0x24/0x90 [ 192.940672] ? pci_restore_standard_config+0x50/0x50 [ 192.940674] rpm_resume+0x4e8/0x800 [ 192.940676] pm_runtime_work+0x55/0xb0 [ 192.940678] process_one_work+0x184/0x3e0 [ 192.940680] worker_thread+0x4d/0x3a0 [ 192.940681] ? preempt_count_sub+0x9b/0x100 [ 192.940683] kthread+0x122/0x140 [ 192.940684] ? process_one_work+0x3e0/0x3e0 [ 192.940685] ? __kthread_create_on_node+0x1a0/0x1a0 [ 192.940688] ret_from_fork+0x27/0x40 [ 192.940690] Code: 96 3a 9e ff 48 8b 7d 98 e8 cd 21 58 00 83 bb bc 01 00 00 04 0f 85 40 fe ff ff e9 41 fe ff ff 48 c7 c7 5f 04 99 96 e8 93 6b 9f ff <0f> ff e9 5d fd ff ff e8 33 fe 99 ff 0f 1f 00 0f 1f 44 00 00 55 [ 192.940719] ---[ end trace a86955597774ead8 ]--- [ 192.942540] done. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Luis R. Rodriguez | 260d9f2fc5 |
firmware: avoid invalid fallback aborts by using killable wait
Commit |
|
Luis R. Rodriguez | 90d41e74a9 |
firmware: fix batched requests - send wake up on failure on direct lookups
Fix batched requests from waiting forever on failure. The firmware API batched requests feature has been broken since the API call request_firmware_direct() was introduced on commit |
|
Luis R. Rodriguez | e44565f62a |
firmware: fix batched requests - wake all waiters
The firmware cache mechanism serves two purposes; the secondary purpose is neither well documented nor well understood. This fixes a regression with that secondary purpose of the firmware cache mechanism: batched requests on successful lookups. Without this fix, *any* time a batched request is triggered, the secondary requests for which the batched request mechanism was designed will seem to last forever and never return. This issue is present for all possible kernel builds, and a hard reset is required. The firmware cache is used for: 1) Addressing races with file lookups during the suspend/resume cycle by keeping firmware in memory during the suspend/resume cycle 2) Batched requests for the same file rely only on work from the first file lookup, which keeps the firmware in memory until the last release_firmware() is called Batched requests *only* take effect if secondary requests come in prior to the first user calling release_firmware(). The devres name used for the internal firmware cache is used as a hint that other pending requests are ongoing; the firmware buffer data is kept in memory until the last user of the buffer calls release_firmware(), therefore serializing requests and delaying the release until all requests are done. Batched requests wait for a wakeup or signal, so we can rely on the first file fetch to write to the pending secondary requests. Commit |
|
Greg Kroah-Hartman | 3b6bcd3d09 |
USB: serial: pl2303: add new ATEN device id
This adds a new ATEN device id for a new pl2303-based device. Reported-by: Peter Kuo <PeterKuo@aten.com.tw> Cc: stable <stable@vger.kernel.org> Cc: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Kai-Heng Feng | 7496cfe543 |
usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
The Moshi USB to Ethernet Adapter internally uses a Genesys Logic hub to connect to a Realtek r8153. The Realtek r8153 ethernet does not work behind the internal hub, but the no-lpm quirk can make it work. Since another r8153 dongle at hand does not have the issue, add the quirk to the Genesys Logic hub instead. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Alan Stern | 94c43b9897 |
USB: Check for dropped connection before switching to full speed
Some buggy USB disk adapters disconnect and reconnect multiple times
during the enumeration procedure. This may lead to a device
connecting at full speed instead of high speed, because when the USB
stack sees that a device isn't able to enumerate at high speed, it
tries to hand the connection over to a full-speed companion
controller.
The logic for doing this is careful to check that the device is still
connected. But this check is inadequate if the device disconnects and
reconnects before the check is done. The symptom is that a device
works, but much more slowly than it is capable of operating.
The situation was made worse recently by commit
|
|
Sandeep Singh | e788787ef4 |
usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
Certain HP keyboards keep inputting a character automatically, namely the wake-up key, after S3 resume. On some AMD platforms the USB host fails to respond (by holding resume-K) to the USB device (an HP keyboard) resume request within 1 ms (TURSM) and to ensure that resume is signaled for at least 20 ms (TDRSMDN), as defined in the USB 2.0 spec. The result is that the keyboard is left non-functional. In the SNPS USB design, the host responds to the resume request only after the system gets back to S0, and the host becomes functional only after the internal HW restore operation, which is more than 1 second after the initial resume request from the USB device. As a workaround, for the specific keyboard ID (HP keyboards), apply a port reset after resume when the keyboard is plugged in. Signed-off-by: Sandeep Singh <Sandeep.Singh@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> cc: Nehal Shah <Nehal-bakulchandra.Shah@amd.com> Reviewed-by: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
Kwan (Hingkwan) Huen-SSI | a082b42628 |
nvme: fix directive command numd calculation
The numd field of the directive receive command takes the number of dwords to transfer. This fix uses the correct calculation for numd. Signed-off-by: Kwan (Hingkwan) Huen-SSI <kwan.huen@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de> |
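For context: NUMD is a zero-based dword count in the NVMe spec, so the conversion from a byte count looks like this (hedged sketch, not the exact driver code):
```c
#include <linux/types.h>

/* Assumes `size` is a non-zero multiple of 4 bytes. */
static u32 nvme_bytes_to_numd_sketch(size_t size)
{
	return (size >> 2) - 1;   /* dwords, zero-based per the spec */
}
```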
|
Keith Busch | 634b832590 |
nvme: fix nvme reset command timeout handling
We need to return an error if a timeout occurs on any NVMe command during initialization. Without this, the nvme reset work will be stuck. A timeout will have a negative error code, meaning we need to stop initializing the controller. All positive returns mean the controller is still usable. bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196325 Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Martin Peres <martin.peres@intel.com> [jth consolidated cleanup path] Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
Linus Torvalds | 26273939ac |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: 1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy. 2) Fix stats handling in bcm_sysport_get_stats(), from Florian Fainelli. 3) Reject 16777215 VNI value in geneve_validate(), from Girish Moodalbail. 4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov. 5) Once a UFO fragmented frame is treated as UFO, we should continue doing so. Likewise once a frame has been segmented, we should continue doing that and not try to convert it to a UFO frame. From Willem de Bruijn. 6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to prevent races. From Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: packet: fix tp_reserve race in packet_set_ring udp: consistently apply ufo or fragmentation net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target igmp: Fix regression caused by igmp sysctl namespace code. geneve: maximum value of VNI cannot be used net: systemport: Fix software statistics for SYSTEMPORT Lite tipc: remove premature ESTABLISH FSM event at link synchronization |
|
Willem de Bruijn | c27927e372 |
packet: fix tp_reserve race in packet_set_ring
Updates to tp_reserve can race with reads of the field in
packet_set_ring. Avoid this by holding the socket lock during
updates in setsockopt PACKET_RESERVE.
This bug was discovered by syzkaller.
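A hedged sketch of the fix's shape (not verbatim net/packet/af_packet.c; the struct fields below follow the real layout only loosely):
```c
#include <net/sock.h>
#include <linux/errno.h>

/* Serialize the tp_reserve update against packet_set_ring(), which
 * reads tp_reserve while computing ring geometry under the same lock. */
static int packet_set_reserve(struct sock *sk, struct packet_sock *po,
			      unsigned int val)
{
	int ret = 0;

	lock_sock(sk);
	if (po->rx_ring.pg_vec || po->tx_ring.pg_vec)
		ret = -EBUSY;          /* ring exists: too late to change */
	else
		po->tp_reserve = val;
	release_sock(sk);
	return ret;
}
```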
Fixes:
|
|
Willem de Bruijn | 85f1bd9a7b |
udp: consistently apply ufo or fragmentation
When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.
Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.
Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.
A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.
Found by syzkaller.
Fixes:
|
|
Linus Torvalds | f213ad386b |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller: 1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais. 2) Prevent crashes when bringing up sunvdc virtual block devices in some environments. From Jim Quigley. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8. sparc64: recognize and support sparc M8 cpu type sparc64: properly name the cpu constants |
|
Max Gurtovoy | 1c78f7735b |
nvme-pci: fix CMB sysfs file removal in reset path
Currently we create the sysfs entry even if mapping the CMB fails.
In that case, the unmapping will not remove the created sysfs file.
There is no good reason to create a sysfs entry for a non-working CMB
and show its characteristics.
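A hedged fragment sketching the corrected ordering (close to, but not verbatim, drivers/nvme/host/pci.c):
```c
/* Map first; only a successful mapping earns a sysfs entry, so the
 * teardown path (which removes the file iff dev->cmb is set) stays
 * symmetric with setup. */
dev->cmb = nvme_map_cmb(dev);
if (dev->cmb) {
	if (sysfs_add_file_to_group(&dev->ctrl.device->kobj,
				    &dev_attr_cmb.attr, NULL))
		dev_warn(dev->ctrl.device,
			 "failed to add sysfs attribute for CMB\n");
}
```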
Fixes:
|
|
James Smart | 5073842093 |
lpfc: support nvmet_fc defer_rcv callback
Currently, calls to nvmet_fc_rcv_fcp_req() always copied the FC-NVME cmd iu to a temporary buffer before returning, allowing the driver to immediately repost the buffer to the hardware. To address timing conditions on queue element structures vs async command reception, the nvmet_fc transport occasionally may need to hold on to the command iu buffer for a short period. In these cases, the nvmet_fc_rcv_fcp_req() will return a special return code (-EOVERFLOW). In these cases, the LLDD must delay until the new defer_rcv lldd callback is called before recycling the buffer back to the hw. This patch adds support for the new nvmet_fc transport defer_rcv callback and recognition of the new error code when passing commands to the transport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
James Smart | 0fb228d30b |
nvmet_fc: add defer_req callback for deferment of cmd buffer return
At queue creation, the transport allocates a local job struct (struct nvmet_fc_fcp_iod) for each possible element of the queue. When a new CMD is received from the wire, a job struct is allocated from the queue and then used for the duration of the command. The job struct contains buffer space for the wire command iu. Thus, upon allocation of the job struct, the cmd iu buffer is copied into it and the LLDD may immediately free/reuse the CMD IU buffer passed in the call. However, in some circumstances, due to the packetized nature of FC and the api of the FC LLDD, the LLDD may issue a hw command to send the wire response but not get the hw completion for that command and upcall the nvmet_fc layer before a new command is asynchronously received on the wire. In other words, it's possible for the initiator to get the response from the wire, thus believe a command slot is free, and send a new command iu. The new command iu may be received by the LLDD and passed to the transport before the LLDD has serviced the hw completion and made the teardown calls for the original job struct. As such, there is no job struct available for the new io. E.g. it appears as if the host sent more queue elements than the queue size; it didn't, based on its understanding. Rather than treat this as a hard connection failure, queue the new request until a job struct frees up. As the buffer isn't copied (there is no job struct), a special return value must be returned to the LLDD to signify that it should hold off on recycling the cmd iu buffer. Later, when a job struct is allocated and the buffer copied, a new LLDD callback is introduced to notify the LLDD and allow it to recycle its command iu buffer. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> |
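From the LLDD's perspective, the new contract looks roughly like this (hedged sketch; the lldd_* names and the buffer struct are invented, only nvmet_fc_rcv_fcp_req() is the real transport entry point):
```c
#include <linux/nvme-fc-driver.h>

struct lldd_cmd_buf {                     /* invented LLDD-side type */
	struct nvmefc_tgt_fcp_req fcpreq;
	void *cmd_iu;
	u32 cmd_iu_len;
};

void lldd_repost_buffer(struct lldd_cmd_buf *buf);   /* assumed helpers */
void lldd_fail_cmd(struct lldd_cmd_buf *buf, int err);

static void lldd_handle_new_cmd(struct nvmet_fc_target_port *tgtport,
				struct lldd_cmd_buf *buf)
{
	int ret = nvmet_fc_rcv_fcp_req(tgtport, &buf->fcpreq,
				       buf->cmd_iu, buf->cmd_iu_len);

	if (ret == 0) {
		lldd_repost_buffer(buf);   /* iu was copied; buffer is free */
	} else if (ret == -EOVERFLOW) {
		/*
		 * The transport is holding on to our cmd iu buffer.  Do
		 * not recycle it; the defer_rcv() callback will fire once
		 * a job struct frees up and the iu has been copied.
		 */
	} else {
		lldd_fail_cmd(buf, ret);   /* hard failure */
	}
}
```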
|
Martin Wilck | 758f373558 |
nvme: strip trailing 0-bytes in wwid_show
Some broken controllers (such as earlier Linux targets) pad model or serial fields with 0-bytes rather than spaces. The NVMe spec disallows 0 bytes in "ASCII" fields. Thus strip trailing 0-bytes, too. Also make sure that we get no underflow for pathological input. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
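The trimming logic amounts to the following (hedged sketch, not verbatim driver code):
```c
#include <stddef.h>

/* Trim trailing NUL padding and spaces without underflowing when the
 * field is pathological (all padding). */
static size_t field_len(const char *field, size_t len)
{
	while (len > 0 && (field[len - 1] == '\0' || field[len - 1] == ' '))
		len--;
	return len;   /* may legitimately be 0 */
}
```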
|
Xin Long | 96d9703050 |
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
Commit |
|
Nikolay Borisov | 1714020e42 |
igmp: Fix regression caused by igmp sysctl namespace code.
Commit |
|
Girish Moodalbail | 04db70d9fe |
geneve: maximum value of VNI cannot be used
Geneve's Virtual Network Identifier (VNI) is 24 bits long, so the range of values for it is 0 to 16777215 (2^24 - 1). However, one cannot create a geneve device with the VNI set to 16777215. This patch fixes this issue. Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Florian Fainelli | 50ddfbafcd |
net: systemport: Fix software statistics for SYSTEMPORT Lite
With SYSTEMPORT Lite we have holes in our statistics layout that make us
skip over the hardware MIB counters, bcm_sysport_get_stats() was not
taking that into account resulting in reporting 0 for all SW-maintained
statistics, fix this by skipping accordingly.
Fixes:
|
|
Jon Paul Maloy | ed43594aed |
tipc: remove premature ESTABLISH FSM event at link synchronization
When a link between two nodes comes up, both endpoints will initially send out a STATE message to the peer, to increase the probability that the peer endpoint also is up when the first traffic message arrives. Thereafter, if the establishing link is the second link between the two nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message, helping the peer to perform initial synchronization between the two links. However, the initial STATE message may be lost, in which case the SYNCH message will be the first one arriving at the peer. This should also work, as the SYNCH message itself will be used to take up the link endpoint before initializing synchronization. Unfortunately the code for this case is broken. Currently, the link is brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH arrives, whereupon __tipc_node_link_up() is called to distribute the link slots and take the link into traffic. But __tipc_node_link_up() itself starts with a test for whether the link is up, and if true, returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call is unnecessary, since tipc_node_link_up() itself issues such an event, but it is also harmful, since it prevents tipc_node_link_up() from performing the rest of its tasks, and the link endpoint in question is hence never taken into traffic. This problem was exposed when we set up dual links between pre- and post-4.4 kernels, because the former ones don't send out the initial STATE message described above. We fix this by removing the unnecessary event call. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Jim Quigley | 3ee70591d6 |
sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
Using mpgroup to define multiple paths for a virtual disk causes multiple virtual-device-port ports to be created for that virtual device. Each virtual-device-port port then gets a vdisk created for it by the Linux sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver, it cannot handle multiple ports for a single vdisk, which leads to a kernel panic at startup. This fix prevents more than one vdisk per virtual-device-port from being created until full virtual disk multipathing (mpgroup) support is implemented. Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
Nicholas Bellinger | 6f48655fac |
target: Fix node_acl demo-mode + uncached dynamic shutdown regression
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by
commit
|
|
Bart Van Assche | d4acf3650c |
block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
The blk_mq_delay_kick_requeue_list() function is used by the device
mapper and only by the device mapper to rerun the queue and requeue
list after a delay. This function is called once per request that
gets requeued. Modify this function such that the queue is run once
per path change event instead of once per request that is requeued.
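A hedged sketch of the coalescing idea (close to the patched block/blk-mq.c, but not verbatim):
```c
#include <linux/blkdev.h>
#include <linux/blk-mq.h>

void blk_mq_delay_kick_requeue_list(struct request_queue *q,
				    unsigned long msecs)
{
	/*
	 * mod_delayed_work() re-arms an already-pending timer instead of
	 * queueing another execution, so a burst of N requeued requests
	 * yields one queue run after the final delay, not N runs.
	 */
	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
				    msecs_to_jiffies(msecs));
}
```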
Fixes: commit
|
|
Christoph Hellwig | f86e28c4dc |
bio-integrity: only verify integrity on the lowest stacked driver
This gets us back to the behavior in 4.12 and earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes:
|