mirror of https://gitee.com/openkylin/linux.git
55589 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Jaegeuk Kim | 6f8d445506 |
f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop. If it hits the miximum retrials in GC, let's give a chance to release gc_mutex for a short time in order not to go into live lock in the worst case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Jaegeuk Kim | 853137cef4 |
f2fs: fix performance issue observed with multi-thread sequential read
This reverts the commit - "b93f771 - f2fs: remove writepages lock" to fix the drop in sequential read throughput. Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L device: UFS Before - read throughput: 185 MB/s total read requests: 85177 (of these ~80000 are 4KB size requests). total write requests: 2546 (of these ~2208 requests are written in 512KB). After - read throughput: 758 MB/s total read requests: 2417 (of these ~2042 are 512KB reads). total write requests: 2701 (of these ~2034 requests are written in 512KB). Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Linus Torvalds | 7140ad3898 |
Updates for v4.19:
- Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockde and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW3ruhRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qiM7AP47NhYdSnCFCRUJfrt6PovXmQtuCHt3 c3QMoGGdvzh9YAEAqcSXwh7uLhpHUp1LjMAPkXdZVwNddf4zJQ1zyxQ+EAU= =vgEr -----END PGP SIGNATURE----- Merge tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Restructure of lockdep and latency tracers This is the biggest change. Joel Fernandes restructured the hooks from irqs and preemption disabling and enabling. He got rid of a lot of the preprocessor #ifdef mess that they caused. He turned both lockdep and the latency tracers to use trace events inserted in the preempt/irqs disabling paths. But unfortunately, these started to cause issues in corner cases. Thus, parts of the code was reverted back to where lockdep and the latency tracers just get called directly (without using the trace events). But because the original change cleaned up the code very nicely we kept that, as well as the trace events for preempt and irqs disabling, but they are limited to not being called in NMIs. - Have trace events use SRCU for "rcu idle" calls. This was required for the preempt/irqs off trace events. But it also had to not allow them to be called in NMI context. Waiting till Paul makes an NMI safe SRCU API. - New notrace SRCU API to allow trace events to use SRCU. - Addition of mcount-nop option support - SPDX headers replacing GPL templates. - Various other fixes and clean ups. - Some fixes are marked for stable, but were not fully tested before the merge window opened. * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) tracing: Fix SPDX format headers to use C++ style comments tracing: Add SPDX License format tags to tracing files tracing: Add SPDX License format to bpf_trace.c blktrace: Add SPDX License format header s390/ftrace: Add -mfentry and -mnop-mcount support tracing: Add -mcount-nop option support tracing: Avoid calling cc-option -mrecord-mcount for every Makefile tracing: Handle CC_FLAGS_FTRACE more accurately Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Simplify uprobe_register() body tracepoints: Free early tracepoints after RCU is initialized uprobes: Use synchronize_rcu() not synchronize_sched() tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister() ftrace: Remove unused pointer ftrace_swapper_pid tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing/irqsoff: Handle preempt_count for different configs tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage" tracing: irqsoff: Account for additional preempt_disable trace: Use rcu_dereference_raw for hooks from trace-event subsystem tracing/kprobes: Fix within_notrace_func() to check only notrace functions ... |
|
Linus Torvalds | 0a78ac4b9b |
The main things are support for cephx v2 authentication protocol and
basic support for rbd images within namespaces (myself). Also included y2038 conversion patches from Arnd, a pile of miscellaneous fixes from Chengguang and Zheng's feature bit infrastructure for the filesystem. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAlt62CkTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzizfhB/0c/rz6frunc6EcZMWuBNzlOIOktJ/m MEbPGjCxMAsmidO1rqHHYF4iEN5hr+3AWTbtIL2m6wkqYVdg3FjmNaAYB27AdQMG kH9bLfrKIew72/NZqXfm25yjY/86kIt8t91kay4Lchc97tSYhnFSnku7iAX2HTND TMhq/1O/GvEyw/RmqnenJEQqFJvKnfgPPQm6W8sM2bH0T5j+EXmDT/Rv+90LogFR J4+pZkHqDfvyMb1WJ5MkumohytbRVzRNKcMpOvjquJSqUgtgZa2JdrIsypDqSNKY nUT6jGGlxoSbHCqRwDJoFEJOlh5A9RwKqYxNuM2a/vs9u7HpvdCK/Iah =AtgY -----END PGP SIGNATURE----- Merge tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The main things are support for cephx v2 authentication protocol and basic support for rbd images within namespaces (myself). Also included are y2038 conversion patches from Arnd, a pile of miscellaneous fixes from Chengguang and Zheng's feature bit infrastructure for the filesystem" * tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-client: (40 commits) ceph: don't drop message if it contains more data than expected ceph: support cephfs' own feature bits crush: fix using plain integer as NULL warning libceph: remove unnecessary non NULL check for request_key ceph: refactor error handling code in ceph_reserve_caps() ceph: refactor ceph_unreserve_caps() ceph: change to void return type for __do_request() ceph: compare fsc->max_file_size and inode->i_size for max file size limit ceph: add additional size check in ceph_setattr() ceph: add additional offset check in ceph_write_iter() ceph: add additional range check in ceph_fallocate() ceph: add new field max_file_size in ceph_fs_client libceph: weaken sizeof check in ceph_x_verify_authorizer_reply() libceph: check authorizer reply/challenge length before reading libceph: implement CEPHX_V2 calculation mode libceph: add authorizer challenge libceph: factor out encrypt_authorizer() libceph: factor out __ceph_x_decrypt() libceph: factor out __prepare_write_connect() libceph: store ceph_auth_handshake pointer in ceph_connection ... |
|
Linus Torvalds | a18d783fed |
Driver core patches for 4.19-rc1
Here are all of the driver core and related patches for 4.19-rc1. Nothing huge here, just a number of small cleanups and the ability to now stop the deferred probing after init happens. All of these have been in linux-next for a while with only a merge issue reported. That merge issue is in fs/sysfs/group.c and Stephen has posted the diff of what it should be to resolve this. I'll follow up with that diff to this pull request. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW3g86Q8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynyXQCePaZSW8wft4b7nLN8RdZ98ATBru0Ani10lrJa HQeQJRNbWU1AZ0ym7695 =tOaH -----END PGP SIGNATURE----- Merge tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are all of the driver core and related patches for 4.19-rc1. Nothing huge here, just a number of small cleanups and the ability to now stop the deferred probing after init happens. All of these have been in linux-next for a while with only a merge issue reported" * tag 'driver-core-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (21 commits) base: core: Remove WARN_ON from link dependencies check drivers/base: stop new probing during shutdown drivers: core: Remove glue dirs from sysfs earlier driver core: remove unnecessary function extern declare sysfs.h: fix non-kernel-doc comment PM / Domains: Stop deferring probe at the end of initcall iommu: Remove IOMMU_OF_DECLARE iommu: Stop deferring probe at end of initcalls pinctrl: Support stopping deferred probe after initcalls dt-bindings: pinctrl: add a 'pinctrl-use-default' property driver core: allow stopping deferred probe after init driver core: add a debugfs entry to show deferred devices sysfs: Fix internal_create_group() for named group updates base: fix order of OF initialization linux/device.h: fix kernel-doc notation warning Documentation: update firmware loader fallback reference kobject: Replace strncpy with memcpy drivers: base: cacheinfo: use OF property_read_u32 instead of get_property,read_number kernfs: Replace strncpy with memcpy device: Add #define dev_fmt similar to #define pr_fmt ... |
|
Ingo Molnar | 5804b11034 |
perf/core improvements ad fixes:
kernel: . kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin) . kallsyms: Simplify update_iter_mod() (Adrian Hunter) . x86: Add entry trampolines to kcore (Adrian Hunter) Hardware tracing: . Fix auxtrace queue resize (Adrian Hunter) Arch specific: . Fix uninitialized ARM SPE record error variable (Kim Phillips) . Fix trace event post-processing in powerpc (Sandipan Das) Build: . Fix check-headers.sh AND list path of execution (Alexander Kapshuk) . Remove -mcet and -fcf-protection when building the python binding with older clang versions (Arnaldo Carvalho de Melo) . Make check-headers.sh check based on kernel dir (Jiri Olsa) . Move syscall_64.tbl check into check-headers.sh (Jiri Olsa) Infrastructure: . Check for null when copying nsinfo. (Benno Evers) Libraries: . Rename libtraceevent prefixes, prep work for making it a shared library generaly available (Tzvetomir Stoyanov (VMware)) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlt0OP8ACgkQ1lAW81NS qkD0lg/8Dcjd6bCgHzrRJYcR4VUNgHq4LnpJEp3VVvtV29AmptVW3hOCF6siuwDI H8rtMzE0gflMVHae310qTaUIzo1A6/ugoRwxUKLKU8aWkA1ikl9iJn6uaTttOCIG H4a/mExILDicGfxMk6kAPdyDbYr7r+1UoF58asrWVjPQNPxoJSALJCPtMnLK7Cn8 qMZN68TqIL5zifbRe6UHKCH/SmDuowVmEIz4Nin3QtwKFPH+I02TtSdYkNWTC5WK o469/zy9cceA6a8Q+bVEUP0OD1mU8BvRnZogOiZ5SdMiYZlDkFSqG5MzgTtJUXQC DxOKkANdWu7zON/KywGDX8kcIBySzd5toTiXLvsHNNhxR8pT3bU57QvrscqLIMX+ SVbCR03h5EwLeJopvDKZjwtcSwCWp1aCXCrmLP04tcB1zi+mUIRohbzf2vZDPfwo IRtoHvPAgV9UAA7M+UAAtNc7G0Gg/K2c6LlfiuxDhgjk9Jqg4NbOz3cn6rvXtGQX B4RM9pdhoNi9tqrXNYCnLzinzKtPhWjyEbb0FBlgvXTNQNiDVjhrkt3pC1fH1zK9 GX1F6L78x24z3bZl5hUyYXmxOkktpAeprjICXAikYvwwige6aNENVLhCI/fVFHcx ByxXbFMom/dSIgU0tBfdqktd7ZmSLm94obSuA6BW8/htf2JIztM= =W1go -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-4.19-20180815' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: kernel: - kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin) - kallsyms: Simplify update_iter_mod() (Adrian Hunter) - x86: Add entry trampolines to kcore (Adrian Hunter) Hardware tracing: - Fix auxtrace queue resize (Adrian Hunter) Arch specific: - Fix uninitialized ARM SPE record error variable (Kim Phillips) - Fix trace event post-processing in powerpc (Sandipan Das) Build: - Fix check-headers.sh AND list path of execution (Alexander Kapshuk) - Remove -mcet and -fcf-protection when building the python binding with older clang versions (Arnaldo Carvalho de Melo) - Make check-headers.sh check based on kernel dir (Jiri Olsa) - Move syscall_64.tbl check into check-headers.sh (Jiri Olsa) Infrastructure: - Check for null when copying nsinfo. (Benno Evers) Libraries: - Rename libtraceevent prefixes, prep work for making it a shared library generaly available (Tzvetomir Stoyanov (VMware)) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Linus Torvalds | 1f7a4c73a7 |
Pull request for inclusion in 4.19, take two
This tag is the same as 9p-for-4.19 without the two MAINTAINERS patches Contains mostly fixes (6 to be backported to stable) and a few changes, here is the breakdown: * Rework how fids are attributed by replacing some custom tracking in a list by an idr ( |
|
Linus Torvalds | 6ada4e2826 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - a few misc things - a few Y2038 fixes - ntfs fixes - arch/sh tweaks - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) mm/hmm.c: remove unused variables align_start and align_end fs/userfaultfd.c: remove redundant pointer uwq mm, vmacache: hash addresses based on pmd mm/list_lru: introduce list_lru_shrink_walk_irq() mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one() mm/list_lru.c: move locking from __list_lru_walk_one() to its caller mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node() mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP mm/sparse: delete old sparse_init and enable new one mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: move buffer init/fini to the common place mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: abstract sparse buffer allocations mm/hugetlb.c: don't zero 1GiB bootmem pages mm, page_alloc: double zone's batchsize mm/oom_kill.c: document oom_lock mm/hugetlb: remove gigantic page support for HIGHMEM mm, oom: remove sleep from under oom_lock kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous() mm/cma: remove unsupported gfp_mask parameter from cma_alloc() ... |
|
Colin Ian King | 5241d47274 |
fs/userfaultfd.c: remove redundant pointer uwq
Pointer uwq is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'uwq' set but not used [-Wunused-but-set-variable] Link: http://lkml.kernel.org/r/20180717090802.18357-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kirill Tkhai | 9b996468cf |
mm: add SHRINK_EMPTY shrinker methods return value
We need to distinguish the situations when shrinker has very small amount of objects (see vfs_pressure_ratio() called from super_cache_count()), and when it has no objects at all. Currently, in the both of these cases, shrinker::count_objects() returns 0. The patch introduces new SHRINK_EMPTY return value, which will be used for "no objects at all" case. It's is a refactoring mostly, as SHRINK_EMPTY is replaced by 0 by all callers of do_shrink_slab() in this patch, and all the magic will happen in further. Link: http://lkml.kernel.org/r/153063069574.1818.11037751256699341813.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <jbacik@fb.com> Cc: Li RongQing <lirongqing@baidu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sahitya Tummala <stummala@codeaurora.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kirill Tkhai | c92e8e10ca |
fs: propagate shrinker::id to list_lru
Add list_lru::shrinker_id field and populate it by registered shrinker id. This will be used to set correct bit in memcg shrinkers map by lru code in next patches, after there appeared the first related to memcg element in list_lru. Link: http://lkml.kernel.org/r/153063059758.1818.14866596416857717800.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <jbacik@fb.com> Cc: Li RongQing <lirongqing@baidu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sahitya Tummala <stummala@codeaurora.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kirill Tkhai | 2b3648a6ff |
fs/super.c: refactor alloc_super()
Do two list_lru_init_memcg() calls after prealloc_super(). destroy_unused_super() in fail path is OK with this. Next patch needs such the order. Link: http://lkml.kernel.org/r/153063058712.1818.3382490999719078571.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <jbacik@fb.com> Cc: Li RongQing <lirongqing@baidu.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sahitya Tummala <stummala@codeaurora.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Shakeel Butt | f745c6f5fe |
fs, mm: account buffer_head to kmemcg
The buffer_head can consume a significant amount of system memory and is directly related to the amount of page cache. In our production environment we have observed that a lot of machines are spending a significant amount of memory as buffer_head and can not be left as system memory overhead. Charging buffer_head is not as simple as adding __GFP_ACCOUNT to the allocation. The buffer_heads can be allocated in a memcg different from the memcg of the page for which buffer_heads are being allocated. One concrete example is memory reclaim. The reclaim can trigger I/O of pages of any memcg on the system. So, the right way to charge buffer_head is to extract the memcg from the page for which buffer_heads are being allocated and then use targeted memcg charging API. [shakeelb@google.com: use __GFP_ACCOUNT for directed memcg charging] Link: http://lkml.kernel.org/r/20180702220208.213380-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-3-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Shakeel Butt | d46eb14b73 |
fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jens Axboe | ac22b46a0b |
ext4: readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead. Ensure that we pass this information down to the block layer. Link: http://lkml.kernel.org/r/20180621010725.17813-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jens Axboe | 5e9d398240 |
btrfs: readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead. Ensure that we pass this information down to the block layer. Link: http://lkml.kernel.org/r/20180621010725.17813-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jens Axboe | 74c8164e1c |
mpage: mpage_readpages() should submit IO as read-ahead
a_ops->readpages() is only ever used for read-ahead, yet we don't flag the IO being submitted as such. Fix that up. Any file system that uses mpage_readpages() as its ->readpages() implementation will now get this right. Since we're passing in whether the IO is read-ahead or not, we don't need to pass in the 'gfp' separately, as it is dependent on the IO being read-ahead. Kill off that member. Add some documentation notes on ->readpages() being purely for read-ahead. Link: http://lkml.kernel.org/r/20180621010725.17813-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jens Axboe | 357c120652 |
mpage: add argument structure for do_mpage_readpage()
Patch series "Submit ->readpages() IO as read-ahead", v4. The only caller of ->readpages() is from read-ahead, yet we don't submit IO flagged with REQ_RAHEAD. This means we don't see it in blktrace, for instance, which is a shame. Additionally, it's preventing further functional changes in the block layer for deadling with read-ahead more intelligently. We already make assumptions about ->readpages() just being for read-ahead in the mpage implementation, using readahead_gfp_mask(mapping) as out GFP mask of choice. This small series fixes up mpage_readpages() to submit with REQ_RAHEAD, which takes care of file systems using mpage_readpages(). The first patch is a prep patch, that makes do_mpage_readpage() take an argument structure. This patch (of 4): We're currently passing 8 arguments to this function, clean it up a bit by packing the arguments in an args structure we pass to it. No intentional functional changes in this patch. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20180621010725.17813-2-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
NeilBrown | 1f4aace60b |
fs/seq_file.c: simplify seq_file iteration code and interface
The documentation for seq_file suggests that it is necessary to be able to move the iterator to a given offset, however that is not the case. If the iterator is stored in the private data and is stable from one read() syscall to the next, it is only necessary to support first/next interactions. Implementing this in a client is a little clumsy. - if ->start() is given a pos of zero, it should go to start of sequence. - if ->start() is given the name pos that was given to the most recent next() or start(), it should restore the iterator to state just before that last call - if ->start is given another number, it should set the iterator one beyond the start just before the last ->start or ->next call. Also, the documentation says that the implementation can interpret the pos however it likes (other than zero meaning start), but seq_file increments the pos sometimes which does impose on the implementation. This patch simplifies the interface for first/next iteration and simplifies the code, while maintaining complete backward compatability. Now: - if ->start() is given a pos of zero, it should return an iterator placed at the start of the sequence - if ->start() is given a non-zero pos, it should return the iterator in the same state it was after the last ->start or ->next. This is particularly useful for interators which walk the multiple chains in a hash table, e.g. using rhashtable_walk*. See fs/gfs2/glock.c and drivers/staging/lustre/lustre/llite/vvp_dev.c A large part of achieving this is to *always* call ->next after ->show has successfully stored all of an entry in the buffer. Never just increment the index instead. Also: - always pass &m->index to ->start() and ->next(), never a temp variable - don't clear ->from when ->count is zero, as ->from is dead when ->count is zero. Some ->next functions do not increment *pos when they return NULL. To maintain compatability with this, we still need to increment m->index in one place, if ->next didn't increment it. Note that such ->next functions are buggy and should be fixed. A simple demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size larger than the size of /proc/swaps. This will always show the whole last line of /proc/swaps. This patch doesn't work around buggy next() functions for this case. [neilb@suse.com: ensure ->from is valid] Link: http://lkml.kernel.org/r/87601ryb8a.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Jonathan Corbet <corbet@lwn.net> [docs] Tested-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
NeilBrown | 4cdfffc872 |
vfs: discard ATTR_ATTR_FLAG
This flag was introduce in 2.1.37pre1 and the only place it was tested was removed in 2.1.43pre1. The flag was never set. Let's discard it properly. Link: http://lkml.kernel.org/r/877en0hewz.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Tetsuo Handa | 6cd00a01f0 |
fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot()
Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes are initialized at __d_alloc(), we can't copy the whole size unconditionally. WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50) 636f6e66696766732e746d70000000000010000000000000020000000188ffff i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u ^ RIP: 0010:take_dentry_name_snapshot+0x28/0x50 RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246 RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002 RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60 RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001 R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00 R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0 take_dentry_name_snapshot+0x28/0x50 vfs_rename+0x128/0x870 SyS_rename+0x3b2/0x3d0 entry_SYSCALL_64_fastpath+0x1a/0xa4 0xffffffffffffffff Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Colin Ian King | 480bd56485 |
ocfs2: make several functions and variables static (and some const)
There are a variety of functions and variables that are local to the source and do not need to be in global scope, so make them static. Also make a couple of char arrays static const. Cleans up sparse warnings: symbol 'o2hb_heartbeat_mode_desc' was not declared. Should it be static? symbol 'o2hb_heartbeat_mode' was not declared. Should it be static? symbol 'o2hb_dependent_users' was not declared. Should it be static? symbol 'o2hb_region_dec_user' was not declared. Should it be static? symbol 'o2nm_fence_method_desc' was not declared. Should it be static? symbol 'lockdep_keys' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20180628131659.12133-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
wangyan | 229ba1f82a |
ocfs2: clean up some unnecessary code
Several functions have some unnecessary code, clean up these code. Link: http://lkml.kernel.org/r/5B14DF72.5020800@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Jun Piao | 93f5920d86 |
ocfs2: return -EROFS when filesystem becomes read-only
We should return -EROFS rather than other errno if filesystem becomes read-only. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/5B191B26.9010501@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | ab62ef82ea |
ntfs: mft: remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this allocates the maximum size stack buffer. Existing checks already require that blocksize >= NTFS_BLOCK_SIZE and mft_record_size <= PAGE_SIZE, so max_bhs can be at most PAGE_SIZE / NTFS_BLOCK_SIZE. Sanity checks are added for robustness. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/20180626172909.41453-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | 2c27ce9150 |
ntfs: decompress: remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this moves the stack buffer used during decompression to be allocated externally. The existing "dest_max_index" used in the VLA is bounded by cb_max_page. cb_max_page is bounded by max_page, and max_page is bounded by nr_pages. Since nr_pages is used for the "pages" allocation, it can similarly be used for the "completed_pages" allocation and passed into the decompression function. The error paths are updated to free the new allocation. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/20180626172909.41453-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | ac4ecf968a |
ntfs: aops: remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this uses the maximum size needed on the stack and adds a sanity check for robustness: index.block_size cannot be larger than PAGE_SIZE nor less than NTFS_BLOCK_SIZE. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/20180626172909.41453-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Sebastian Andrzej Siewior | a10dcebacd |
fs/ntfs/aops.c: don't disable interrupts during kmap_atomic()
ntfs_end_buffer_async_read() disables interrupts around kmap_atomic().
This is a leftover from the old kmap_atomic() implementation which
relied on fixed mapping slots, so the caller had to make sure that the
same slot could not be reused from an interrupting context.
kmap_atomic() was changed to dynamic slots long ago and commit
|
|
Arnd Bergmann | f08957d0ff |
fs/hpfs: extend gmt_to_local() conversion to 64-bit times
The VFS timestamps are all 64-bit now, the only missing piece for hpfs is the internal conversion function. One interesting bit about hpfs is that it can already deal with moving the 136 year window of its timestamps to support a much wider range than other file systems with 32-bit timestamps. It also treats the timestamps as 'unsigned' on 64-bit architectures (but signed on 32-bit, because time_t always around to negative numbers in 2038). Changing the conversion to use time64_t makes 32-bit architectures behave the same way as 64-bit. For completeness, this also adds a clamp_t call for each conversion, so we don't wrap the timestamps but instead stay within the [0..U32_MAX] range of the on-disk timestamps. Link: http://lkml.kernel.org/r/20180718115017.742609-3-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Arnd Bergmann | bcf451ecfc |
fs/ntfs: use timespec64 directly for timestamp conversion
Now that the VFS has been converted from timespec to timespec64 timestamps, only the conversion to/from ntfs timestamps uses 32-bit seconds. This changes that last missing piece to get the ntfs implementation y2038 safe on 32-bit architectures. Link: http://lkml.kernel.org/r/20180718115017.742609-2-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Arnd Bergmann | a3fda0ffea |
fs/ufs: use ktime_get_real_seconds for sb and cg timestamps
get_seconds() is deprecated because of the 32-bit overflow and will be removed. All callers in ufs also truncate to a 32-bit number, so nothing changes during the conversion, but this should be harmless as the superblock and cylinder group timestamps are not visible to user space, except for checking the fs-dirty state, wich works fine across the overflow. This moves the call to get_seconds() into a new inline function, with a comment explaining the constraints, while converting it to ktime_get_real_seconds(). Link: http://lkml.kernel.org/r/20180718115017.742609-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Dave Jiang | e1fb4a0864 |
dax: remove VM_MIXEDMAP for fsdax and device dax
This patch is reworked from an earlier patch that Dan has posted: https://patchwork.kernel.org/patch/10131727/ VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that the memory page it is dealing with is not typical memory from the linear map. The get_user_pages_fast() path, since it does not resolve the vma, is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we use that as a VM_MIXEDMAP replacement in some locations. In the cases where there is no pte to consult we fallback to using vma_is_dax() to detect the VM_MIXEDMAP special case. Now that we have explicit driver pfn_t-flag opt-in/opt-out for get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This also means we no longer need to worry about safely manipulating vm_flags in a future where we support dynamically changing the dax mode of a file. DAX should also now be supported with madvise_behavior(), vma_merge(), and copy_page_range(). This patch has been tested against ndctl unit test. It has also been tested against xfstests commit: 625515d using fake pmem created by memmap and no additional issues have been observed. Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Linus Torvalds | 9bd553929f |
This has been a large cycle for RDMA, with several major patch series
reworking parts of the core code. - Rework the so-called 'gid cache' and internal APIs to use a kref'd pointer to a struct instead of copying, push this upwards into the callers and add more stuff to the struct. The new design avoids some ugly races the old one suffered with. This is part of the namespace enablement work as the new struct is learning to be namespace aware. - Various uapi cleanups, moving more stuff to include/uapi and fixing some long standing bugs that have recently been discovered. - Driver updates for mlx5, mlx4 i40iw, rxe, cxgb4, hfi1, usnic, pvrdma, and hns - Provide max_send_sge and max_recv_sge attributes to better support HW where these values are asymmetric. - mlx5 user API 'devx' allows sending commands directly to the device FW, instead of trying to cram every wild and niche feature into the common API. Sort of like what GPU does. - Major write() and ioctl() API rework to cleanly support PCI device hot unplug and advance the ioctl conversion work - Sparse and compile warning cleanups - Add 'const' to the ib_poll_cq() signature, and permit a NULL 'bad_wr', which is the common use case - Various patches to avoid high order allocations across the stack - SRQ support for cxgb4, hns and qedr - Changes to IPoIB to better follow the netdev model for working with struct net_device liftime -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlt17oMACgkQOG33FX4g mxpRsQ//YZY1Gci1IoYLMuq0Rn9+/4lRHaBev+B728z1dvEFBW8m/i2DV5dPnSxO AUN9dZOKBYYhc08h8vphtnBdMEtYJz6Dl76F8W+mt5vSuM5D4+0ba415RYSnV1Dc d6Js33OTMVbQVHmYCIAXh9FNDX8lkywT346aXlMOpW3z74xoaLkkQ0cnfB0SEX0y q9jiu70s6eisLlu9zJsXmCCLQ1b8eUD6IZm7hX8wMheuhDWyfrOv8JBeBCQdICuI MASc2T7X8E++dvIePAL7Hgx/0SH/2Mit8zaJ0Sbt2OjBDcImLSs8bcple5gPoCPk 3vnCdb2GKg8xlxe3n1S89sGC1b8MY2CtQFElSs9C6npIGCwr2XlrZDDa0tE45+8I miVhoswakmKW61KTCkVf2d9RXWcIh1qwUIpan1aZMsWdNnA6FYXIF054mMmJO44+ HUi2C93zAhx3XhFuX6O2YAHkG6CSXcZPfO7U9zy++GwAoXtGU0g6OLZbaYdEfuQh lN8LLqxe3M5sMdDnHYc38AsLW9MmxyJXt+h2yLxtsdZ9jitypBDQxSVfAI68RNwL BB1qELflF9FtAousQU9qhdNHimsgwctJ9MoZ6I1Aa1+ovwcSQgmKoQlNJIHkFroB wUz2sz6q25OdLWDpFrGipmG7Kfnosg7xuBSYZUQMBzLmjg0HTVY= =F50c -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This has been a large cycle for RDMA, with several major patch series reworking parts of the core code. - Rework the so-called 'gid cache' and internal APIs to use a kref'd pointer to a struct instead of copying, push this upwards into the callers and add more stuff to the struct. The new design avoids some ugly races the old one suffered with. This is part of the namespace enablement work as the new struct is learning to be namespace aware. - Various uapi cleanups, moving more stuff to include/uapi and fixing some long standing bugs that have recently been discovered. - Driver updates for mlx5, mlx4 i40iw, rxe, cxgb4, hfi1, usnic, pvrdma, and hns - Provide max_send_sge and max_recv_sge attributes to better support HW where these values are asymmetric. - mlx5 user API 'devx' allows sending commands directly to the device FW, instead of trying to cram every wild and niche feature into the common API. Sort of like what GPU does. - Major write() and ioctl() API rework to cleanly support PCI device hot unplug and advance the ioctl conversion work - Sparse and compile warning cleanups - Add 'const' to the ib_poll_cq() signature, and permit a NULL 'bad_wr', which is the common use case - Various patches to avoid high order allocations across the stack - SRQ support for cxgb4, hns and qedr - Changes to IPoIB to better follow the netdev model for working with struct net_device liftime" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (312 commits) Revert "net/smc: Replace ib_query_gid with rdma_get_gid_attr" RDMA/hns: Fix usage of bitmap allocation functions return values IB/core: Change filter function return type from int to bool IB/core: Update GID entries for netdevice whose mac address changes IB/core: Add default GIDs of the bond master netdev IB/core: Consider adding default GIDs of bond device IB/core: Delete lower netdevice default GID entries in bonding scenario IB/core: Avoid confusing del_netdev_default_ips IB/core: Add comment for change upper netevent handling qedr: Add user space support for SRQ qedr: Add support for kernel mode SRQ's qedr: Add wrapping generic structure for qpidr and adjust idr routines. IB/mlx5: Fix leaking stack memory to userspace Update the e-mail address of Bart Van Assche IB/ucm: Fix compiling ucm.c IB/uverbs: Do not check for device disassociation during ioctl IB/uverbs: Remove struct uverbs_root_spec and all supporting code IB/uverbs: Use uverbs_api to unmarshal ioctl commands IB/uverbs: Use uverbs_alloc for allocations IB/uverbs: Add a simple allocator to uverbs_attr_bundle ... |
|
Linus Torvalds | 2645b9d1a4 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlt2mBQACgkQnJ2qBz9k QNntGQgAluTTnuJLjoUDjFfT37Fjf2x1ve8rg6xmYS3YIhYTWWA1oazUIeyBDfwa soutlfAZ/ix2bP1UEmeULxFhrCIXYBbWAe8s5MRqO/7s01QftNf0M72ASmd7gZRy rSVt2/BWpr745mWI38tEKlIF4sQJVD7IGrnc1cQslPzleeCqsCXA+uBkBPMlcDpJ ZWni2qK023y9E2dsg6RsJc1HemkQvrJtoLSVqRsdhty9GEuWseMbssdgz1zMXljQ eXIALE5BssoxISIpH6qVKZRlr7UWGxOmV4CDPmku7DFLOSiwMk/Ml0V80BwzjNNY hY8qfxcJOFOGZ8t82pWkVGMjgOAKjA== =IN6Y -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups from Amir and a small inotify improvement" * tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Add flag IN_MASK_CREATE for inotify_add_watch() fanotify: factor out helpers to add/remove mark fsnotify: add helper to get mask from connector fsnotify: let connector point to an abstract object fsnotify: pass connp and object type to fsnotify_add_mark() fsnotify: use typedef fsnotify_connp_t for brevity |
|
Linus Torvalds | 46e62a072a |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAlt2l2MACgkQnJ2qBz9k QNlZMAgAwVu/bMsRR6PbXJIAYEUNLehrmgUfSdYxIFqnZPq84ZfpOMQZKDYJIO5d WiLz9Z9pti/ldrQ33yllbJrsalAn8R+LB911eaKUvLscXyrIsoBxsBbOOtVZc9lZ jaQBUMLStdPvE6LgW93f1EwIg/Z8CSTzaeCO31wlZl7s7wsBhjg3MJ3f9sR6LG0G OKQZnjDxGbtsbeVl8cnOeeF3sd0kqYTT5EwSh+zkMIbHJQ0dbvEjj24TM9rHdzG2 AN35+rzFZeMHRGnfWsQ/I6il1nTuWIyPRpoc57cwV/dcYwpg1Pi6MZzrFcDsWfwx rHgRJIkmSqi1S6Ic8o6s9fYsn6266A== =ljWe -----END PGP SIGNATURE----- Merge tag 'for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and ext2 update from Jan Kara. * tag 'for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: use ktime_get_real_seconds for timestamps udf: convert inode stamps to timespec64 |
|
Jason Gunthorpe | 0a3173a5f0 |
Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> |
|
Jason Gunthorpe | 89982f7cce |
Linux 4.18
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0 mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0 qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6 L47IdXo= =sJkt -----END PGP SIGNATURE----- Merge tag 'v4.18' into rdma.git for-next Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> |
|
Linus Torvalds | 5c60a7389d |
Orangefs: one cleanup and Souptick's vm_fault_t patch
1. Adding new return type vm_fault_t (Souptick Joarder) 2. remove redundant pointer orangefs_inode (Colin Ian King) -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbdZXmAAoJEM9EDqnrzg2+O+YP/AoU+NnPj9rDYKC/OImp4uhh aIER9LOFXFJocWULAQccFXLawRVzllBwwcWSwLlGAa2AT8DyIxpuyxJhNLIfrEKV axsfAQA/mU529i8PRgwnYdQJ0cKgzHR9qrQvTrBPAV+xhrlIeQI48cNlriwJikFF 0bXkWZt5ZSn+e5FkKFm/OqiialwcrOkMGnM+Apa0B9MSvmapLcCuvGxqYYKEbSaV JYqnZ3DiDnBp/6RYUY/qn/Azp8gCDfrPlm05lUZnAbyFGwaidunOgNMHTbQAZ//H hLuGRsMWOdQqwEMr+H9vPZVBTp6DfupgH8BgB5Y5EHcwgoWK5U3sZZQKP5f8+9vh 7StCSnc9qT5iJWTbOWIngIpSeNnVa6iF7QMXt7wxOQY2ITu5Cnot1fWhuj2UcA36 xmf38B6YRX4VeLMc/eryQCD7d4EpBYIqdyaLAg0Qg1Y35DU9b3QkC56ca56uQrHY QZeQAqH63CpHiajrYCHE5wsr5zrLXbYj229Idq2KBhEqXcxCV17kwjLF3rpyEbxu 9I4HpafzQ0Sho+zsCgakyu5DYBAfMbAYqR7pT5MGNB8yYVzxMcSEsAWSQ42Ab1qb P09p1ojQQxjrqApMOa6L4MrLNA7Wl75LGRnwNy7c83qkys8Y90JhdZsQlLwlp+PT rYnIliKQuTRY+7JV/4WL =3Oz+ -----END PGP SIGNATURE----- Merge tag 'for-linus-4.19-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Orangefs: one cleanup and Souptick's vm_fault_t patch: - add new return type vm_fault_t (Souptick Joarder) - remove redundant pointer (Colin Ian King)" * tag 'for-linus-4.19-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: remove redundant pointer orangefs_inode orangefs: Adding new return type vm_fault_t |
|
Trond Myklebust | ea51f94b45 |
pNFS: Treat RECALLCONFLICT like DELAY...
Yes, it is possible to get trapped in a loop, but the server should be administratively revoking the recalled layout if it never gets returned. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> |
|
Trond Myklebust | ecf8402603 |
pNFS: When updating the stateid in layoutreturn, also update the recall range
When we update the layout stateid in nfs4_layoutreturn_refresh_stateid, we should also update the range in order to let the server know we're actually returning everything. Fixes: 16c278dbfa63 ("pnfs: Fix handling of NFS4ERR_OLD_STATEID replies...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> |
|
Linus Torvalds | 5bae2be4ef |
Just one jfs patch for 4.19
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAlt0SSYACgkQNqiEXrVA jGTwvA//aTnRN3lPQK0iEeiGw6lzhQX/cO7eXlTTGnHMMxy62NZTSUlZhTaoTm63 bg0MHknX13oikraMMpSEwsf4qoPcIgalTXmFd7RWmOHKw7GBd6LJznsRQQ4i3G8g 0a1KtXLyRXT68UgJ6U0BmukWBjNC1qG9ToWbBG8SXMhrxuFbpg4uPRtMl6eRI9fV U0CH//x94TYXSB2D4N1eVzvRrIs4l0iJA1RxfqmYZSBQe7b7LW3GLafsIm0axQO3 hy7XEUtjWzhuGILRVTJ+9hmyTG41YARWYrG0Rdd4h0sB5nK/jl8YRZtofGs7zuBK RqzJHUSNGPXza54O0bBKQk6IwTJTsjhWg2f3AXFComEP4hvTA50i/CAa/XBZYKam Fq99C70txMA4Ufwrmh4dN+20qtMYFwuvpsbNMiyuQjCDUxXwvey4RNLc1o6J3tWH 1qVNNk/k+kJY704CqA+h7Ay0A1ocaa64glwPIcBDgP5Us72LE/QjDRE4NfQVg4wq WbVO+Rml8kB+uU2ma2U2y4XXgZIFv7JWxmQ6fxWfMe2kH0+Z6Ech2D66t/oBW6w7 Q0kp2+YYaSpvIbKnQYzBQjW+W3kPIPAYLV4HptM89p0ZLi4DlKLi4EywbeGPsA/R gynd4Uxi95TA+2bAnKSNuoNse/5mq+R2F4+RuVZMUHt2DWEz2iA= =VxRQ -----END PGP SIGNATURE----- Merge tag 'jfs-4.19' of git://github.com/kleikamp/linux-shaggy Pull jfs update from David Kleikamp: "Just one jfs patch for 4.19" * tag 'jfs-4.19' of git://github.com/kleikamp/linux-shaggy: jfs: use time64_t for otime |
|
Linus Torvalds | 2b2f2aedba |
gfs2 4.19 merge
Changes on top of v4.18-rc1 / iomap-4.19-merge-1: 1. Iomap support for buffered writes and for direct I/O. 2. Two patches that reduce the size of struct gfs2_inode. 3. Lots of fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbcakSAAoJENW/n+sDE2U6AR0QAJlai+92ERML2pM+1hiuEHWP KizBoV/53pc+dll1dlcEOHQFys2vbcFavcCtcsTXNhLSp1wOqxyzcFQTX6ekWfSZ hTvJvAKTbeXu0zOWSV2DcX40JWb7SKDAxjxNb8XhL0COilgM9r+mdqoY/UNyVSel SVmWvs8UYt6UBnw4G8h5UlzSYxl/M64udU1pVO5D8JMQ5cxDKj3kfFoJLLKBDwLF vaNFxiihdTzmMwMNo3Px7GFSsb5Jnyo9LgAoDKsYd9YlzqGpAvvoYXH8itj4TuSb sM1KTUZK+97XvquZfuv5BniEifP7XZSq4xYIxyr9HMaOefeys0GdzaCSCb3ifFte 7bqjowlAbHWwBNa9ofuJ1NShsAiOv0GUGDzlY+T/0IgSlqRr0JxAikJ3jLIZQ1Hf CwWY66XakeSi5euDTi41SuGZMcxTXaX15VbXl6/SGsv4X0dyVXleBz6RuC9Q+n2H 7nqlGppRW2NB1WUqkJ15n9JaNLAF5I6umERTBXKGODM56p/GmZYoScCEGrqj9obN CntPtL6yrazASjV3+zqXA//OvTb3xfykYu17wVLKhXWD0YWQiuDfA481BLvEut2G aTtNU3b4VDwv5NuBf7G3wvN0+v3WyJ3gRfhTaEdFnX5PpcH3eHz5/fzU7zCJJwDu g53icn3efqNu7WvpAkB+ =oGs4 -----END PGP SIGNATURE----- Merge tag 'gfs2-4.19.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - iomap support for buffered writes and for direct I/O - two patches that reduce the size of struct gfs2_inode - lots of fixes and cleanups * tag 'gfs2-4.19.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (25 commits) gfs2: eliminate update_rgrp_lvb_unlinked gfs2: Fix gfs2_testbit to use clone bitmaps gfs2: Get rid of gfs2_ea_strlen gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_out gfs2: Special-case rindex for gfs2_grow GFS2: rgrp free blocks used incorrectly gfs2: remove redundant variable 'moved' gfs2: use iomap_readpage for blocksize == PAGE_SIZE gfs2: Use iomap for stuffed direct I/O reads gfs2: fallocate_chunk: Always initialize struct iomap GFS2: Fix recovery issues for spectators fs: gfs2: Adding new return type vm_fault_t gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr gfs2: Don't reject a supposedly full bitmap if we have blocks reserved gfs2: Eliminate redundant ip->i_rgd gfs2: Stop messing with ip->i_rgd in the rlist code gfs2: Remove gfs2_write_{begin,end} gfs2: iomap direct I/O support gfs2: gfs2_extent_length cleanup gfs2: iomap buffered write support ... |
|
Linus Torvalds | 72f02ba66b |
SCSI misc on 20180815
This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx, hisi_sas, smartpqi, megaraid_sas, arcmsr. In addition, with the continuing absence of Nic we have target updates for tcmu and target core (all with reviews and acks). The biggest observable change is going to be that we're (again) trying to switch to mulitqueue as the default (a user can still override the setting on the kernel command line). Other major core stuff is the removal of the remaining Microchannel drivers, an update of the internal timers and some reworks of completion and result handling. Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW3R3niYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishauRAP4yfBKK dbxF81c/Bxi/Stk16FWkOOrjs4CizwmnMcpM5wD/UmM9o6ebDzaYpZgA8wIl7X/N o/JckEZZpIp+5NySZNc= =ggLB -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx, hisi_sas, smartpqi, megaraid_sas, arcmsr. In addition, with the continuing absence of Nic we have target updates for tcmu and target core (all with reviews and acks). The biggest observable change is going to be that we're (again) trying to switch to mulitqueue as the default (a user can still override the setting on the kernel command line). Other major core stuff is the removal of the remaining Microchannel drivers, an update of the internal timers and some reworks of completion and result handling" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits) scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue scsi: ufs: remove unnecessary query(DM) UPIU trace scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done() scsi: aacraid: Spelling fix in comment scsi: mpt3sas: Fix calltrace observed while running IO & reset scsi: aic94xx: fix an error code in aic94xx_init() scsi: st: remove redundant pointer STbuffer scsi: qla2xxx: Update driver version to 10.00.00.08-k scsi: qla2xxx: Migrate NVME N2N handling into state machine scsi: qla2xxx: Save frame payload size from ICB scsi: qla2xxx: Fix stalled relogin scsi: qla2xxx: Fix race between switch cmd completion and timeout scsi: qla2xxx: Fix Management Server NPort handle reservation logic scsi: qla2xxx: Flush mailbox commands on chip reset scsi: qla2xxx: Fix unintended Logout scsi: qla2xxx: Fix session state stuck in Get Port DB scsi: qla2xxx: Fix redundant fc_rport registration scsi: qla2xxx: Silent erroneous message scsi: qla2xxx: Prevent sysfs access when chip is down scsi: qla2xxx: Add longer window for chip reset ... |
|
Eric W. Biederman | 84fe4cc09a |
signal: Don't send signals to tasks that don't exist
Recently syzbot reported crashes in send_sigio_to_task and
send_sigurg_to_task in linux-next. Despite finding a reproducer
syzbot apparently did not bisected this or otherwise track down the
offending commit in linux-next.
I happened to see this report and examined the code because I had
recently changed these functions as part of making PIDTYPE_TGID a real
pid type so that fork would does not need to restart when receiving a
signal. By examination I see that I spotted a bug in the code
that could explain the reported crashes.
When I took Oleg's suggestion and optimized send_sigurg and send_sigio
to only send to a single task when type is PIDTYPE_PID or PIDTYPE_TGID
I failed to handle pids that no longer point to tasks. The macro
do_each_pid_task simply iterates for zero iterations. With pid_task
an explicit NULL test is needed.
Update the code to include the missing NULL test.
Fixes:
|
|
Linus Torvalds | 71f3a82fab |
media updates for v4.19-rc1
-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbdGisAAoJEAhfPr2O5OEVmXcQAIe+wWHfAFwFDCqRGvG7yHIV q3Rf6bqYE/IK++sSX9qrCmPC1jXsw6oXjaYqjtdplLb6GBBkNdKo6neGtARV1phB Moj0JVmFwvnq6yMBZl4TFIZIZHIlfXIe/Ac6b9h5aiL1bGUdnukLhSnGVnoWteVv F6BRAhH/PDIf7xUpxchEic7HbwnWDWI51ALeDWZQCv5SkNgBUBrLDumWr7b2YCWF BcNfC14fNN1Jk4Hi9NpS7VCC4Nxvxn+D3iFsVa10Ur/A8EPApIlj+8nu/CDA8p4c cJa8ImvaKhK0PAfciKMi5RBs/5r5BFRlrmLADkyVz8pmz5AQqjHot/TGoC22LqaN h0T9lTKEmFAE8CYFQhcOfb6qcR6bfnM6yyVjeCrnAQ3Fw9TgESiPbPE7BpgnxoTU i3h7a5WHYs4xjkMByJUYda1GhahPD82eJJp8l4iIQ2IMlTZbrEmLFmg13WNO3950 a+dy3HqpXUd2EYwFn9CVFzjTzWKE+63obQWIxLFWJWRirqr5XKbTmB7nPLeik6Os kPmn8t0eKig76SYiJ2rhKgTMUB1rXjtBYtqP/f02FYN0WSZCEIs7oKMb/SoVfTlP 2rF6k2ZPTzB2ZJp3RzNRX/+/39f04i26QzeEwwQ6/8daxSna3YlKuLMvHzJ5UuFp q28aN8hD5eXU1IEWHtvH =ClJZ -----END PGP SIGNATURE----- Merge tag 'media/v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - new Socionext MN88443x ISDB-S/T demodulator driver: mn88443x - new sensor drivers: ak7375, ov2680 and rj54n1cb0c - an old soc-camera sensor driver converted to the V4L2 framework: mt9v111 - a new Voice-Coil Motor (VCM) driver: dw9807-vcm - some cleanups at cx25821, removing legacy unused code - some improvements at ddbridge driver - new platform driver: vicodec - some DVB API cleanups, removing ioctls and compat code for old out-of-tree drivers that were never merged upstream - improvements at DVB core to support frontents that support both Satellite and non-satellite delivery systems - got rid of the unused VIDIOC_RESERVED V4L2 ioctl - some cleanups/improvements at gl861 ISDB driver - several improvements on ov772x, ov7670 and ov5640, imx274, ov5645, and smiapp sensor drivers - fixes at em28xx to support dual TS devices - some cleanups at V4L2/VB2 locking logic - some API improvements at media controller - some cec core and drivers improvements - some uvcvideo improvements - some improvements at platform drivers: stm32-dcmi, rcar-vin, coda, reneseas-ceu, imx, vsp1, venus, camss - lots of other cleanups and fixes * tag 'media/v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (406 commits) Revert "media: vivid: shut up warnings due to a non-trivial logic" siano: get rid of an unused return code for debugfs register media: isp: fix a warning about a wrong struct initializer media: radio-wl1273: fix return code for the polling routine media: s3c-camif: fix return code for the polling routine media: saa7164: fix return codes for the polling routine media: exynos-gsc: fix return code if mutex was interrupted media: mt9v111: Fix build error with no VIDEO_V4L2_SUBDEV_API media: xc4000: get rid of uneeded casts media: drxj: get rid of uneeded casts media: tuner-xc2028: don't use casts for printing sizes media: cleanup fall-through comments media: vivid: shut up warnings due to a non-trivial logic media: rtl28xxu: be sure that it won't go past the array size media: mt9v111: avoid going past the buffer media: vsp1_dl: add a description for cmdpool field media: sta2x11: add a missing parameter description media: v4l2-mem2mem: add descriptions to MC fields media: i2c: fix warning in Aptina MT9V111 media: imx: shut up a false positive warning ... |
|
Linus Torvalds | 9a76aba02a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ... |
|
Linus Torvalds | fa1b5d09d0 |
Consolidation of Kconfig files by Christoph Hellwig.
Move the source statements of arch-independent Kconfig files instead of duplicating the includes in every arch/$(SRCARCH)/Kconfig. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJbdFsfAAoJED2LAQed4NsGxHsP/1tmA57OOOj8oGxO2OXhXVbr Q0MZqCoV4bqMvK/hgCQdl9f+tp0m+j12x4xDLdVf4OqnTXMbqvPDu3uQVKvaj/k1 gHhsFA1tFgSbuJ8InltUsrPEQqbceeJsj50xHVAKijqI6LYeRPPSU7aE9obn+OzH n2nd5sLKvMI/dqdJvW6i5KPydqTH3r3iA7D+ne/XQj0s0EMXvXUPmDT1+ijTnM4a yfm6W5p7L/c3Ugf1Pz5PfnPl4BxBwZMfW5ie/UO8j5C6Rl0iPaOGuuHurocaaJb3 MefR/7NEAR3G8MhJyL2+70jbbwhjpqR2b5ooz1vpuulPHxjeU45BY60XIBWq1afR ewsc12MMCYB695ieYWoHdaWgxD/jhffyRuajfpkXKIZEMgDxS03sMhdULXENVMx1 M0ZQ01g/NLWt9ti9DY3eTKB3ymOhnBa1sa77nGGUHkITq4DQKwPX1J9FP/HT6RNt uOvzeH5kGzc7tqOlZAO0kHbwhQG1uqGcd78IYd4lgf/XfkSgDERTWjnJmnQbwr9m 3PFuST2u8eyO+8Lh1MK76TXOEkXsHMdFugPmb6SlgtMEPKGVLDPlsj52o/LFtgzl eygfMiBFr2+ttkZ6IpNcpmQ4IztmDpz6XoMk3PqDAfUTUSYpCnq1gAEuff/eisCM Odva1ZZaeQ7WpxhsP8rr =gsQJ -----END PGP SIGNATURE----- Merge tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig consolidation from Masahiro Yamada: "Consolidation of Kconfig files by Christoph Hellwig. Move the source statements of arch-independent Kconfig files instead of duplicating the includes in every arch/$(SRCARCH)/Kconfig" * tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: add a Memory Management options" menu kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt kconfig: use a menu in arch/Kconfig to reduce clutter kconfig: include kernel/Kconfig.preempt from init/Kconfig Kconfig: consolidate the "Kernel hacking" menu kconfig: include common Kconfig files from top-level Kconfig kconfig: remove duplicate SWAP symbol defintions um: create a proper drivers Kconfig um: cleanup Kconfig files um: stop abusing KBUILD_KCONFIG |
|
Linus Torvalds | 3529b9703c |
- Add awareness of zstd compression (Geliang Tang)
-----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAltx60wWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqmtD/43lXkoLGl31JsCVkVASg2K34B3 DI6uc4vwqJmGCNo9JrGGFNOqnV9n/rBg77Ymi6qDVRImbDm6t3FofaPLTVU1G9oK 3KYlv7x99j6DoFmP+U6sEvnRb5znqrmFR/t6Nx4yn7p68XY6hd7IFx8wZ6ZNt26i eNUN67Hzg+eEuuDSf1XbVLIbeY9qqIyJvjqf6WUWXD2iAxcVPBUyxpNcsEGtmqti 9e76t/iaPq+T4V8AODthl/lfyD/xauD6g48CpIRTeDvEbkC/cBZgYS3y5frVv4+d xve5j8aM2YYIaWzJGeE+3j+PXY6pdq/50iSPalo0SQg8ypaDpB7IRYmnBoR08dlX GNQxNLkfqXGniJf5Khaxzta9l25yjr8nRC3mLkI42dM/FIhHWi3/wF6nJSMaKZXJ FAZPC06VZR4Amb2pseNtRd09nYRDtqydmjsq53Y/1X0aaqgDiX1vEtothroLS4J1 Lg3VWNzKjowTA40xc0z/5H+8jxw2gUfjtJKIQvYvaNXmEEfC+BvHk9OC5ZHSizu5 PvBHNTGD2inyOpjWZishr6bgpeS4K/i/IDJy6UkFF1Q+z3udRbQTqWEUwwVjRnO0 IlluTw7PcEy5ctbcLT+ABjdyjtWGzp0lATx+s3YvvZ53MvM20rV6nj9CPwfk+PE1 MyqBBtSvtGvOsl9p9g== =3N3c -----END PGP SIGNATURE----- Merge tag 'pstore-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: "This cycle has been very quiet for pstore: the only change is adding awareness of the zstd compression method" * tag 'pstore-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: add zstd compression support |
|
Trond Myklebust | 8618289c46 |
NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
We must drop the lock before we can sleep in referring_call_exists().
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes:
|
|
Trond Myklebust | d0fbb1d8a1 |
NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
The use of the inode->i_lock was converted to a mutex, but we forgot
to remove the old inode unlock/lock() pair that allowed the layout
segment to be put inside the loop.
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Fixes:
|
|
Linus Torvalds | 1202f4fdbc |
arm64 updates for 4.19
A bunch of good stuff in here: - Wire up support for qspinlock, replacing our trusty ticket lock code - Add an IPI to flush_icache_range() to ensure that stale instructions fetched into the pipeline are discarded along with the I-cache lines - Support for the GCC "stackleak" plugin - Support for restartable sequences, plus an arm64 port for the selftest - Kexec/kdump support on systems booting with ACPI - Rewrite of our syscall entry code in C, which allows us to zero the GPRs on entry from userspace - Support for chained PMU counters, allowing 64-bit event counters to be constructed on current CPUs - Ensure scheduler topology information is kept up-to-date with CPU hotplug events - Re-enable support for huge vmalloc/IO mappings now that the core code has the correct hooks to use break-before-make sequences - Miscellaneous, non-critical fixes and cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJbbV41AAoJELescNyEwWM0WoEIALhrKtsIn6vqFlSs/w6aDuJL cMWmFxjTaKLmIq2+cJIdFLOJ3CH80Pu9gB+nEv/k+cZdCTfUVKfRf28HTpmYWsht bb4AhdHMC7yFW752BHk+mzJspeC8h/2Rm8wMuNVplZ3MkPrwo3vsiuJTofLhVL/y BihlU3+5sfBvCYIsWnuEZIev+/I/s/qm1ASiqIcKSrFRZP6VTt5f9TC75vFI8seW 7yc3odKb0CArexB8yBjiPNziehctQF42doxQyL45hezLfWw4qdgHOSiwyiOMxEz9 Fwwpp8Tx33SKLNJgqoqYznGW9PhYJ7n2Kslv19uchJrEV+mds82vdDNaWRULld4= =kQn6 -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "A bunch of good stuff in here. Worth noting is that we've pulled in the x86/mm branch from -tip so that we can make use of the core ioremap changes which allow us to put down huge mappings in the vmalloc area without screwing up the TLB. Much of the positive diffstat is because of the rseq selftest for arm64. Summary: - Wire up support for qspinlock, replacing our trusty ticket lock code - Add an IPI to flush_icache_range() to ensure that stale instructions fetched into the pipeline are discarded along with the I-cache lines - Support for the GCC "stackleak" plugin - Support for restartable sequences, plus an arm64 port for the selftest - Kexec/kdump support on systems booting with ACPI - Rewrite of our syscall entry code in C, which allows us to zero the GPRs on entry from userspace - Support for chained PMU counters, allowing 64-bit event counters to be constructed on current CPUs - Ensure scheduler topology information is kept up-to-date with CPU hotplug events - Re-enable support for huge vmalloc/IO mappings now that the core code has the correct hooks to use break-before-make sequences - Miscellaneous, non-critical fixes and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits) arm64: alternative: Use true and false for boolean values arm64: kexec: Add comment to explain use of __flush_icache_range() arm64: sdei: Mark sdei stack helper functions as static arm64, kaslr: export offset in VMCOREINFO ELF notes arm64: perf: Add cap_user_time aarch64 efi/libstub: Only disable stackleak plugin for arm64 arm64: drop unused kernel_neon_begin_partial() macro arm64: kexec: machine_kexec should call __flush_icache_range arm64: svc: Ensure hardirq tracing is updated before return arm64: mm: Export __sync_icache_dcache() for xen-privcmd drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory arm64: Add support for STACKLEAK gcc plugin arm64: Add stack information to on_accessible_stack drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported arm64: fix ACPI dependencies rseq/selftests: Add support for arm64 arm64: acpi: fix alignment fault in accessing ACPI efi/arm: map UEFI memory map even w/o runtime services enabled efi/arm: preserve early mapping of UEFI memory map longer for BGRT drivers: acpi: add dependency of EFI for arm64 ... |
|
Richard Weinberger | 99a24e02cc |
ubifs: Set default assert action to read-only
Traditionally UBIFS just reported a failed assertion and moved on. The drawback is that users will notice UBIFS bugs when it is too late, most of the time when it is no longer about to mount. This makes bug hunting problematic since valuable information from failing asserts is long gone when UBIFS is dead. The other extreme, panic'ing on a failing assert is also not worthwhile, we want users and developers give a chance to collect as much debugging information as possible if UBIFS hits an assert. Therefore go for the third option, switch to read-only mode when an assert fails. That way UBIFS will not write possible bad data to the MTD and gives users the chance to collect debugging information. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | c38c5a7f2e |
ubifs: Allow setting assert action as mount parameter
Expose our three options to userspace. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | 2e52eb7446 |
ubifs: Rework ubifs_assert()
With having access to struct ubifs_info in ubifs_assert() we can give more information when an assert is failing. By using ubifs_err() we can tell which UBIFS instance failed. Also multiple actions can be taken now. We support: - report: This is what UBIFS did so far, just report the failure and go on. - read-only: Switch to read-only mode. - panic: shoot the kernel in the head. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | 6eb61d587f |
ubifs: Pass struct ubifs_info to ubifs_assert()
This allows us to have more context in ubifs_assert() and take different actions depending on the configuration. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | 54169ddd38 |
ubifs: Turn two ubifs_assert() into a WARN_ON()
We are going to pass struct ubifs_info to ubifs_assert() but while unloading the UBIFS module we don't have the info struct anymore. Therefore replace the asserts by a regular WARN_ON(). Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | a3d218280c |
ubifs: Use kmalloc_array()
Since commit
|
|
Richard Weinberger | 95a22d2084 |
ubifs: Check data node size before truncate
Check whether the size is within bounds before using it.
If the size is not correct, abort and dump the bad data node.
Cc: Kees Cook <keescook@chromium.org>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: stable@vger.kernel.org
Fixes:
|
|
Richard Weinberger | 08acbdd6fd |
Revert "UBIFS: Fix potential integer overflow in allocation"
This reverts commit
|
|
Richard Weinberger | 49d2e05fb4 |
ubifs: Add comment on c->commit_sem
Every single time I come across that code, I get confused because it looks like a possible dead lock. Help myself by adding a comment. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Stefan Agner | 7e5471ce6d |
ubifs: introduce Kconfig symbol for xattr support
Allow to disable extended attribute support. This aids in reliability testing, especially since some xattr related bugs have surfaced. Also an embedded system might not need it, so this allows for a slightly smaller kernel (about 4KiB). Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Gustavo A. R. Silva | 1bf0572fe2 |
ubifs: use swap macro in swap_dirty_idx
Make use of the swap macro and remove unnecessary variable *t*. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Adrian Hunter | 6855dc41b2 |
x86: Add entry trampolines to kcore
Without program headers for PTI entry trampoline pages, the trampoline virtual addresses do not map to anything. Example before: sudo gdb --quiet vmlinux /proc/kcore Reading symbols from vmlinux...done. [New process 1] Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0 root=UUID=a6096b83-b763-4101-807e-f33daff63233'. #0 0x0000000000000000 in irq_stack_union () (gdb) x /21ib 0xfffffe0000006000 0xfffffe0000006000: Cannot access memory at address 0xfffffe0000006000 (gdb) quit After: sudo gdb --quiet vmlinux /proc/kcore [sudo] password for ahunter: Reading symbols from vmlinux...done. [New process 1] Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0-fix-4-00005-gd6e65a8b4072 root=UUID=a6096b83-b7'. #0 0x0000000000000000 in irq_stack_union () (gdb) x /21ib 0xfffffe0000006000 0xfffffe0000006000: swapgs 0xfffffe0000006003: mov %rsp,-0x3e12(%rip) # 0xfffffe00000021f8 0xfffffe000000600a: xchg %ax,%ax 0xfffffe000000600c: mov %cr3,%rsp 0xfffffe000000600f: bts $0x3f,%rsp 0xfffffe0000006014: and $0xffffffffffffe7ff,%rsp 0xfffffe000000601b: mov %rsp,%cr3 0xfffffe000000601e: mov -0x3019(%rip),%rsp # 0xfffffe000000300c 0xfffffe0000006025: pushq $0x2b 0xfffffe0000006027: pushq -0x3e35(%rip) # 0xfffffe00000021f8 0xfffffe000000602d: push %r11 0xfffffe000000602f: pushq $0x33 0xfffffe0000006031: push %rcx 0xfffffe0000006032: push %rdi 0xfffffe0000006033: mov $0xffffffff91a00010,%rdi 0xfffffe000000603a: callq 0xfffffe0000006046 0xfffffe000000603f: pause 0xfffffe0000006041: lfence 0xfffffe0000006044: jmp 0xfffffe000000603f 0xfffffe0000006046: mov %rdi,(%rsp) 0xfffffe000000604a: retq (gdb) quit In addition, entry trampolines all map to the same page. Represent that by giving the corresponding program headers in kcore the same offset. This has the benefit that, when perf tools uses /proc/kcore as a source for kernel object code, samples from different CPU trampolines are aggregated together. Note, such aggregation is normal for profiling i.e. people want to profile the object code, not every different virtual address the object code might be mapped to (across different processes for example). Notes by PeterZ: This also adds the KCORE_REMAP functionality. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1528289651-4113-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnd Bergmann | 6cff573202 |
ubifs: tnc: use monotonic znode timestamp
The tnc uses get_seconds() based timestamps to check the age of a znode, which has two problems: on 32-bit architectures this may overflow in 2038 or 2106, and it gives incorrect information when the system time is updated using settimeofday(). Using montonic timestamps with ktime_get_seconds() solves both thes problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Arnd Bergmann | 0eca0b8067 |
ubifs: use timespec64 for inode timestamps
Both vfs and the on-disk inode structures can deal with fine-grained timestamps now, so this is the last missing piece to make ubifs y2038-safe on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | 11a6fc3dc7 |
ubifs: xattr: Don't operate on deleted inodes
xattr operations can race with unlink and the following assert triggers:
UBIFS assert failed in ubifs_jnl_change_xattr at 1606 (pid 6256)
Fix this by checking i_nlink before working on the host inode.
Cc: <stable@vger.kernel.org>
Fixes:
|
|
Richard Weinberger | 312c39bd6d |
ubifs: gc: Fix typo
UBIFS operates on LEBs, not PEBs. Signed-off-by: Richard Weinberger <richard@nod.at> |
|
Richard Weinberger | eef19816ad |
ubifs: Fix memory leak in lprobs self-check
Allocate the buffer after we return early.
Otherwise memory is being leaked.
Cc: <stable@vger.kernel.org>
Fixes:
|
|
Richard Weinberger | 5996559320 |
ubifs: Fix synced_i_size calculation for xattr inodes
In ubifs_jnl_update() we sync parent and child inodes to the flash,
in case of xattrs, the parent inode (AKA host inode) has a non-zero
data_len. Therefore we need to adjust synced_i_size too.
This issue was reported by ubifs self tests unter a xattr related work
load.
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: ui_size is 4, synced_i_size is 0, but inode is clean
UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: i_ino 65, i_mode 0x81a4, i_size 4
Cc: <stable@vger.kernel.org>
Fixes:
|
|
Richard Weinberger | 00ee8b6010 |
ubifs: Fix directory size calculation for symlinks
We have to account the name of the symlink and not the target length.
Fixes:
|
|
Linus Torvalds | be718b524d |
configfs updates for 4.19
- simplify the cide by using kvasprintf (Bart Van Assche). - fix a gcc 8 string truncation warning by making the code simpler (Guenter Roeck). - fix a bug in rmdir() handling (Mike Christie). -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAltxL44LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYMjKxAAy9M8gy87QhrkOTVLher1A3DY48Qbvx5AEHM2jnZQ MekqGIcgAGYLlS3ifu54JJBB6QoM+UjKkvkjJiFaJbg685yFepkNkUUfRTy2tVqF 2TcsxUrhZu5gz2gm7tBll321Jz34w++PWzuVoI82YefeobYGu+XxWf0JnJQYJVL3 Q2W73AZBLwZQ1s0F5UNAFa7s+pznIpJLTMUD3iFmzoXx0Qihy7VGxcKxn6+Bzg2E UfojBH//K9cxk+dMZEiVaA+bXY39fd7bj+yDzcF8ThR+SPJwXjP9yLDDFqM7qsrK BnhcPYNV3Qjk5WCwvn/Lx22hFHgB44URjyvxaP6Bntp7CVk4IPVJqmqjUaov0fsZ zHNvdjk3cy6JCOTvcUECikM3qkfgpabNUbN/QSK+nKN6JT22wGrf0ojKpSYfdBb3 nhh8lV7YFkbmVFmigN9qAw2PH/l8C8XM0P/VezRUz9gba1jW8TcLhcvL1OKs4NFb LSS07pHfdhipm5go1kxcCLDb93Gg0MwYEFCX3xA6YoYU7bjg7uLUa8IR7dFVNdCC uipcxY9fCfcu2G2AV6WUVT+TQNYjC8tgUgSzM3LI+t12NTJ5hB7MFqSQUmCZSKmJ nSDjoHHFbY56H0rrelEdhzjXLNGTc8UmkIRDqd3oZk0wnvxZlq8fPRkmU7DLFEh6 tAU= =5AGn -----END PGP SIGNATURE----- Merge tag 'configfs-for-4.19' of git://git.infradead.org/users/hch/configfs Pull configfs updates from Christoph Hellwig: - simplify the cide by using kvasprintf (Bart Van Assche) - fix a gcc 8 string truncation warning by making the code simpler (Guenter Roeck) - fix a bug in rmdir() handling (Mike Christie) * tag 'configfs-for-4.19' of git://git.infradead.org/users/hch/configfs: configfs: fix registered group removal configfs: replace strncpy with memcpy configfs: use kvasprintf() instead of open-coding it |
|
Linus Torvalds | c2fc71c9b7 |
JFFS2 changes:
- Support 64-bit timestamps MTD changes: Core changes: - Support sub-partitions - Clarify mtd_oob_ops documentation - Make Kconfig formatting consistent - Fix potential overflows in mtdchar_{write,read}() - Fallback to ->_{read,write}() when ->_{read,write}_oob() is missing and no OOB data were requested - Remove VLA usage in the bch lib Driver changes: - Use mtd_device_register() instead of mtd_device_parse_register() where applicable - Use proper printk format to print physical addresses in the solutionengine driver - Add missing mtd_set_of_node() call in the powernv driver - Remove unneeded variables in a few drivers - Plug the TRX part parser to the DT partition parsers logic - Check ioremap_cache() return code in the gpio-addr-flash driver - Stop using VMLINUX_SYMBOL_STR() in gen_probe.c SPI NOR changes: Core changes: - Apply reset hacks only when reset is explicitly marked as broken in the DT Driver changes: - Minor cleanup/fixes in the m25p80 driver - Release flash_np in the nxp-spifi driver - Add suspend/resume hooks to the atmel-quadspi driver - Include gpio/consumer.h instead of gpio.h in the atmel-quadspi driver - Use %pK instead of %p in the stm32-quadspi driver - Improve timeout handling in the cadence-quadspi driver - Use mtd_device_register() instead of mtd_device_parse_register() in the intel-spi driver NAND changes: Core changes: - Add the SPI-NAND framework. - Create a helper to find the best ECC configuration. - Create NAND controller operations. - Allocate dynamically ONFI parameters structure. - Add defines for ONFI version bits. - Add manufacturer fixup for ONFI parameter page. - Add an option to specify NAND chip as a boot device. - Add Reed-Solomon error correction algorithm. - Better name for the controller structure. - Remove unused caller_is_module() definition. - Make subop helpers return unsigned values. - Expose _notsupp() helpers for raw page accessors. - Add default values for dynamic timings. - Kill the chip->scan_bbt() hook. - Rename nand_default_bbt() into nand_create_bbt(). - Start to clean the nand_chip structure. - Remove stale prototype from rawnand.h. Raw NAND controllers drivers changes: - Qcom: structuring cleanup. - Denali: use core helper to find the best ECC configuration. - Possible build of almost all drivers by adding a dependency on COMPILE_TEST for almost all of them in Kconfig, implies various fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even changes in sparc64 and ia64 architectures. - Clean the ->probe() functions error path of a lot of drivers. - Migrate all drivers to use nand_scan() instead of nand_scan_ident()/nand_scan_tail() pair. - Use mtd_device_register() where applicable to simplify the code. - Marvell: * Handle on-die ECC. * Better clocks handling. * Remove bogus comment. * Add suspend and resume support. - Tegra: add NAND controller driver. - Atmel: * Add module param to avoid using dma. * Drop Wenyou Yang from MAINTAINERS. - Denali: optimize timings handling. - FSMC: Stop using chip->read_buf(). - FSL: * Switch to SPDX license tag identifiers. * Fix qualifiers in MXC init functions. Raw NAND chip drivers changes: - Micron: * Add fixup for ONFI revision. * Update ecc_stats.corrected. * Make ECC activation stateful. * Avoid enabling/disabling ECC when it can't be disabled. * Get the actual number of bitflips. * Allow forced on-die ECC. * Support 8/512 on-die ECC. * Fix on-die ECC detection logic. - Hynix: * Fix decoding the OOB size on H27UCG8T2BTR. * Use ->exec_op() in hynix_nand_reg_write_op(). -----BEGIN PGP SIGNATURE----- iQI5BAABCAAjBQJbcSYdHBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ Ze02AX4ItwA85Q//ViUNiWoluaj7q3Kpv44NZZClPjVfirDtIHEfyzob7HbkrqjT ivjNUu5cjKU2aqptR24UzIV2lwmaSGicIzvyANN5bNedHy8tb4BWXo1iTEflXc1P lYihrRz7AeHQqdrOYAiHxq0fcqgIm2NjCDxfdtkP1HJsMQH+LgvoosOkd8RyLF8E jV5IyreTfg10AiVS44yvnqdsYo7GDVZRgnx1dBRbdJ+WwZS05m6LSeNu0ivgfLMM NmISrn7pzlMYnuDkzckXwVka7STFhV9befoSCmAHedqNEHtzaGDaNMLEilaC4WzV 1yvsdRGI3hDkdO4s44nP/Vs2XlVmCevuTpOOQ1dEK94Bbek+us7NEPjxOIAf22jE 5MVZR6vpNN7JXJ+nvEZjjL8X9zysqZgk2r6os35Wi2zJK7iHIrFpjlMYwTJfVPzv JKabOQZuOUqRACbL0isTTdG0+Tzr1BxaQqFiFUteQ9QpG4nSifoHUmI+9XUCJmil iGTOy6rEkaMd6qu3xapPtZbxGbUbvcLiJvNrZ02ZfsELDAELowz66O0PX6rSwTV9 NIWeVNXK9Bsp0oxbb8P8oJIB5K4F6vqncPBls34J7E/xrLO9nLk1VvFQrGqzuwwP HmEdgcRwKeXrvJITZz2u0u6lrKpgdDA3FTqfGCNec51PCgb1FwxcpKF/Xz0= =yIkF -----END PGP SIGNATURE----- Merge tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd Pull mtd updates from Boris Brezillon: "JFFS2 changes: - Support 64-bit timestamps MTD core changes: - Support sub-partitions - Clarify mtd_oob_ops documentation - Make Kconfig formatting consistent - Fix potential overflows in mtdchar_{write,read}() - Fallback to ->_{read,write}() when ->_{read,write}_oob() is missing and no OOB data were requested - Remove VLA usage in the bch lib MTD driver changes: - Use mtd_device_register() instead of mtd_device_parse_register() where applicable - Use proper printk format to print physical addresses in the solutionengine driver - Add missing mtd_set_of_node() call in the powernv driver - Remove unneeded variables in a few drivers - Plug the TRX part parser to the DT partition parsers logic - Check ioremap_cache() return code in the gpio-addr-flash driver - Stop using VMLINUX_SYMBOL_STR() in gen_probe.c SPI NOR core changes: - Apply reset hacks only when reset is explicitly marked as broken in the DT SPI NOR driver changes: - Minor cleanup/fixes in the m25p80 driver - Release flash_np in the nxp-spifi driver - Add suspend/resume hooks to the atmel-quadspi driver - Include gpio/consumer.h instead of gpio.h in the atmel-quadspi driver - Use %pK instead of %p in the stm32-quadspi driver - Improve timeout handling in the cadence-quadspi driver - Use mtd_device_register() instead of mtd_device_parse_register() in the intel-spi driver NAND core changes: - Add the SPI-NAND framework. - Create a helper to find the best ECC configuration. - Create NAND controller operations. - Allocate dynamically ONFI parameters structure. - Add defines for ONFI version bits. - Add manufacturer fixup for ONFI parameter page. - Add an option to specify NAND chip as a boot device. - Add Reed-Solomon error correction algorithm. - Better name for the controller structure. - Remove unused caller_is_module() definition. - Make subop helpers return unsigned values. - Expose _notsupp() helpers for raw page accessors. - Add default values for dynamic timings. - Kill the chip->scan_bbt() hook. - Rename nand_default_bbt() into nand_create_bbt(). - Start to clean the nand_chip structure. - Remove stale prototype from rawnand.h. Raw NAND controllers drivers changes: - Qcom: structuring cleanup. - Denali: use core helper to find the best ECC configuration. - Possible build of almost all drivers by adding a dependency on COMPILE_TEST for almost all of them in Kconfig, implies various fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even changes in sparc64 and ia64 architectures. - Clean the ->probe() functions error path of a lot of drivers. - Migrate all drivers to use nand_scan() instead of nand_scan_ident()/nand_scan_tail() pair. - Use mtd_device_register() where applicable to simplify the code. - Marvell: * Handle on-die ECC. * Better clocks handling. * Remove bogus comment. * Add suspend and resume support. - Tegra: add NAND controller driver. - Atmel: * Add module param to avoid using dma. * Drop Wenyou Yang from MAINTAINERS. - Denali: optimize timings handling. - FSMC: Stop using chip->read_buf(). - FSL: * Switch to SPDX license tag identifiers. * Fix qualifiers in MXC init functions. Raw NAND chip drivers changes: - Micron: * Add fixup for ONFI revision. * Update ecc_stats.corrected. * Make ECC activation stateful. * Avoid enabling/disabling ECC when it can't be disabled. * Get the actual number of bitflips. * Allow forced on-die ECC. * Support 8/512 on-die ECC. * Fix on-die ECC detection logic. - Hynix: * Fix decoding the OOB size on H27UCG8T2BTR. * Use ->exec_op() in hynix_nand_reg_write_op()" * tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd: (188 commits) mtd: rawnand: atmel: Select GENERIC_ALLOCATOR MAINTAINERS: drop Wenyou Yang from Atmel NAND driver support mtd: rawnand: allocate dynamically ONFI parameters during detection mtd: spi-nor: only apply reset hacks to broken hardware mtd: spi-nor: cadence-quadspi: fix timeout handling mtd: spi-nor: atmel-quadspi: Include gpio/consumer.h instead of gpio.h mtd: spi-nor: intel-spi: use mtd_device_register() mtd: spi-nor: stm32-quadspi: replace "%p" with "%pK" mtd: spi-nor: atmel-quadspi: add suspend/resume hooks mtd: rawnand: allocate model parameter dynamically mtd: rawnand: do not export nand_scan_[ident|tail]() anymore mtd: rawnand: txx9ndfmc: convert driver to nand_scan() mtd: rawnand: txx9ndfmc: clarify ECC parameters assignation mtd: rawnand: tegra: convert driver to nand_scan() mtd: rawnand: jz4740: convert driver to nand_scan() mtd: rawnand: jz4740: group nand_scan_{ident, tail} calls mtd: rawnand: jz4740: fix probe function error path mtd: rawnand: docg4: convert driver to nand_scan() mtd: rawnand: do not execute nand_scan_ident() if maxchips is zero mtd: rawnand: atmel: convert driver to nand_scan() ... |
|
Chao Yu | dda9f4b9ca |
f2fs: fix to skip verifying block address for non-regular inode
generic/184 1s ... [failed, exit status 1]- output mismatch --- tests/generic/184.out 2015-01-11 16:52:27.643681072 +0800 QA output created by 184 - silence is golden +rm: cannot remove '/mnt/f2fs/null': Bad address +mknod: '/mnt/f2fs/null': Bad address +chmod: cannot access '/mnt/f2fs/null': Bad address +./tests/generic/184: line 36: /mnt/f2fs/null: Bad address ... F2FS-fs (zram0): access invalid blkaddr:259 EIP: f2fs_is_valid_blkaddr+0x14b/0x1b0 [f2fs] f2fs_iget+0x927/0x1010 [f2fs] f2fs_lookup+0x26e/0x630 [f2fs] __lookup_slow+0xb3/0x140 lookup_slow+0x31/0x50 walk_component+0x185/0x1f0 path_lookupat+0x51/0x190 filename_lookup+0x7f/0x140 user_path_at_empty+0x36/0x40 vfs_statx+0x61/0xc0 __do_sys_stat64+0x29/0x40 sys_stat64+0x13/0x20 do_fast_syscall_32+0xaa/0x22c entry_SYSENTER_32+0x53/0x86 In f2fs_iget(), we will check inode's first block address, if it is valid, we will set FI_FIRST_BLOCK_WRITTEN flag in inode. But we should only do this for regular inode, otherwise, like special inode, i_addr[0] is used for storing device info instead of block address, it will fail checking flow obviously. So for non-regular inode, let's skip verifying address and setting flag. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Linus Torvalds | 73ba2fb33c |
for-4.19/block-20180812
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5 Zolos5H+JA== =b9ib -----END PGP SIGNATURE----- Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ... |
|
Arnd Bergmann | 7fa750a163 |
f2fs: rework fault injection handling to avoid a warning
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an unused label: fs/f2fs/segment.c: In function '__submit_discard_cmd': fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label] This could be fixed by adding another #ifdef around it, but the more reliable way of doing this seems to be to remove the other #ifdefs where that is easily possible. By defining time_to_inject() as a trivial stub, most of the checks for CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer formatting of the code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Colin Ian King | e1b437691a |
orangefs: remove redundant pointer orangefs_inode
Pointer orangefs_inode is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'orangefs_inode' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com> |
|
Souptick Joarder | 8bf782f647 |
orangefs: Adding new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now,
this is just documenting that the function returns a VM_FAULT
value rather than an errno. Once all instances are converted,
vm_fault_t will become a distinct type.
See the following
commit
|
|
Linus Torvalds | 781fca5b10 |
Changes for 4.19:
- Use extent maps to track pagecache page status instead of bufferhead state. - Refactor pagecache read and write paths to use the new iomap library functions, which enable us to drop the old bufferhead code for pagesize == blocksize filesystems. - Set up parallel per-block-per-page metadata to track subpage information that was tracked by buffer heads, which enables us to drop the old bufferhead code for pagesize > blocksize filesystems. - Tie a deferred ops control structure to a transaction so that we can take advantage of an upper-level dfops without having to plumb pointer passing through the code. - Refactor the deferred ops code to track deferred ops as part of the transaction structure (instead of as a separate data structure) so that we can simplify the scoping rules around defer_ops. - Refactor twisty delwri buffer submission code to avoid deadlocks. - Shorten and fix indenting problems in the scrub code. - Detect obviously bad summary counts at mount and fix them. - Directly associate deferred ops control structure with a transaction so that callers no longer have to manage it themselves. - Remove a couple of IRIX-era inode macros. - Remove the long-deprecated 'barrier' and 'nobarrier' mount options. - Clean up the inode fork structure a bit. - Check for bad fs summary counter values in the superblock. - Reduce COW fork lookups during writeback. - Refactor the deferred ops control structures into the transaction structure, thereby eliminating the need for transaction users to handle the deferred ops as a separate data structure. - Add the ability to repair AG headers online. - Fix a crash due to insufficient return value checking. - Various fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAltwVGoACgkQ+H93GTRK tOt3bw//WaG1mR44Oo/mhf27MaEJK74LXeViqCH4Sdk10gujClTVnl6h33ChAyEi 7BT4x1JtwM6xOh7nPsXQy/besVxWadjQcTtAz/3U2wJoFyOX2+I27SAawrmX6jfR Hi1DxXFFK7z/8YvuZqYl3vTgxMNb7bLAUybe2sYX8q+vrQaUvl9eLQlHSaT3sxrc /lBkog1dYmbw3yjLnWYpQtC0I6Pa3ZuG/S2vpeJ2H5MADtzrRNjuC9MHZJW7tIGm +rCLm0agk8yFkEA84VvS5Afee3TppY/JBaYlsvG1rp3bs0fELAJFnzS4g/QDbbsX HAKPcMICJksF4C9y0Xb7wXPz/4PKur5/OSuGXN4QtOivOEoAdWfh2PLInqAjo/Le mO92PdkBucfVqJzfEC2q2QAnGIaJlG8txhAz87wZ1YfZDQQlJDy385Z9GQXfUpy5 /1xH7V0cze1ZBSxWSddSFg0gCtaWSerfp0CmAG3A+HWKIN6c/ZNSCrqdq0DBC99D qOn6ThjckZWGvz/KV5xBr/KvUYOpSeEyREtgcAN008TiUaNy4nOhWV2xgLGuPY/J ed4V2B9qVbq+l+sZyzukB8cmOXmcCey6omwJ7LqZzoTWTAtTQtM2MwhaQFUWtQG8 mCqPXJp1XyL24sn0bI1t2NuKgQcs6QEQWX3zN4DA6I+N9+sTDqo= =2G+i -----END PGP SIGNATURE----- Merge tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "This is the second part of the XFS changes for 4.19. The biggest changes are the removal of buffer heads frm XFS, a massive reworking of the deferred transaction operations handling code, the removal of the long defunct barrier/nobarrier mount options, and the addition of a few more online repair functions. Summary: - Use extent maps to track pagecache page status instead of bufferhead state. - Refactor pagecache read and write paths to use the new iomap library functions, which enable us to drop the old bufferhead code for pagesize == blocksize filesystems. - Set up parallel per-block-per-page metadata to track subpage information that was tracked by buffer heads, which enables us to drop the old bufferhead code for pagesize > blocksize filesystems. - Tie a deferred ops control structure to a transaction so that we can take advantage of an upper-level dfops without having to plumb pointer passing through the code. - Refactor the deferred ops code to track deferred ops as part of the transaction structure (instead of as a separate data structure) so that we can simplify the scoping rules around defer_ops. - Refactor twisty delwri buffer submission code to avoid deadlocks. - Shorten and fix indenting problems in the scrub code. - Detect obviously bad summary counts at mount and fix them. - Directly associate deferred ops control structure with a transaction so that callers no longer have to manage it themselves. - Remove a couple of IRIX-era inode macros. - Remove the long-deprecated 'barrier' and 'nobarrier' mount options. - Clean up the inode fork structure a bit. - Check for bad fs summary counter values in the superblock. - Reduce COW fork lookups during writeback. - Refactor the deferred ops control structures into the transaction structure, thereby eliminating the need for transaction users to handle the deferred ops as a separate data structure. - Add the ability to repair AG headers online. - Fix a crash due to insufficient return value checking. - Various fixes and cleanups" * tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits) xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree xfs: remove b_last_holder & associated macros iomap: Switch to offset_in_page for clarity xfs: Close race between direct IO and xfs_break_layouts() xfs: repair the AGI xfs: repair the AGFL xfs: repair the AGF xfs: remove dead error handling code in xfs_dquot_disk_alloc() xfs: use WRITE_ONCE to update if_seq xfs: fix a comment in xfs_log_reserve xfs: only validate summary counts on primary superblock xfs: substitute spaces with tabs xfs: fold dfops into the transaction xfs: always defer agfl block frees xfs: pass transaction to xfs_defer_add() xfs: replace xfs_defer_ops ->dop_pending with on-stack list xfs: cancel dfops on xfs_defer_finish() error xfs: clean out superfluous dfops dop params/vars xfs: drop dop param from xfs_defer_op_type ->finish_item() callback xfs: automatic dfops inode relogging ... |
|
Darrick J. Wong | 7d5e049e72 |
iomap: fix WARN_ON_ONCE on uninitialized variable
In commit
|
|
Darrick J. Wong | 1fc25f51d7 |
xfs: sanity check ag header values in xrep_calc_ag_resblks
Check the values we read in from the AG headers when calculating the block reservations for a repair transaction. If they're obviously wrong, substitute worst case assumptions (rather than ENOSPC on a bogus reservation request). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> |
|
Linus Torvalds | 10f3e23f07 |
Convert content from the ext4 wiki to Documentation rst files so it is
more likely to be updated as we add new features to ext4. Add 64-bit timestamp support to ext4's superblock fields. And the usual bug fixes and cleanups, including a Spectre gadget fixup and some hardening against maliciously corrupted file systems. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAltx3CAACgkQ8vlZVpUN gaOfkAgApkNU46yPWWFqJpeTR76Go4zW00x9Piaita+KpQcTCUokK0BDXXeydOEo zhtNjCRl00vbhnYJ90T75Y3ce9oGQDkWpXHFZBdl+kmMBaIHQJYEECCLR2zrvxVk LhsixvLozQd/VFenwzmXLqYshiZjhjBZOKo7/xsoy4waer1vNidA9/SR3hGWTMOk 9IsEVqw1h6pZt7ZB/leZmZ348H9QZZbYOUWl2ArvdmNeg2t8pWH0MD0bos+mj5lW xihyJEIS/OjWZRJhM2WEPMO7U4jD/aX3zmDMGePp/vC5KSqA4FGwk671xIyD+LXR Hd8Rn8chRe4/iyhBqHpJNkqHQRUV6g== =SA3p -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Convert content from the ext4 wiki to Documentation rst files so it is more likely to be updated as we add new features to ext4. - Add 64-bit timestamp support to ext4's superblock fields. - ... and the usual bug fixes and cleanups, including a Spectre gadget fixup and some hardening against maliciously corrupted file systems. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (34 commits) ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa() ext4: improve code readability in ext4_iget() ext4: fix spectre gadget in ext4_mb_regular_allocator() ext4: check for NUL characters in extended attribute's name ext4: use ext4_warning() for sb_getblk failure ext4: fix race when setting the bitmap corrupted flag ext4: reset error code in ext4_find_entry in fallback ext4: handle layout changes to pinned DAX mappings dax: dax_layout_busy_page() warn on !exceptional docs: fix up the obviously obsolete bits in the new ext4 documentation docs: add new ext4 superblock time extension fields docs: create filesystem internal section ext4: use swap macro in mext_page_double_lock ext4: check allocation failure when duplicating "data" in ext4_remount() ext4: fix warning message in ext4_enable_quotas() ext4: super: extend timestamps to 40 bits jbd2: replace current_kernel_time64 with ktime equivalent ext4: use timespec64 for all inode times ext4: use ktime_get_real_seconds for i_dtime ext4: use 64-bit timestamps for mmp_time ... |
|
Linus Torvalds | 3bb37da509 |
smb3/cifs fixes (including 8 for stable). Improved tracing, stats, snapshots (previous version mounts work now), performance (compounding enabled for statfs)
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAltwwVkACgkQiiy9cAdy T1H06gv/d+b52w2X05YlLBB1GboJjv1dl3sVrFisJrMrCCr6Hotk+4+RtC8CJHh6 /9Joq6iY8B/XLl7HrWr61vJhn8vlWOGhOQaUmHCBGeiIMX93RbaP8KvkPfHAKQiJ T6+7v70z/WL7mUQA2TEQ3Iz6kpCuP2y/XT6eQglTFXasqsWHy08TbsbnmP9yX2i4 E1B6zMwpn6m4PFxEURg14eBJTrQomb/UBSLHNwxwwOQjnKzsmeH3pyiFCJSPJNFF Xv/lyBibgvQAsuWtLTi82fqei+c60as2LVrsFLdoWXeD6JMlx1O+SHSz0NNJL/Lk oxMNx9NI+LsYmekIBHgoX01PzVgCpcn6ThfoQv3JLHtqiZuk3Wlai7bPvJMIApoM 8fgHGR9xvhRCfHgYvxx4MmP3e7xewajSe7fh8/y8BqreO+fLeoyqho5G9zDpn6bU wGFPtVYW/fgTEM5E3NOjsuC1zE35bjwU7U40jqjGHP1VsTs3hkLFy9fZNNq4EP1M ytfJPlTG =gzZK -----END PGP SIGNATURE----- Merge tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "smb3/cifs fixes (including 8 for stable). Other improvements include: - improved tracing, improved stats - snapshots (previous version mounts work now over SMB3) - performance (compounding enabled for statfs, ~40% faster). - security (make it possible to build cifs.ko with insecure vers=1.0 disabled in Kconfig)" * tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6: (43 commits) smb3: create smb3 equivalent alias for cifs pseudo-xattrs smb3: allow previous versions to be mounted with snapshot= mount parm cifs: don't show domain= in mount output when domain is empty cifs: add missing support for ACLs in SMB 3.11 smb3: enumerating snapshots was leaving part of the data off end cifs: update smb2_queryfs() to use compounding cifs: update receive_encrypted_standard to handle compounded responses cifs: create SMB2_open_init()/SMB2_open_free() helpers. cifs: add SMB2_query_info_[init|free]() cifs: add SMB2_close_init()/SMB2_close_free() smb3: display stats counters for number of slow commands CIFS: fix uninitialized ptr deref in smb2 signing smb3: Do not send SMB3 SET_INFO if nothing changed smb3: fix minor debug output for CONFIG_CIFS_STATS smb3: add tracepoint for slow responses cifs: add compound_send_recv() cifs: make smb_send_rqst take an array of requests cifs: update init_sg, crypt_message to take an array of rqst smb3: update readme to correct information about /proc/fs/cifs/Stats smb3: fix reset of bytes read and written stats ... |
|
Linus Torvalds | 161fa27ff2 |
Merge branch 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull fs iomap refactoring from Darrick Wong: "This is the first part of the XFS changes for 4.19. Christoph and Andreas coordinated some refactoring work on the iomap code in preparation for removing buffer heads from XFS and porting gfs2 to iomap. I'm sending this small pull request ahead of the main XFS merge to avoid holding up gfs2 unnecessarily" * 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: add inline data support to iomap_readpage_actor iomap: support direct I/O to inline data iomap: refactor iomap_dio_actor iomap: add initial support for writes without buffer heads iomap: add an iomap-based readpage and readpages implementation iomap: add private pointer to struct iomap iomap: add a page_done callback iomap: generic inline data handling iomap: complete partial direct I/O writes synchronously iomap: mark newly allocated buffer heads as new fs: factor out a __generic_write_end helper |
|
Linus Torvalds | a1a4f841ec |
for-4.19-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltxe7QACgkQxWXV+ddt WDswMA//QlRO+Ln5CH+RlT4fyf1RQUQZblWss2zxrmlo1GRI3Ljf2DNsBE3rD7P4 NSiXfHmgkdjcQP6poPLJwHxwkNd4NFXglYg64wWO10RjHGhKglmH6ztU88wsPfr2 2RZv271/NvYIEkEi6kdyy8ilKeWMshOfyj3+PaeapQn67uJfimyiUvDgUgbvwH3c yj0nVRLP1C7snNj4Atti/rjXMhG+m1UWfjRkZsmqlBp52k2UAcrtiwQK+DS5b9mL aWLSaGmIcJtSMkNJPQBST9GTWbJfKTpceoCzkT0o3irvQpN2e2flAJ4ireL8q4mN MvqJ7giPBFHNDcHEzN6VERvsaA1Rx9Vq20ieQl8JAMd4p/bi5ehN3ww+9vau5zCw Pc8WeKEILKrLYEAgHOnUO1wxHw994Iv5CA26roTQ0HNXQJjyEZ4m40Ch6LzmfKPm WKcHX14Uw22GKaFEXHTOpRZ0U0d1cMTcn5zaAajGsB9LwcaiLM+OiFSPtDkwUOB9 QGJHklZVXAD1IH9HFPuq85uUtXTLXbxsw1g8phEJGbmaVxxCOAUAXwEk3qxuZNbz CHL3G5+l3JEXxfoJSbDW60kr8xic7teqQDszqqP2qlqtP15ty2xc9d5Q8MZajSTZ H1z9+0gfjYYHrGuAp69MtCbdQhhDSqLyivjJJm0HBaKfVNGW2Xg= =jBaz -----END PGP SIGNATURE----- Merge tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Mostly fixes and cleanups, nothing big, though the notable thing is the inserted/deleted lines delta -1124. User visible changes: - allow defrag on opened read-only files that have rw permissions; similar to what dedupe will allow on such files Core changes: - tree checker improvements, reported by fuzzing: * more checks for: block group items, essential trees * chunk type validation * mount time cross-checks that physical and logical chunks match * switch more error codes to EUCLEAN aka EFSCORRUPTED Fixes: - fsync corner case fixes - fix send failure when root has deleted files still open - send, fix incorrect file layout after hole punching beyond eof - fix races between mount and deice scan ioctl, found by fuzzing - fix deadlock when delayed iput is called from writeback on the same inode; rare but has been observed in practice, also removes code - fix pinned byte accounting, using the right percpu helpers; this should avoid some write IO inefficiency during low space conditions - don't remove block group that still has pinned bytes - reset on-disk device stats value after replace, otherwise this would report stale values for the new device Cleanups: - time64_t/timespec64 cleanups - remove remaining dead code in scrub handling NOCOW extents after disabling it in previous cycle - simplify fsync regarding ordered extents logic and remove all the related code - remove redundant arguments in order to reduce stack space consumption - remove support for V0 type of extents, not in use since 2.6.30 - remove several unused structure members - fewer indirect function calls by inlining some callbacks - qgroup rescan timing fixes - vfs: iget cleanups" * tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (182 commits) btrfs: revert fs_devices state on error of btrfs_init_new_device btrfs: Exit gracefully when chunk map cannot be inserted to the tree btrfs: Introduce mount time chunk <-> dev extent mapping check btrfs: Verify that every chunk has corresponding block group at mount time btrfs: Check that each block group has corresponding chunk at mount time Btrfs: send, fix incorrect file layout after hole punching beyond eof btrfs: Use wrapper macro for rcu string to remove duplicate code btrfs: simplify btrfs_iget btrfs: lift make_bad_inode into btrfs_iget btrfs: simplify IS_ERR/PTR_ERR checks btrfs: btrfs_iget never returns an is_bad_inode inode btrfs: replace: Reset on-disk dev stats value after replace btrfs: extent-tree: Remove unused __btrfs_free_block_rsv btrfs: backref: Use ERR_CAST to return error code btrfs: Remove redundant btrfs_release_path from btrfs_unlink_subvol btrfs: Remove root parameter from btrfs_unlink_subvol btrfs: Remove fs_info from btrfs_add_root_ref btrfs: Remove fs_info from btrfs_del_root_ref btrfs: Remove fs_info from btrfs_del_root btrfs: Remove fs_info from btrfs_delete_delayed_dir_index ... |
|
Linus Torvalds | 575b94386b |
File locking fixes and enhancements for v4.19
-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbcDltAAoJEAAOaEEZVoIV9YUP/ioYw3+kIXwRa05ec8Twbn6m O+1HOX9UAuDWX+97P2kCJSqi9a/TkpfLQxoGxZowDv97ZkVeHdzVaJVN/0sbH1uP Oj86b2Y5zGyrLk3D4c5VcyHkvogr16DUvhYeRtcjPyufSaKqefftCeetTntRvr4Z rZ8HLKYxa4AZlP/FkgVZ/S6PmvG7tH8rVLMFCfY+rASbSU4gBxmrS0nB2UgU4ycb b2SVZo4Zxycw0KiP14T5AgN4puid7VEofFezBpMKjNPjCUk1wJ3mtUyLeS8jRUEx M6qnl7DU4BWPRtGxiXNebN85M8iaJnlaeQglowklrubrFtqnVeDQM0rz42NiY2/S 2jk6q8b9XKvoDEev2VeKfmFztu4gGXypNR6xAolvgBnUZziKKFUJutZwJe95pZV1 kO6CvdchuCQdLMLdHPji2kk2AA8Go5webONfioFhoWYfeSvgjRELmOFTDjP22eNv s1/vfCsjuU6vdyaXXXZpN/7VMenoNegSQUXFoxhmHO7BtNuTqk+ZVFNgaWsOrXcx I0ZIjCAmOsU+DKLn+DTT8kDtU3ihZz/OCXXZJi9JPeLFZ6gSghDRnFThwYqcZvA+ syIn1XGGtn15Q7A5FZNvwGqYir1GtyOslrxxIKxGlhleovhT94HuKXkL1T+k74zl FXItKgRs0TPQfTsI8q+C =IMcz -----END PGP SIGNATURE----- Merge tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Just a couple of patches from Konstantin to fix /proc/locks when the process that set the lock has exited, and a new tracepoint for the flock() codepath. Also threw in mailmap entries for my addresses and a comment cleanup" * tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: remove misleading obsolete comment mailmap: remap some of my email addresses to kernel.org address locks: add tracepoint in flock codepath fs/lock: show locks taken by processes from another pidns fs/lock: skip lock owner pid translation in case we are in init_pid_ns |
|
Linus Torvalds | 4591343e35 |
Merge branches 'work.misc' and 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro: "Misc cleanups from various folks all over the place I expected more fs/dcache.c cleanups this cycle, so that went into a separate branch. Said cleanups have missed the window, so in the hindsight it could've gone into work.misc instead. Decided not to cherry-pick, thus the 'work.dcache' branch" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: dcache: Use true and false for boolean values fold generic_readlink() into its only caller fs: shave 8 bytes off of struct inode fs: Add more kernel-doc to the produced documentation fs: Fix attr.c kernel-doc removed extra extern file_fdatawait_range * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill dentry_update_name_case() |
|
Linus Torvalds | f2be269897 |
Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs aio updates from Al Viro: "Christoph's aio poll, saner this time around. This time it's pretty much local to fs/aio.c. Hopefully race-free..." * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: allow direct aio poll comletions for keyed wakeups aio: implement IOCB_CMD_POLL aio: add a iocb refcount timerfd: add support for keyed wakeups |
|
Linus Torvalds | 4d2a073cde |
Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lookup() updates from Al Viro: "More conversions of ->lookup() to d_splice_alias(). Should be reasonably complete now - the only leftovers are in ceph" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT) afs_lookup(): switch to d_splice_alias() afs: switch dynroot lookups to d_splice_alias() hpfs: fix an inode leak in lookup, switch to d_splice_alias() hostfs_lookup: switch to d_splice_alias() |
|
Linus Torvalds | 0ea97a2d61 |
Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs icache updates from Al Viro: - NFS mkdir/open_by_handle race fix - analogous solution for FUSE, replacing the one currently in mainline - new primitive to be used when discarding halfway set up inodes on failed object creation; gives sane warranties re icache lookups not returning such doomed by still not freed inodes. A bunch of filesystems switched to that animal. - Miklos' fix for last cycle regression in iget5_locked(); -stable will need a slightly different variant, unfortunately. - misc bits and pieces around things icache-related (in adfs and jfs). * 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: jfs: don't bother with make_bad_inode() in ialloc() adfs: don't put inodes into icache new helper: inode_fake_hash() vfs: don't evict uninitialized inode jfs: switch to discard_new_inode() ext2: make sure that partially set up inodes won't be returned by ext2_iget() udf: switch to discard_new_inode() ufs: switch to discard_new_inode() btrfs: switch to discard_new_inode() new primitive: discard_new_inode() kill d_instantiate_no_diralias() nfs_instantiate(): prevent multiple aliases for directory inode |
|
Linus Torvalds | a66b4cd1e7 |
Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs open-related updates from Al Viro: - "do we need fput() or put_filp()" rules are gone - it's always fput() now. We keep track of that state where it belongs - in ->f_mode. - int *opened mess killed - in finish_open(), in ->atomic_open() instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open(). - alloc_file() wrappers with saner calling conventions are introduced (alloc_file_clone() and alloc_file_pseudo()); callers converted, with much simplification. - while we are at it, saner calling conventions for path_init() and link_path_walk(), simplifying things inside fs/namei.c (both on open-related paths and elsewhere). * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) few more cleanups of link_path_walk() callers allow link_path_walk() to take ERR_PTR() make path_init() unconditionally paired with terminate_walk() document alloc_file() changes make alloc_file() static do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone() new helper: alloc_file_clone() create_pipe_files(): switch the first allocation to alloc_file_pseudo() anon_inode_getfile(): switch to alloc_file_pseudo() hugetlb_file_setup(): switch to alloc_file_pseudo() ocxlflash_getfile(): switch to alloc_file_pseudo() cxl_getfile(): switch to alloc_file_pseudo() ... and switch shmem_file_setup() to alloc_file_pseudo() __shmem_file_setup(): reorder allocations new wrapper: alloc_file_pseudo() kill FILE_{CREATED,OPENED} switch atomic_open() and lookup_open() to returning 0 in all success cases document ->atomic_open() changes ->atomic_open(): return 0 in all success cases get rid of 'opened' in path_openat() and the helpers downstream ... |
|
Linus Torvalds | e5a32b5b21 |
Here are the main MIPS changes for 4.19.
An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code. -----BEGIN PGP SIGNATURE----- iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0 b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt /2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL =cUiF -----END PGP SIGNATURE----- Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Paul Burton: "Here are the main MIPS changes for 4.19. An overview of the general architecture changes: - Massive DMA ops refactoring from Christoph Hellwig (huzzah for deleting crufty code!). - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding regsets to expose DSP ASE & floating point mode state respectively, both for live debugging & core dumps. - We better optimize our code by hard-coding cpu_has_* macros at compile time where their values are known due to the ISA revision that the kernel build is targeting. - The EJTAG exception handler now better handles SMP systems, where it was previously possible for CPUs to clobber a register value saved by another CPU. - Our implementation of memset() gained a couple of fixes for MIPSr6 systems to return correct values in some cases where stores fault. - We now implement ioremap_wc() using the uncached-accelerated cache coherency attribute where supported, which is detected during boot, and fall back to plain uncached access where necessary. The MIPS-specific (and unused in tree) ioremap_uncached_accelerated() & ioremap_cacheable_cow() are removed. - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP systems by reworking the way we ensure remote CPUs that may be running threads within the affected process switch mode. - Systems using the MIPS Coherence Manager will now set the MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache maintenance overhead when flushing the icache. - A few fixes were made for building with clang/LLVM, which now sucessfully builds kernels for many of our platforms. - Miscellaneous cleanups all over. And some platform-specific changes: - ar7 gained stubs for a few clock API functions to fix build failures for some drivers. - ath79 gained support for a few new SoCs, a few fixes & better gpio-keys support. - Ci20 now exposes its SPI bus using the spi-gpio driver. - The generic platform can now auto-detect a suitable value for PHYS_OFFSET based upon the memory map described by the device tree, allowing us to avoid wasting memory on page book-keeping for systems where RAM starts at a non-zero physical address. - Ingenic systems using the jz4740 platform code now link their vmlinuz higher to allow for kernels of a realistic size. - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2 to avoid CPU errata. - Loongson64 gains a couple of fixes, a workaround for a write buffering issue & support for the Loongson 3A R3.1 CPU. - Malta now uses the piix4-poweroff driver to handle powering down. - Microsemi Ocelot gained support for its SPI bus & NOR flash, its second MDIO bus and can now be supported by a FIT/.itb image. - Octeon saw a bunch of header cleanups which remove a lot of duplicate or unused code" * tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits) MIPS: Remove remnants of UASM_ISA MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send() MIPS: VDSO: Force link endianness MIPS: Always specify -EB or -EL when using clang MIPS: Use dins to simplify __write_64bit_c0_split() MIPS: Use read-write output operand in __write_64bit_c0_split() MIPS: Avoid using array as parameter to write_c0_kpgd() MIPS: vdso: Allow clang's --target flag in VDSO cflags MIPS: genvdso: Remove GOT checks MIPS: Remove obsolete MIPS checks for DST node "chosen@0" MIPS: generic: Remove input symbols from defconfig MIPS: Delete unused code in linux32.c MIPS: Remove unused sys_32_mmap2 MIPS: Remove nabi_no_regargs mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123 mips: dts: mscc: Add spi on Ocelot MIPS: Loongson: Merge load addresses MIPS: Loongson: Set Loongson32 to MIPS32R1 MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller MIPS: generic: Select MIPS_AUTO_PFN_OFFSET ... |
|
Trond Myklebust | 62421cd943 |
NFSv4: Fix a typo in nfs4_init_channel_attrs()
The back channel size is allowed to be 1 or greater. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> |
|
Trond Myklebust | 8aafd2fde3 |
NFSv4: Don't busy wait if NFSv4 session draining is interrupted
Catch the ERESTARTSYS error so that it can be processed by the callers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> |
|
Olga Kornievskaia | e4648aa4f9 |
NFS recover from destination server reboot for copies
Mark the destination state to indicate a server-side copy is happening. On detecting a reboot and recovering open state check if any state is engaged in a server-side copy, if so, find the copy and mark it and then signal the waiting thread. Upon wakeup, if copy was marked then propage EAGAIN to the nfsd_copy_file_range and restart the copy from scratch. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> |
|
Linus Torvalds | 1e45e9a95e |
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner: "The timers departement more or less proudly presents: - More Y2038 timekeeping work mostly in the core code. The work is slowly, but steadily targeting the actuall syscalls. - Enhanced timekeeping suspend/resume support by utilizing clocksources which do not stop during suspend, but are otherwise not the main timekeeping clocksources. - Make NTP adjustmets more accurate and immediate when the frequency is set directly and not incrementally. - Sanitize the overrung handing of posix timers - A new timer driver for Mediatek SoCs - The usual pile of fixes and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) clockevents: Warn if cpu_all_mask is used as cpumask tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag timers: Clear timer_base::must_forward_clk with timer_base::lock held clocksource/drivers/sprd: Register one always-on timer to compensate suspend time clocksource/drivers/timer-mediatek: Add support for system timer clocksource/drivers/timer-mediatek: Convert the driver to timer-of clocksource/drivers/timer-mediatek: Use specific prefix for GPT clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek clocksource/drivers/timer-mediatek: Add system timer bindings clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask time: Introduce one suspend clocksource to compensate the suspend time time: Fix extra sleeptime injection when suspend fails timekeeping/ntp: Constify some function arguments ntp: Use kstrtos64 for s64 variable ntp: Remove redundant arguments timer: Fix coding style ktime: Provide typesafe ktime_to_ns() hrtimer: Improve kernel message printing ... |
|
Linus Torvalds | de5d1b39ea |
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double*() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ... |
|
Chao Yu | d494500a70 |
f2fs: support fault_type mount option
Previously, once fault injection is on, by default, all kind of faults will be injected to f2fs, if we want to trigger single or specified combined type during the test, we need to configure sysfs entry, it will be a little inconvenient to integrate sysfs configuring into testsuit, such as xfstest. So this patch introduces a new mount option 'fault_type' to assist old option 'fault_injection', with these two mount options, we can specify any fault rate/type at mount-time. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 3f16ecd950 |
f2fs: fix to return success when trimming meta area
generic/251 --- tests/generic/251.out 2016-05-03 20:20:11.381899000 +0800 QA output created by 251 Running the test: done. +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument +fstrim: /mnt/scratch_f2fs: FITRIM ioctl failed: Invalid argument ... Ran: generic/251 Failures: generic/251 The reason is coverage of fstrim locates in meta area, previously we just return -EINVAL for such case, making generic/251 failed, to fix this problem, let's relieve restriction to return success with no block discarded. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 6b9cb1242c |
f2fs: fix use-after-free of dicard command entry
As Dan Carpenter reported: The patch 20ee4382322c: "f2fs: issue small discard by LBA order" from Jul 8, 2018, leads to the following Smatch warning: fs/f2fs/segment.c:1277 __issue_discard_cmd_orderly() warn: 'dc' was already freed. See also: fs/f2fs/segment.c:2550 __issue_discard_cmd_range() warn: 'dc' was already freed. In order to fix this issue, let's get error from __submit_discard_cmd(), and release current discard command after we referenced next one. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | b83dcfe671 |
f2fs: support discard submission error injection
This patch adds to support discard submission error injection for testing error handling of __submit_discard_cmd(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 35ec7d5748 |
f2fs: split discard command in prior to block layer
Some devices has small max_{hw,}discard_sectors, so that in __blkdev_issue_discard(), one big size discard bio can be split into multiple small size discard bios, result in heavy load in IO scheduler and device, which can hang other sync IO for long time. Now, f2fs is trying to control discard commands more elaboratively, in order to make less conflict in between discard IO and user IO to enhance application's performance, so in this patch, we will split discard bio in f2fs in prior to in block layer to reduce issuing multiple discard bios in a short time. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Sheng Yong | a690efffd1 |
f2fs: wake up gc thread immediately when gc_urgent is set
Fixes:
|
|
Chao Yu | 6eae269461 |
f2fs: fix incorrect range->len in f2fs_trim_fs()
generic/260 reported below error: [+] Default length with start set (should succeed) [+] Length beyond the end of fs (should succeed) [+] Length beyond the end of fs with start set (should succeed) +./tests/generic/260: line 94: [: 18446744073709551615: integer expression expected +./tests/generic/260: line 104: [: 18446744073709551615: integer expression expected Test done ... In f2fs_trim_fs(), if there is no discard being trimmed, we need to correct range->len before return. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 2296915808 |
f2fs: refresh recent accessed nat entry in lru list
Introduce nat_list_lock to protect nm_i->nat_entries list, and manage it as a LRU list, refresh location for therein recent accessed entries in the list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | a33c150237 |
f2fs: fix avoid race between truncate and background GC
Thread A Background GC - f2fs_setattr isize to 0 - truncate_setsize - gc_data_segment - f2fs_get_read_data_page page #0 - set_page_dirty - set_cold_data - f2fs_truncate - f2fs_setattr isize to 4k - read 4k <--- hit data in cached page #0 Above race condition can cause read out invalid data in a truncated page, fix it by i_gc_rwsem[WRITE] lock. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | c7079853c8 |
f2fs: avoid race between zero_range and background GC
Thread A Background GC - f2fs_zero_range - truncate_pagecache_range - gc_data_segment - get_read_data_page - move_data_page - set_page_dirty - set_cold_data - f2fs_do_zero_range - dn->data_blkaddr = NEW_ADDR; - f2fs_set_data_blkaddr Actually, we don't need to set dirty & checked flag on the page, since all valid data in the page should be zeroed by zero_range(). Use i_gc_rwsem[WRITE] to avoid such race condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 91291e9998 |
f2fs: fix to do sanity check with block address in main area v2
This patch adds f2fs_is_valid_blkaddr() in below functions to do sanity check with block address to avoid pentential panic: - f2fs_grab_read_bio() - __written_first_block() https://bugzilla.kernel.org/show_bug.cgi?id=200465 - Reproduce - POC (poc.c) #define _GNU_SOURCE #include <sys/types.h> #include <sys/mount.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/xattr.h> #include <dirent.h> #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <linux/falloc.h> #include <linux/loop.h> static void activity(char *mpoint) { char *xattr; int err; err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint); char buf2[113]; memset(buf2, 0, sizeof(buf2)); listxattr(xattr, buf2, sizeof(buf2)); } int main(int argc, char *argv[]) { activity(argv[1]); return 0; } - kernel message [ 844.718738] F2FS-fs (loop0): Mounted with checkpoint version = 2 [ 846.430929] F2FS-fs (loop0): access invalid blkaddr:1024 [ 846.431058] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.431059] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.431310] CPU: 1 PID: 1249 Comm: a.out Not tainted 4.18.0-rc3+ #1 [ 846.431312] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.431315] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.431316] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00 [ 846.431347] RSP: 0018:ffff961c414a7bc0 EFLAGS: 00010282 [ 846.431349] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000000 [ 846.431350] RDX: 0000000000000000 RSI: ffff89dfffd165d8 RDI: ffff89dfffd165d8 [ 846.431351] RBP: ffff961c414a7c20 R08: 0000000000000001 R09: 0000000000000248 [ 846.431353] R10: 0000000000000000 R11: 0000000000000248 R12: 0000000000000007 [ 846.431369] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0 [ 846.431372] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.431373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.431374] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.431384] Call Trace: [ 846.431426] f2fs_iget+0x6f4/0xe70 [ 846.431430] ? f2fs_find_entry+0x71/0x90 [ 846.431432] f2fs_lookup+0x1aa/0x390 [ 846.431452] __lookup_slow+0x97/0x150 [ 846.431459] lookup_slow+0x35/0x50 [ 846.431462] walk_component+0x1c6/0x470 [ 846.431479] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.431488] ? page_add_file_rmap+0x13/0x200 [ 846.431491] path_lookupat+0x76/0x230 [ 846.431501] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.431504] filename_lookup+0xb8/0x1a0 [ 846.431534] ? _cond_resched+0x16/0x40 [ 846.431541] ? kmem_cache_alloc+0x160/0x1d0 [ 846.431549] ? path_listxattr+0x41/0xa0 [ 846.431551] path_listxattr+0x41/0xa0 [ 846.431570] do_syscall_64+0x55/0x100 [ 846.431583] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.431607] RIP: 0033:0x7f882de1c0d7 [ 846.431607] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.431639] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.431641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.431642] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.431643] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.431645] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.431646] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.431648] ---[ end trace abca54df39d14f5c ]--- [ 846.431651] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix. [ 846.431762] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_iget+0xd17/0xe70 [ 846.431763] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.431797] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.431798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.431800] RIP: 0010:f2fs_iget+0xd17/0xe70 [ 846.431801] Code: ff ff 48 63 d8 e9 e1 f6 ff ff 48 8b 45 c8 41 b8 05 00 00 00 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b 48 8b 38 e8 f9 b4 00 00 <0f> 0b 48 8b 45 c8 f0 80 48 48 04 e9 d8 f9 ff ff 0f 0b 48 8b 43 18 [ 846.431832] RSP: 0018:ffff961c414a7bd0 EFLAGS: 00010282 [ 846.431834] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000006 [ 846.431835] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0 [ 846.431836] RBP: ffff961c414a7c20 R08: 0000000000000000 R09: 0000000000000273 [ 846.431837] R10: 0000000000000000 R11: ffff89dfad50ca60 R12: 0000000000000007 [ 846.431838] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0 [ 846.431840] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.431841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.431842] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.431846] Call Trace: [ 846.431850] ? f2fs_find_entry+0x71/0x90 [ 846.431853] f2fs_lookup+0x1aa/0x390 [ 846.431856] __lookup_slow+0x97/0x150 [ 846.431858] lookup_slow+0x35/0x50 [ 846.431874] walk_component+0x1c6/0x470 [ 846.431878] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.431880] ? page_add_file_rmap+0x13/0x200 [ 846.431882] path_lookupat+0x76/0x230 [ 846.431884] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.431886] filename_lookup+0xb8/0x1a0 [ 846.431890] ? _cond_resched+0x16/0x40 [ 846.431891] ? kmem_cache_alloc+0x160/0x1d0 [ 846.431894] ? path_listxattr+0x41/0xa0 [ 846.431896] path_listxattr+0x41/0xa0 [ 846.431898] do_syscall_64+0x55/0x100 [ 846.431901] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.431902] RIP: 0033:0x7f882de1c0d7 [ 846.431903] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.431934] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.431936] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.431937] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.431939] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.431940] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.431941] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.431943] ---[ end trace abca54df39d14f5d ]--- [ 846.432033] F2FS-fs (loop0): access invalid blkaddr:1024 [ 846.432051] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.432051] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.432085] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.432086] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.432089] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160 [ 846.432089] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00 [ 846.432120] RSP: 0018:ffff961c414a7900 EFLAGS: 00010286 [ 846.432122] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006 [ 846.432123] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0 [ 846.432124] RBP: ffff89dff5492800 R08: 0000000000000001 R09: 000000000000029d [ 846.432125] R10: ffff961c414a7820 R11: 000000000000029d R12: 0000000000000400 [ 846.432126] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000 [ 846.432128] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.432130] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.432131] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.432135] Call Trace: [ 846.432151] f2fs_wait_on_block_writeback+0x20/0x110 [ 846.432158] f2fs_grab_read_bio+0xbc/0xe0 [ 846.432161] f2fs_submit_page_read+0x21/0x280 [ 846.432163] f2fs_get_read_data_page+0xb7/0x3c0 [ 846.432165] f2fs_get_lock_data_page+0x29/0x1e0 [ 846.432167] f2fs_get_new_data_page+0x148/0x550 [ 846.432170] f2fs_add_regular_entry+0x1d2/0x550 [ 846.432178] ? __switch_to+0x12f/0x460 [ 846.432181] f2fs_add_dentry+0x6a/0xd0 [ 846.432184] f2fs_do_add_link+0xe9/0x140 [ 846.432186] __recover_dot_dentries+0x260/0x280 [ 846.432189] f2fs_lookup+0x343/0x390 [ 846.432193] __lookup_slow+0x97/0x150 [ 846.432195] lookup_slow+0x35/0x50 [ 846.432208] walk_component+0x1c6/0x470 [ 846.432212] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.432215] ? page_add_file_rmap+0x13/0x200 [ 846.432217] path_lookupat+0x76/0x230 [ 846.432219] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.432221] filename_lookup+0xb8/0x1a0 [ 846.432224] ? _cond_resched+0x16/0x40 [ 846.432226] ? kmem_cache_alloc+0x160/0x1d0 [ 846.432228] ? path_listxattr+0x41/0xa0 [ 846.432230] path_listxattr+0x41/0xa0 [ 846.432233] do_syscall_64+0x55/0x100 [ 846.432235] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.432237] RIP: 0033:0x7f882de1c0d7 [ 846.432237] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.432269] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.432271] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.432272] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.432273] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.432274] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.432275] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.432277] ---[ end trace abca54df39d14f5e ]--- [ 846.432279] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix. [ 846.432376] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_wait_on_block_writeback+0xb1/0x110 [ 846.432376] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.432410] CPU: 1 PID: 1249 Comm: a.out Tainted: G W 4.18.0-rc3+ #1 [ 846.432411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.432413] RIP: 0010:f2fs_wait_on_block_writeback+0xb1/0x110 [ 846.432414] Code: 66 90 f0 ff 4b 34 74 59 5b 5d c3 48 8b 7d 00 41 b8 05 00 00 00 89 d9 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b e8 df bc fd ff <0f> 0b f0 80 4d 48 04 e9 67 ff ff ff 48 8b 03 48 c1 e8 37 83 e0 07 [ 846.432445] RSP: 0018:ffff961c414a7910 EFLAGS: 00010286 [ 846.432447] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006 [ 846.432448] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff89dfffd165d0 [ 846.432449] RBP: ffff89dff5492800 R08: 0000000000000000 R09: 00000000000002d1 [ 846.432450] R10: ffff961c414a7820 R11: ffff89dfad50cf80 R12: 0000000000000400 [ 846.432451] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000 [ 846.432453] FS: 00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000 [ 846.432454] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.432455] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0 [ 846.432459] Call Trace: [ 846.432463] f2fs_grab_read_bio+0xbc/0xe0 [ 846.432464] f2fs_submit_page_read+0x21/0x280 [ 846.432466] f2fs_get_read_data_page+0xb7/0x3c0 [ 846.432468] f2fs_get_lock_data_page+0x29/0x1e0 [ 846.432470] f2fs_get_new_data_page+0x148/0x550 [ 846.432473] f2fs_add_regular_entry+0x1d2/0x550 [ 846.432475] ? __switch_to+0x12f/0x460 [ 846.432477] f2fs_add_dentry+0x6a/0xd0 [ 846.432480] f2fs_do_add_link+0xe9/0x140 [ 846.432483] __recover_dot_dentries+0x260/0x280 [ 846.432485] f2fs_lookup+0x343/0x390 [ 846.432488] __lookup_slow+0x97/0x150 [ 846.432490] lookup_slow+0x35/0x50 [ 846.432505] walk_component+0x1c6/0x470 [ 846.432509] ? memcg_kmem_charge_memcg+0x70/0x90 [ 846.432511] ? page_add_file_rmap+0x13/0x200 [ 846.432513] path_lookupat+0x76/0x230 [ 846.432515] ? __alloc_pages_nodemask+0xfc/0x280 [ 846.432517] filename_lookup+0xb8/0x1a0 [ 846.432520] ? _cond_resched+0x16/0x40 [ 846.432522] ? kmem_cache_alloc+0x160/0x1d0 [ 846.432525] ? path_listxattr+0x41/0xa0 [ 846.432526] path_listxattr+0x41/0xa0 [ 846.432529] do_syscall_64+0x55/0x100 [ 846.432531] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 846.432533] RIP: 0033:0x7f882de1c0d7 [ 846.432533] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 846.432565] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2 [ 846.432567] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7 [ 846.432568] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0 [ 846.432569] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000 [ 846.432570] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550 [ 846.432571] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000 [ 846.432573] ---[ end trace abca54df39d14f5f ]--- [ 846.434280] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 846.434424] PGD 80000001ebd3a067 P4D 80000001ebd3a067 PUD 1eb1ae067 PMD 0 [ 846.434551] Oops: 0000 [#1] SMP PTI [ 846.434697] CPU: 0 PID: 44 Comm: kworker/u5:0 Tainted: G W 4.18.0-rc3+ #1 [ 846.434805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 846.435000] Workqueue: fscrypt_read_queue decrypt_work [ 846.435174] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0 [ 846.435351] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24 [ 846.435696] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206 [ 846.435870] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80 [ 846.436051] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8 [ 846.436261] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000 [ 846.436433] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80 [ 846.436562] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60 [ 846.436658] FS: 0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000 [ 846.436758] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.436898] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0 [ 846.437001] Call Trace: [ 846.437181] ? check_preempt_wakeup+0xf2/0x230 [ 846.437276] ? check_preempt_curr+0x7c/0x90 [ 846.437370] fscrypt_decrypt_page+0x48/0x4d [ 846.437466] __fscrypt_decrypt_bio+0x5b/0x90 [ 846.437542] decrypt_work+0x12/0x20 [ 846.437651] process_one_work+0x15e/0x3d0 [ 846.437740] worker_thread+0x4c/0x440 [ 846.437848] kthread+0xf8/0x130 [ 846.437938] ? rescuer_thread+0x350/0x350 [ 846.438022] ? kthread_associate_blkcg+0x90/0x90 [ 846.438117] ret_from_fork+0x35/0x40 [ 846.438201] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper [ 846.438653] CR2: 0000000000000008 [ 846.438713] ---[ end trace abca54df39d14f60 ]--- [ 846.438796] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0 [ 846.438844] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24 [ 846.439084] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206 [ 846.439176] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80 [ 846.440927] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8 [ 846.442083] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000 [ 846.443284] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80 [ 846.444448] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60 [ 846.445558] FS: 0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000 [ 846.446687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 846.447796] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0 - Location https://elixir.bootlin.com/linux/v4.18-rc4/source/fs/crypto/crypto.c#L149 struct crypto_skcipher *tfm = ci->ci_ctfm; Here ci can be NULL Note that this issue maybe require CONFIG_F2FS_FS_ENCRYPTION=y to reproduce. Reported-by Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | bcbfbd604d |
f2fs: fix to do sanity check with inline flags
https://bugzilla.kernel.org/show_bug.cgi?id=200221 - Overview BUG() in clear_inode() when mounting and un-mounting a corrupted f2fs image - Reproduce - Kernel message [ 538.601448] F2FS-fs (loop0): Invalid segment/section count (31, 24 x 1376257) [ 538.601458] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock [ 538.724091] F2FS-fs (loop0): Try to recover 2th superblock, ret: 0 [ 538.724102] F2FS-fs (loop0): Mounted with checkpoint version = 2 [ 540.970834] ------------[ cut here ]------------ [ 540.970838] kernel BUG at fs/inode.c:512! [ 540.971750] invalid opcode: 0000 [#1] SMP KASAN PTI [ 540.972755] CPU: 1 PID: 1305 Comm: umount Not tainted 4.18.0-rc1+ #4 [ 540.974034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 540.982913] RIP: 0010:clear_inode+0xc0/0xd0 [ 540.983774] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55 [ 540.987570] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002 [ 540.988636] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a [ 540.990063] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8 [ 540.991499] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce [ 540.992923] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668 [ 540.994360] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530 [ 540.995786] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 540.997403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 540.998571] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0 [ 541.000015] Call Trace: [ 541.000554] f2fs_evict_inode+0x253/0x630 [ 541.001381] evict+0x16f/0x290 [ 541.002015] iput+0x280/0x300 [ 541.002654] dentry_unlink_inode+0x165/0x1e0 [ 541.003528] __dentry_kill+0x16a/0x260 [ 541.004300] dentry_kill+0x70/0x250 [ 541.005018] dput+0x154/0x1d0 [ 541.005635] do_one_tree+0x34/0x40 [ 541.006354] shrink_dcache_for_umount+0x3f/0xa0 [ 541.007285] generic_shutdown_super+0x43/0x1c0 [ 541.008192] kill_block_super+0x52/0x80 [ 541.008978] kill_f2fs_super+0x62/0x70 [ 541.009750] deactivate_locked_super+0x6f/0xa0 [ 541.010664] deactivate_super+0x5e/0x80 [ 541.011450] cleanup_mnt+0x61/0xa0 [ 541.012151] __cleanup_mnt+0x12/0x20 [ 541.012893] task_work_run+0xc8/0xf0 [ 541.013635] exit_to_usermode_loop+0x125/0x130 [ 541.014555] do_syscall_64+0x138/0x170 [ 541.015340] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 541.016375] RIP: 0033:0x7f46624bf487 [ 541.017104] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48 [ 541.020923] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 541.022452] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487 [ 541.023885] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0 [ 541.025318] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014 [ 541.026755] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c [ 541.028186] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30 [ 541.029626] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 541.039445] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 541.040392] RIP: 0010:clear_inode+0xc0/0xd0 [ 541.041240] Code: 8d a3 30 01 00 00 4c 89 e7 e8 1c ec f8 ff 48 8b 83 30 01 00 00 49 39 c4 75 1a 48 c7 83 a0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 66 66 66 66 90 55 [ 541.045042] RSP: 0018:ffff8801e34a7b70 EFLAGS: 00010002 [ 541.046099] RAX: 0000000000000000 RBX: ffff8801e9b744e8 RCX: ffffffffb840eb3a [ 541.047537] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8801e9b746b8 [ 541.048965] RBP: ffff8801e34a7b80 R08: ffffed003d36e8ce R09: ffffed003d36e8ce [ 541.050402] R10: 0000000000000001 R11: ffffed003d36e8cd R12: ffff8801e9b74668 [ 541.051832] R13: ffff8801e9b74760 R14: ffff8801e9b74528 R15: ffff8801e9b74530 [ 541.053263] FS: 00007f4662bdf840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 541.054891] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 541.056039] CR2: 000000000175c568 CR3: 00000001dcfe6000 CR4: 00000000000006e0 [ 541.058506] ================================================================== [ 541.059991] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0 [ 541.061513] Read of size 8 at addr ffff8801e34a7970 by task umount/1305 [ 541.063302] CPU: 1 PID: 1305 Comm: umount Tainted: G D 4.18.0-rc1+ #4 [ 541.064838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 541.066778] Call Trace: [ 541.067294] dump_stack+0x7b/0xb5 [ 541.067986] print_address_description+0x70/0x290 [ 541.068941] kasan_report+0x291/0x390 [ 541.069692] ? update_stack_state+0x38c/0x3e0 [ 541.070598] __asan_load8+0x54/0x90 [ 541.071315] update_stack_state+0x38c/0x3e0 [ 541.072172] ? __read_once_size_nocheck.constprop.7+0x20/0x20 [ 541.073340] ? vprintk_func+0x27/0x60 [ 541.074096] ? printk+0xa3/0xd3 [ 541.074762] ? __save_stack_trace+0x5e/0x100 [ 541.075634] unwind_next_frame.part.5+0x18e/0x490 [ 541.076594] ? unwind_dump+0x290/0x290 [ 541.077368] ? __show_regs+0x2c4/0x330 [ 541.078142] __unwind_start+0x106/0x190 [ 541.085422] __save_stack_trace+0x5e/0x100 [ 541.086268] ? __save_stack_trace+0x5e/0x100 [ 541.087161] ? unlink_anon_vmas+0xba/0x2c0 [ 541.087997] save_stack_trace+0x1f/0x30 [ 541.088782] save_stack+0x46/0xd0 [ 541.089475] ? __alloc_pages_slowpath+0x1420/0x1420 [ 541.090477] ? flush_tlb_mm_range+0x15e/0x220 [ 541.091364] ? __dec_node_state+0x24/0xb0 [ 541.092180] ? lock_page_memcg+0x85/0xf0 [ 541.092979] ? unlock_page_memcg+0x16/0x80 [ 541.093812] ? page_remove_rmap+0x198/0x520 [ 541.094674] ? mark_page_accessed+0x133/0x200 [ 541.095559] ? _cond_resched+0x1a/0x50 [ 541.096326] ? unmap_page_range+0xcd4/0xe50 [ 541.097179] ? rb_next+0x58/0x80 [ 541.097845] ? rb_next+0x58/0x80 [ 541.098518] __kasan_slab_free+0x13c/0x1a0 [ 541.099352] ? unlink_anon_vmas+0xba/0x2c0 [ 541.100184] kasan_slab_free+0xe/0x10 [ 541.100934] kmem_cache_free+0x89/0x1e0 [ 541.101724] unlink_anon_vmas+0xba/0x2c0 [ 541.102534] free_pgtables+0x101/0x1b0 [ 541.103299] exit_mmap+0x146/0x2a0 [ 541.103996] ? __ia32_sys_munmap+0x50/0x50 [ 541.104829] ? kasan_check_read+0x11/0x20 [ 541.105649] ? mm_update_next_owner+0x322/0x380 [ 541.106578] mmput+0x8b/0x1d0 [ 541.107191] do_exit+0x43a/0x1390 [ 541.107876] ? mm_update_next_owner+0x380/0x380 [ 541.108791] ? deactivate_super+0x5e/0x80 [ 541.109610] ? cleanup_mnt+0x61/0xa0 [ 541.110351] ? __cleanup_mnt+0x12/0x20 [ 541.111115] ? task_work_run+0xc8/0xf0 [ 541.111879] ? exit_to_usermode_loop+0x125/0x130 [ 541.112817] rewind_stack_do_exit+0x17/0x20 [ 541.113666] RIP: 0033:0x7f46624bf487 [ 541.114404] Code: Bad RIP value. [ 541.115094] RSP: 002b:00007fff5e12e9a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 541.116605] RAX: 0000000000000000 RBX: 0000000001753030 RCX: 00007f46624bf487 [ 541.118034] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000175a1e0 [ 541.119472] RBP: 000000000175a1e0 R08: 0000000000000000 R09: 0000000000000014 [ 541.120890] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f46629c883c [ 541.122321] R13: 0000000000000000 R14: 0000000001753210 R15: 00007fff5e12ec30 [ 541.124061] The buggy address belongs to the page: [ 541.125042] page:ffffea00078d29c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 541.126651] flags: 0x2ffff0000000000() [ 541.127418] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000 [ 541.128963] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 541.130516] page dumped because: kasan: bad access detected [ 541.131954] Memory state around the buggy address: [ 541.132924] ffff8801e34a7800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 [ 541.134378] ffff8801e34a7880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 541.135814] >ffff8801e34a7900: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 [ 541.137253] ^ [ 541.138637] ffff8801e34a7980: f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 541.140075] ffff8801e34a7a00: 00 00 00 00 00 00 00 00 f3 00 00 00 00 00 00 00 [ 541.141509] ================================================================== - Location https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/inode.c#L512 BUG_ON(inode->i_data.nrpages); The root cause is root directory inode is corrupted, it has both inline_data and inline_dentry flag, and its nlink is zero, so in ->evict(), after dropping all page cache, it grabs page #0 for inline data truncation, result in panic in later clear_inode() where we will check inode->i_data.nrpages value. This patch adds inline flags check in sanity_check_inode, in addition, do sanity check with root inode's nlink. Reported-by Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Linus Torvalds | 400439275d |
Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Thomas Gleixner: "The EFI pile: - Make mixed mode UEFI runtime service invocations mutually exclusive, as mandated by the UEFI spec - Perform UEFI runtime services calls from a work queue so the calls into the firmware occur from a kernel thread - Honor the UEFI memory map attributes for live memory regions configured by UEFI as a framebuffer. This works around a coherency problem with KVM guests running on ARM. - Cleanups, improvements and fixes all over the place" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efivars: Call guid_parse() against guid_t type of variable efi/cper: Use consistent types for UUIDs efi/x86: Replace references to efi_early->is64 with efi_is_64bit() efi: Deduplicate efi_open_volume() efi/x86: Add missing NULL initialization in UGA draw protocol discovery efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines efi/x86: Align efi_uga_draw_protocol typedef names to convention efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines efi/x86: Prevent reentrant firmware calls in mixed mode efi/esrt: Only call efi_mem_reserve() for boot services memory fbdev/efifb: Honour UEFI memory map attributes when mapping the FB efi: Drop type and attribute checks in efi_mem_desc_lookup() efi/libstub/arm: Add opt-in Kconfig option for the DTB loader efi: Remove the declaration of efi_late_init() as the function is unused efi/cper: Avoid using get_seconds() efi: Use a work queue to invoke EFI Runtime Services efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable() efi/x86: Clean up the eboot code |
|
Yan, Zheng | 0fcf6c02b2 |
ceph: don't drop message if it contains more data than expected
Later version mds may encode more data into messages. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Yan, Zheng | 342ce1823e |
ceph: support cephfs' own feature bits
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Chengguang Xu | e5bc08d09f |
ceph: refactor error handling code in ceph_reserve_caps()
Call new helper __ceph_unreserve_caps() to reduce duplicated code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Chengguang Xu | 7bf8f736c8 |
ceph: refactor ceph_unreserve_caps()
The code of ceph_unreserve_caps() and error handling in ceph_reserve_caps() are duplicated, so introduce a helper __ceph_unreserve_caps() to reduce duplicated code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Chengguang Xu | d554849290 |
ceph: change to void return type for __do_request()
We do not check return code for __do_request() in all callers, so change to void return type. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Chengguang Xu | 9da12e3a7d |
ceph: compare fsc->max_file_size and inode->i_size for max file size limit
In ceph_llseek(), we compare fsc->max_file_size and inode->i_size to choose max file size limit. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Chengguang Xu | 36a4c72d1c |
ceph: add additional size check in ceph_setattr()
ceph_setattr() finally calls vfs function inode_newsize_ok() to do offset validation and that is based on sb->s_maxbytes. Because we set sb->s_maxbytes to MAX_LFS_FILESIZE to through VFS check and do proper offset validation in cephfs level, we need adding proper offset validation before calling inode_newsize_ok(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
Darrick J. Wong | 00d22a1c36 |
xfs: recalculate summary counters at mount time if icount is bad
Since the sb write verifier trips on bad icounts, we should also force a mount time recalculation of the summary counters if the icount is bad. This helps us avoid blowing up at freeze/unmount time when the bad counter gets written back out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> |
|
piaojun | 3111784bee |
fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
In my testing, v9fs_fid_xattr_set will return successfully even if the backend ext4 filesystem has no space to store xattr key-value. That will cause inconsistent behavior between front end and back end. The reason is that lsetxattr will be triggered by p9_client_clunk, and unfortunately we did not catch the error. This patch will catch the error to notify upper caller. p9_client_clunk (in 9p) p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid); v9fs_clunk (in qemu) put_fid free_fid v9fs_xattr_fid_clunk v9fs_co_lsetxattr s->ops->lsetxattr ext4_xattr_user_set (in host ext4 filesystem) Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> |
|
Colin Ian King | 6baaac0961 |
fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown"
fix spelling mistake in pr_info message text Link: http://lkml.kernel.org/r/20180526150650.10562-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> |
|
Souptick Joarder | fe6340e2d1 |
fs/9p/vfs_file.c: use new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite handler.
See
|
|
Linus Torvalds | d6dd643159 |
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro: "A bunch of race fixes, mostly around lazy pathwalk. All of it is -stable fodder, a large part going back to 2013" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make sure that __dentry_kill() always invalidates d_seq, unhashed or not fix __legitimize_mnt()/mntput() race fix mntput/mntput race root dentries need RCU-delayed freeing |
|
Shan Hai | 01239d77b9 |
xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
Fuzzing tool reports a write to null pointer error in the xfs_bmap_extents_to_btree, fix it by bailing out on encountering a null pointer. Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Eric Sandeen | fa6c668d80 |
xfs: remove b_last_holder & associated macros
The old lock tracking infrastructure in xfs using the b_last_holder field seems to only be useful if you can get into the system with a debugger; it seems that the existing tracepoints would be the way to go these days, and this old infrastructure can be removed. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Andreas Gruenbacher | 10259de1d8 |
iomap: Switch to offset_in_page for clarity
Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use the offset_in_page macro. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Dave Jiang | e25ff835af |
xfs: Close race between direct IO and xfs_break_layouts()
This patch is the duplicate of ross's fix for ext4 for xfs. If the refcount of a page is lowered between the time that it is returned by dax_busy_page() and when the refcount is again checked in xfs_break_layouts() => ___wait_var_event(), the waiting function xfs_wait_dax_page() will never be called. This means that xfs_break_layouts() will still have 'retry' set to false, so we'll stop looping and never check the refcount of other pages in this inode. Instead, always continue looping as long as dax_layout_busy_page() gives us a page which it found with an elevated refcount. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Boris Brezillon | da86748bf6 |
NAND core changes:
- Add the SPI-NAND framework. - Create a helper to find the best ECC configuration. - Create NAND controller operations. - Allocate dynamically ONFI parameters structure. - Add defines for ONFI version bits. - Add manufacturer fixup for ONFI parameter page. - Add an option to specify NAND chip as a boot device. - Add Reed-Solomon error correction algorithm. - Better name for the controller structure. - Remove unused caller_is_module() definition. - Make subop helpers return unsigned values. - Expose _notsupp() helpers for raw page accessors. - Add default values for dynamic timings. - Kill the chip->scan_bbt() hook. - Rename nand_default_bbt() into nand_create_bbt(). - Start to clean the nand_chip structure. - Remove stale prototype from rawnand.h. Raw NAND controllers drivers changes: - Qcom: structuring cleanup. - Denali: use core helper to find the best ECC configuration. - Possible build of almost all drivers by adding a dependency on COMPILE_TEST for almost all of them in Kconfig, implies various fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even changes in sparc64 and ia64 architectures. - Clean the ->probe() functions error path of a lot of drivers. - Migrate all drivers to use nand_scan() instead of nand_scan_ident()/nand_scan_tail() pair. - Use mtd_device_register() where applicable to simplify the code. - Marvell: * Handle on-die ECC. * Better clocks handling. * Remove bogus comment. * Add suspend and resume support. - Tegra: add NAND controller driver. - Atmel: * Add module param to avoid using dma. * Drop Wenyou Yang from MAINTAINERS. - Denali: optimize timings handling. - FSMC: Stop using chip->read_buf(). - FSL: * Switch to SPDX license tag identifiers. * Fix qualifiers in MXC init functions. Raw NAND chip drivers changes: - Micron: * Add fixup for ONFI revision. * Update ecc_stats.corrected. * Make ECC activation stateful. * Avoid enabling/disabling ECC when it can't be disabled. * Get the actual number of bitflips. * Allow forced on-die ECC. * Support 8/512 on-die ECC. * Fix on-die ECC detection logic. - Hynix: * Fix decoding the OOB size on H27UCG8T2BTR. * Use ->exec_op() in hynix_nand_reg_write_op(). -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJbYYGVAAoJECVq6hhHvVaE0poIAJy+VpZl0jTPQ/oO8TQui9hE IZbc8LwohCvegYYhiY1cNESyMYamDfoK6M93i/0zTJF2AJAxPl25ldT8N5Wr16DO 5Vfsdjv75V8l0JEY2SvWYmC6glOAYs0UEDdcFNJRMPqUnQz+VvBIafJOCQqzo4ZH SDnLx3XzOxO4PAPnztWEg50WvaqMPt7ThcqoxThHMcQaLrNjgJUsV0mN+vNEv16Q 6gH6hl1C019k+Kj2Zu0vAifHw1K7gIYT4HvqKwstQ6HYUX2IzIzuEpRIcIze0S5z XKzZ57USItb3l+Y3YwFBLjgP4N+VTT5X59LxdtCOXJ+YvzgxwtKElRvalNcryYI= =zkEf -----END PGP SIGNATURE----- Merge tag 'nand/for-4.19' of git://git.infradead.org/linux-mtd into mtd/next Pull NAND updates from Miquel Raynal: " NAND core changes: - Add the SPI-NAND framework. - Create a helper to find the best ECC configuration. - Create NAND controller operations. - Allocate dynamically ONFI parameters structure. - Add defines for ONFI version bits. - Add manufacturer fixup for ONFI parameter page. - Add an option to specify NAND chip as a boot device. - Add Reed-Solomon error correction algorithm. - Better name for the controller structure. - Remove unused caller_is_module() definition. - Make subop helpers return unsigned values. - Expose _notsupp() helpers for raw page accessors. - Add default values for dynamic timings. - Kill the chip->scan_bbt() hook. - Rename nand_default_bbt() into nand_create_bbt(). - Start to clean the nand_chip structure. - Remove stale prototype from rawnand.h. Raw NAND controllers drivers changes: - Qcom: structuring cleanup. - Denali: use core helper to find the best ECC configuration. - Possible build of almost all drivers by adding a dependency on COMPILE_TEST for almost all of them in Kconfig, implies various fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even changes in sparc64 and ia64 architectures. - Clean the ->probe() functions error path of a lot of drivers. - Migrate all drivers to use nand_scan() instead of nand_scan_ident()/nand_scan_tail() pair. - Use mtd_device_register() where applicable to simplify the code. - Marvell: * Handle on-die ECC. * Better clocks handling. * Remove bogus comment. * Add suspend and resume support. - Tegra: add NAND controller driver. - Atmel: * Add module param to avoid using dma. * Drop Wenyou Yang from MAINTAINERS. - Denali: optimize timings handling. - FSMC: Stop using chip->read_buf(). - FSL: * Switch to SPDX license tag identifiers. * Fix qualifiers in MXC init functions. Raw NAND chip drivers changes: - Micron: * Add fixup for ONFI revision. * Update ecc_stats.corrected. * Make ECC activation stateful. * Avoid enabling/disabling ECC when it can't be disabled. * Get the actual number of bitflips. * Allow forced on-die ECC. * Support 8/512 on-die ECC. * Fix on-die ECC detection logic. - Hynix: * Fix decoding the OOB size on H27UCG8T2BTR. * Use ->exec_op() in hynix_nand_reg_write_op(). " |
|
Steve French | c4f7173ac3 |
smb3: create smb3 equivalent alias for cifs pseudo-xattrs
We really, really don't want to be encouraging people to use cifs (the dialect) since it is insecure, so to avoid confusion we want to move them to names which include 'smb3' instead of 'cifs' - so this simply creates an alias for the pseudo-xattrs e.g. can now do: getfattr -n user.smb3.creationtime /mnt1/file and getfattr -n user.smb3.dosattrib /mnt1/file and getfattr -n system.smb3_acl /mnt1/file instead of forcing you to use the string 'cifs' in these (e.g. getfattr -n system.cifs_acl /mnt1/file) Signed-off-by: Steve French <stfrench@microsoft.com> |
|
Chao Yu | 3093336481 |
f2fs: fix to reset i_gc_failures correctly
Let's reset i_gc_failures to zero when we unset pinned state for file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | d3f07c049d |
f2fs: fix invalid memory access
syzbot found the following crash on: HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000 kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81 compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.com loop7: rw=12288, want=8200, limit=20 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. openvswitch: netlink: Message has 8 unknown bytes. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860 f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883 mount_bdev+0x314/0x3e0 fs/super.c:1344 f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133 legacy_get_tree+0x131/0x460 fs/fs_context.c:729 vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743 do_new_mount fs/namespace.c:2603 [inline] do_mount+0x6f2/0x1e20 fs/namespace.c:2927 ksys_mount+0x12d/0x140 fs/namespace.c:3143 __do_sys_mount fs/namespace.c:3157 [inline] __se_sys_mount fs/namespace.c:3154 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45943a Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0 RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013 R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace bd8550c129352286 ]--- RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 openvswitch: netlink: Message has 8 unknown bytes. R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 In validate_checkpoint(), if we failed to call get_checkpoint_version(), we will pass returned invalid page pointer into f2fs_put_page, cause accessing invalid memory, this patch tries to handle error path correctly to fix this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 50fa53eccf |
f2fs: fix to avoid broken of dnode block list
f2fs recovery flow is relying on dnode block link list, it means fsynced file recovery depends on previous dnode's persistence in the list, so during fsync() we should wait on all regular inode's dnode writebacked before issuing flush. By this way, we can avoid dnode block list being broken by out-of-order IO submission due to IO scheduler or driver. Sheng Yong helps to do the test with this patch: Target:/data (f2fs, -) 64MB / 32768KB / 4KB / 8 1 / PERSIST / Index Base: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08 2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7 3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48 Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333 After: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 798.81 202.5 41143 40613.87 602.71 838.08 913.83 2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27 3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91 Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333 Patched/Original: 0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189 It looks like atomic write will suffer performance regression. I suspect that the criminal is that we forcing to wait all dnode being in storage cache before we issue PREFLUSH+FUA. BTW, will commit ("f2fs: don't need to wait for node writes for atomic write") cause the problem: we will lose data of last transaction after SPO, even if atomic write return no error: - atomic_open(); - write() P1, P2, P3; - atomic_commit(); - writeback data: P1, P2, P3; - writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing last transaction. - preflush + fua; - power-cut If we don't wait dnode writeback for atomic_write: SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS) 1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85 2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77 3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92 Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18 Patched/Original: 0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294 SQLite's performance recovers. Jaegeuk: "Practically, I don't see db corruption becase of this. We can excuse to lose the last transaction." Finally, we decide to keep original implementation of atomic write interface sematics that we don't wait all dnode writeback before preflush+fua submission. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Gustavo A. R. Silva | 6e45f2a59f |
f2fs: use true and false for boolean values
Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | e494c2f995 |
f2fs: fix to do sanity check with cp_pack_start_sum
After fuzzing, cp_pack_start_sum could be corrupted, so current log's summary info should be wrong due to loading incorrect summary block. Then, if segment's type in current log is exceeded NR_CURSEG_TYPE, it can lead accessing invalid dirty_i->dirty_segmap bitmap finally. Add sanity check for cp_pack_start_sum to fix this issue. https://bugzilla.kernel.org/show_bug.cgi?id=200419 - Reproduce - Kernel message (f2fs-dev w/ KASAN) [ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8) [ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock [ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716 [ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0 [ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy [ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1 [ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0 [ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8 [ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286 [ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2 [ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98 [ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001 [ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200 [ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900 [ 3117.584096] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3117.584098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0 [ 3117.584112] Call Trace: [ 3117.584121] ? f2fs_set_meta_page_dirty+0x150/0x150 [ 3117.584127] ? f2fs_build_segment_manager+0xbf9/0x3190 [ 3117.584133] ? f2fs_npages_for_summary_flush+0x75/0x120 [ 3117.584145] f2fs_build_segment_manager+0xda8/0x3190 [ 3117.584151] ? f2fs_get_valid_checkpoint+0x298/0xa00 [ 3117.584156] ? f2fs_flush_sit_entries+0x10e0/0x10e0 [ 3117.584184] ? map_id_range_down+0x17c/0x1b0 [ 3117.584188] ? __put_user_ns+0x30/0x30 [ 3117.584206] ? find_next_bit+0x53/0x90 [ 3117.584237] ? cpumask_next+0x16/0x20 [ 3117.584249] f2fs_fill_super+0x1948/0x2b40 [ 3117.584258] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.584279] ? sget_userns+0x65e/0x690 [ 3117.584296] ? set_blocksize+0x88/0x130 [ 3117.584302] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.584305] mount_bdev+0x1c0/0x200 [ 3117.584310] mount_fs+0x5c/0x190 [ 3117.584320] vfs_kern_mount+0x64/0x190 [ 3117.584330] do_mount+0x2e4/0x1450 [ 3117.584343] ? lockref_put_return+0x130/0x130 [ 3117.584347] ? copy_mount_string+0x20/0x20 [ 3117.584357] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.584362] ? kasan_kmalloc+0xa6/0xd0 [ 3117.584373] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.584377] ? __kmalloc_track_caller+0x196/0x210 [ 3117.584383] ? _copy_from_user+0x61/0x90 [ 3117.584396] ? memdup_user+0x3e/0x60 [ 3117.584401] ksys_mount+0x7e/0xd0 [ 3117.584405] __x64_sys_mount+0x62/0x70 [ 3117.584427] do_syscall_64+0x73/0x160 [ 3117.584440] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.584455] RIP: 0033:0x7f5693f14b9a [ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003 [ 3117.584523] ---[ end trace a8e0d899985faf31 ]--- [ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix. [ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0 [ 3117.685707] ================================================================== [ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0 [ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225 [ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G W 4.17.0+ #1 [ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.686483] Call Trace: [ 3117.686494] dump_stack+0x71/0xab [ 3117.686512] print_address_description+0x6b/0x290 [ 3117.686517] kasan_report+0x28e/0x390 [ 3117.686522] ? __remove_dirty_segment+0xdd/0x1e0 [ 3117.686527] __remove_dirty_segment+0xdd/0x1e0 [ 3117.686532] locate_dirty_segment+0x189/0x190 [ 3117.686538] f2fs_allocate_new_segments+0xa9/0xe0 [ 3117.686543] recover_data+0x703/0x2c20 [ 3117.686547] ? f2fs_recover_fsync_data+0x48f/0xd50 [ 3117.686553] ? ksys_mount+0x7e/0xd0 [ 3117.686564] ? policy_nodemask+0x1a/0x90 [ 3117.686567] ? policy_node+0x56/0x70 [ 3117.686571] ? add_fsync_inode+0xf0/0xf0 [ 3117.686592] ? blk_finish_plug+0x44/0x60 [ 3117.686597] ? f2fs_ra_meta_pages+0x38b/0x5e0 [ 3117.686602] ? find_inode_fast+0xac/0xc0 [ 3117.686606] ? f2fs_is_valid_blkaddr+0x320/0x320 [ 3117.686618] ? __radix_tree_lookup+0x150/0x150 [ 3117.686633] ? dqget+0x670/0x670 [ 3117.686648] ? pagecache_get_page+0x29/0x410 [ 3117.686656] ? kmem_cache_alloc+0x176/0x1e0 [ 3117.686660] ? f2fs_is_valid_blkaddr+0x11d/0x320 [ 3117.686664] f2fs_recover_fsync_data+0xc23/0xd50 [ 3117.686670] ? f2fs_space_for_roll_forward+0x60/0x60 [ 3117.686674] ? rb_insert_color+0x323/0x3d0 [ 3117.686678] ? f2fs_recover_orphan_inodes+0xa5/0x700 [ 3117.686683] ? proc_register+0x153/0x1d0 [ 3117.686686] ? f2fs_remove_orphan_inode+0x10/0x10 [ 3117.686695] ? f2fs_attr_store+0x50/0x50 [ 3117.686700] ? proc_create_single_data+0x52/0x60 [ 3117.686707] f2fs_fill_super+0x1d06/0x2b40 [ 3117.686728] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.686735] ? sget_userns+0x65e/0x690 [ 3117.686740] ? set_blocksize+0x88/0x130 [ 3117.686745] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.686748] mount_bdev+0x1c0/0x200 [ 3117.686753] mount_fs+0x5c/0x190 [ 3117.686758] vfs_kern_mount+0x64/0x190 [ 3117.686762] do_mount+0x2e4/0x1450 [ 3117.686769] ? lockref_put_return+0x130/0x130 [ 3117.686773] ? copy_mount_string+0x20/0x20 [ 3117.686777] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.686780] ? kasan_kmalloc+0xa6/0xd0 [ 3117.686786] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.686790] ? __kmalloc_track_caller+0x196/0x210 [ 3117.686795] ? _copy_from_user+0x61/0x90 [ 3117.686801] ? memdup_user+0x3e/0x60 [ 3117.686804] ksys_mount+0x7e/0xd0 [ 3117.686809] __x64_sys_mount+0x62/0x70 [ 3117.686816] do_syscall_64+0x73/0x160 [ 3117.686824] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.686829] RIP: 0033:0x7f5693f14b9a [ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003 [ 3117.687005] Allocated by task 1225: [ 3117.687152] kasan_kmalloc+0xa6/0xd0 [ 3117.687157] kmem_cache_alloc_trace+0xfd/0x200 [ 3117.687161] f2fs_build_segment_manager+0x2d09/0x3190 [ 3117.687165] f2fs_fill_super+0x1948/0x2b40 [ 3117.687168] mount_bdev+0x1c0/0x200 [ 3117.687171] mount_fs+0x5c/0x190 [ 3117.687174] vfs_kern_mount+0x64/0x190 [ 3117.687177] do_mount+0x2e4/0x1450 [ 3117.687180] ksys_mount+0x7e/0xd0 [ 3117.687182] __x64_sys_mount+0x62/0x70 [ 3117.687186] do_syscall_64+0x73/0x160 [ 3117.687190] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.687285] Freed by task 19: [ 3117.687412] __kasan_slab_free+0x137/0x190 [ 3117.687416] kfree+0x8b/0x1b0 [ 3117.687460] ttm_bo_man_put_node+0x61/0x80 [ttm] [ 3117.687476] ttm_bo_cleanup_refs+0x15f/0x250 [ttm] [ 3117.687492] ttm_bo_delayed_delete+0x2f0/0x300 [ttm] [ 3117.687507] ttm_bo_delayed_workqueue+0x17/0x50 [ttm] [ 3117.687528] process_one_work+0x2f9/0x740 [ 3117.687531] worker_thread+0x78/0x6b0 [ 3117.687541] kthread+0x177/0x1c0 [ 3117.687545] ret_from_fork+0x35/0x40 [ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300 which belongs to the cache kmalloc-192 of size 192 [ 3117.688014] The buggy address is located 16 bytes to the right of 192-byte region [ffff88018f0a6300, ffff88018f0a63c0) [ 3117.688382] The buggy address belongs to the page: [ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0 [ 3117.688788] flags: 0x17fff8000000100(slab) [ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180 [ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 3117.689386] page dumped because: kasan: bad access detected [ 3117.689653] Memory state around the buggy address: [ 3117.689816] ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3117.690027] ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 3117.690448] ^ [ 3117.690644] ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 3117.690868] ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 3117.691077] ================================================================== [ 3117.691290] Disabling lock debugging due to kernel taint [ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0 [ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI [ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G B W 4.17.0+ #1 [ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0 [ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7 [ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292 [ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000 [ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297 [ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb [ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019 [ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0 [ 3117.697032] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3117.697280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0 [ 3117.707235] Call Trace: [ 3117.712077] locate_dirty_segment+0x189/0x190 [ 3117.716891] f2fs_allocate_new_segments+0xa9/0xe0 [ 3117.721617] recover_data+0x703/0x2c20 [ 3117.726316] ? f2fs_recover_fsync_data+0x48f/0xd50 [ 3117.730957] ? ksys_mount+0x7e/0xd0 [ 3117.735573] ? policy_nodemask+0x1a/0x90 [ 3117.740198] ? policy_node+0x56/0x70 [ 3117.744829] ? add_fsync_inode+0xf0/0xf0 [ 3117.749487] ? blk_finish_plug+0x44/0x60 [ 3117.754152] ? f2fs_ra_meta_pages+0x38b/0x5e0 [ 3117.758831] ? find_inode_fast+0xac/0xc0 [ 3117.763448] ? f2fs_is_valid_blkaddr+0x320/0x320 [ 3117.768046] ? __radix_tree_lookup+0x150/0x150 [ 3117.772603] ? dqget+0x670/0x670 [ 3117.777159] ? pagecache_get_page+0x29/0x410 [ 3117.781648] ? kmem_cache_alloc+0x176/0x1e0 [ 3117.786067] ? f2fs_is_valid_blkaddr+0x11d/0x320 [ 3117.790476] f2fs_recover_fsync_data+0xc23/0xd50 [ 3117.794790] ? f2fs_space_for_roll_forward+0x60/0x60 [ 3117.799086] ? rb_insert_color+0x323/0x3d0 [ 3117.803304] ? f2fs_recover_orphan_inodes+0xa5/0x700 [ 3117.807563] ? proc_register+0x153/0x1d0 [ 3117.811766] ? f2fs_remove_orphan_inode+0x10/0x10 [ 3117.815947] ? f2fs_attr_store+0x50/0x50 [ 3117.820087] ? proc_create_single_data+0x52/0x60 [ 3117.824262] f2fs_fill_super+0x1d06/0x2b40 [ 3117.828367] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.832432] ? sget_userns+0x65e/0x690 [ 3117.836500] ? set_blocksize+0x88/0x130 [ 3117.840501] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.844420] mount_bdev+0x1c0/0x200 [ 3117.848275] mount_fs+0x5c/0x190 [ 3117.852053] vfs_kern_mount+0x64/0x190 [ 3117.855810] do_mount+0x2e4/0x1450 [ 3117.859441] ? lockref_put_return+0x130/0x130 [ 3117.862996] ? copy_mount_string+0x20/0x20 [ 3117.866417] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.869719] ? kasan_kmalloc+0xa6/0xd0 [ 3117.872948] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.876121] ? __kmalloc_track_caller+0x196/0x210 [ 3117.879333] ? _copy_from_user+0x61/0x90 [ 3117.882467] ? memdup_user+0x3e/0x60 [ 3117.885604] ksys_mount+0x7e/0xd0 [ 3117.888700] __x64_sys_mount+0x62/0x70 [ 3117.891742] do_syscall_64+0x73/0x160 [ 3117.894692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.897669] RIP: 0033:0x7f5693f14b9a [ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003 [ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy [ 3117.949979] CR2: 0000000000000000 [ 3117.954283] ---[ end trace a8e0d899985faf32 ]--- [ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0 [ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7 [ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292 [ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000 [ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297 [ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb [ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019 [ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0 [ 3117.999392] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3118.004096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0 - Location https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775 if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t])) dirty_i->nr_dirty[t]--; Here dirty_i->dirty_segmap[t] can be NULL which leads to crash in test_and_clear_bit() Reported-by Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Jaegeuk Kim | 8d714f8aa3 |
f2fs: avoid f2fs_bug_on() in cp_error case
There is a subtle race condition to invoke f2fs_bug_on() in shutdown tests. I've confirmed that the last checkpoint is preserved in consistent state, so it'd be fine to just return error at this moment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Chao Yu | 66110abc4c |
f2fs: fix to clear PG_checked flag in set_page_dirty()
PG_checked flag will be set on data page during GC, later, we can recognize such page by the flag and migrate page to cold segment. But previously, we don't clear this flag when invalidating data page, after page redirtying, we will write it into wrong log. Let's clear PG_checked flag in set_page_dirty() to avoid this. Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
Darrick J. Wong | 13942aa94a |
xfs: repair the AGI
Rebuild the AGI header items with some help from the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> |
|
Darrick J. Wong | 0e93d3f43e |
xfs: repair the AGFL
Repair the AGFL from the rmap data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> |
|
Darrick J. Wong | f9ed6debca |
xfs: repair the AGF
Regenerate the AGF from the rmap data. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> |
|
Steve French | cdeaf9d04a |
smb3: allow previous versions to be mounted with snapshot= mount parm
mounting with the "snapshots=" mount parm allows a read-only view of a previous version of a file system (see MS-SMB2 and "timewarp" tokens, section 2.2.13.2.6) based on the timestamp passed in on the snapshots mount parm. Add processing to optionally send this create context. Example output: /mnt1 is mounted with "snapshots=..." and will see an earlier version of the directory, with three fewer files than /mnt2 the current version of the directory. root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# cat /proc/mounts | grep cifs //172.22.149.186/public /mnt1 cifs ro,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=131748608570000000,actimeo=1 //172.22.149.186/public /mnt2 cifs rw,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1 root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1 EmptyDir newerdir root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1/newerdir root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2 EmptyDir file newerdir newestdir timestamp-trace.cap root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2/newerdir new-file-not-in-snapshot Snapshots are extremely useful for comparing previous versions of files or directories, and recovering from data corruptions or mistakes. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> |
|
Ronnie Sahlberg | e55954a5f7 |
cifs: don't show domain= in mount output when domain is empty
Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> |
|
Ronnie Sahlberg | c1777df1a5 |
cifs: add missing support for ACLs in SMB 3.11
We were missing the methods for get_acl and friends for the 3.11 dialect. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> |
|
Steve French | e02789a53d |
smb3: enumerating snapshots was leaving part of the data off end
When enumerating snapshots, the last few bytes of the final snapshot could be left off since we were miscalculating the length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY) See MS-SMB2 section 2.2.32.2. In addition fixup the length used to allow smaller buffer to be passed in, in order to allow returning the size of the whole snapshot array more easily. Sample userspace output with a kernel patched with this (mounted to a Windows volume with two snapshots). Before this patch, the second snapshot would be missing a few bytes at the end. ~/cifs-2.6# ~/enum-snapshots /mnt/file press enter to issue the ioctl to retrieve snapshot information ... size of snapshot array = 102 Num snapshots: 2 Num returned: 2 Array Size: 102 Snapshot 0:@GMT-2018.06.30-19.34.17 Snapshot 1:@GMT-2018.06.30-19.33.37 CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> |
|
Ronnie Sahlberg | 730928c8f4 |
cifs: update smb2_queryfs() to use compounding
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> |
|
Ronnie Sahlberg | b24df3e30c |
cifs: update receive_encrypted_standard to handle compounded responses
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara <palcantara@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> |
|
Al Viro | 4c0d7cd5c8 |
make sure that __dentry_kill() always invalidates d_seq, unhashed or not
RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" <threeearcat@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Al Viro | 119e1ef80e |
fix __legitimize_mnt()/mntput() race
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes:
|
|
Al Viro | 9ea0a46ca2 |
fix mntput/mntput race
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes:
|
|
Gustavo A. R. Silva | a677a78325 |
nfsd: use true and false for boolean values
Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> |
|
Eric Biggers | c2cdc2ab24 |
nfsd: constify write_op[]
write_op[] is never modified, so make it 'const'. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> |
|
nixiaoming | 5ed96bc545 |
fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
READ_BUF(8); dummy = be32_to_cpup(p++); dummy = be32_to_cpup(p++); ... READ_BUF(4); dummy = be32_to_cpup(p++); Assigning value to "dummy" here, but that stored value is overwritten before it can be used. At the same time READ_BUF() will re-update the pointer p. delete invalid assignment statements Signed-off-by: nixiaoming <nixiaoming@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> |
|
Chuck Lever | 11b4d66ea3 |
NFSD: Handle full-length symlinks
I've given up on the idea of zero-copy handling of SYMLINK on the server side. This is because the Linux VFS symlink API requires the symlink pathname to be in a NUL-terminated kmalloc'd buffer. The NUL-termination is going to be problematic (watching out for landing on a page boundary and dealing with a 4096-byte pathname). I don't believe that SYMLINK creation is on a performance path or is requested frequently enough that it will cause noticeable CPU cache pollution due to data copies. There will be two places where a transport callout will be necessary to fill in the rqstp: one will be in the svc_fill_symlink_pathname() helper that is used by NFSv2 and NFSv3, and the other will be in nfsd4_decode_create(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> |