Commit Graph

145 Commits

Author SHA1 Message Date
Linus Torvalds 80201fe175 for-5.1/block-20190302
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
 yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
 HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
 ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
 3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
 8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
 QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
 HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
 fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
 fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
 qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
 DRVVhYpocw==
 =VBaU
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "Not a huge amount of changes in this round, the biggest one is that we
  finally have Mings multi-page bvec support merged. Apart from that,
  this pull request contains:

   - Small series that avoids quiescing the queue for sysfs changes that
     match what we currently have (Aleksei)

   - Series of bcache fixes (via Coly)

   - Series of lightnvm fixes (via Mathias)

   - NVMe pull request from Christoph. Nothing major, just SPDX/license
     cleanups, RR mp policy (Hannes), and little fixes (Bart,
     Chaitanya).

   - BFQ series (Paolo)

   - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
     for the fast path (Jianchao)

   - fops->iopoll() added for async IO polling, this is a feature that
     the upcoming io_uring interface will use (Christoph, me)

   - Partition scan loop fixes (Dongli)

   - mtip32xx conversion from managed resource API (Christoph)

   - cdrom registration race fix (Guenter)

   - MD pull from Song, two minor fixes.

   - Various documentation fixes (Marcos)

   - Multi-page bvec feature. This brings a lot of nice improvements
     with it, like more efficient splitting, larger IOs can be supported
     without growing the bvec table size, and so on. (Ming)

   - Various little fixes to core and drivers"

* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
  block: fix updating bio's front segment size
  block: Replace function name in string with __func__
  nbd: propagate genlmsg_reply return code
  floppy: remove set but not used variable 'q'
  null_blk: fix checking for REQ_FUA
  block: fix NULL pointer dereference in register_disk
  fs: fix guard_bio_eod to check for real EOD errors
  blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
  block: optimize bvec iteration in bvec_iter_advance
  block: introduce mp_bvec_for_each_page() for iterating over page
  block: optimize blk_bio_segment_split for single-page bvec
  block: optimize __blk_segment_map_sg() for single-page bvec
  block: introduce bvec_nth_page()
  iomap: wire up the iopoll method
  block: add bio_set_polled() helper
  block: wire up block device iopoll method
  fs: add an iopoll method to struct file_operations
  loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
  loop: do not print warn message if partition scan is successful
  block: bounce: make sure that bvec table is updated
  ...
2019-03-08 14:12:17 -08:00
Gao Xiang a112152f6f staging: erofs: fix mis-acted TAIL merging behavior
EROFS has an optimized path called TAIL merging, which is designed
to merge multiple reads and the corresponding decompressions into
one if these requests read continuous pages almost at the same time.

In general, it behaves as follows:
 ________________________________________________________________
  ... |  TAIL  .  HEAD  |  PAGE  |  PAGE  |  TAIL    . HEAD | ...
 _____|_combined page A_|________|________|_combined page B_|____
        1  ]  ->  [  2                          ]  ->  [ 3
If the above three reads are requested in the order 1-2-3, it will
generate a large work chain rather than 3 individual work chains
to reduce scheduling overhead and boost up sequential read.

However, if Read 2 is processed slightly earlier than Read 1,
currently it still generates 2 individual work chains (chain 1, 2)
but it does in-place decompression for combined page A, moreover,
if chain 2 decompresses ahead of chain 1, it will be a race and
lead to corrupted decompressed page. This patch fixes it.

Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 15:41:57 +01:00
Gao Xiang 1e5ceeab69 staging: erofs: fix illegal address access under memory pressure
Considering a read request with two decompressed file pages,
If a decompression work cannot be started on the previous page
due to memory pressure but in-memory LTP map lookup is done,
builder->work should be still NULL.

Moreover, if the current page also belongs to the same map,
it won't try to start the decompression work again and then
run into trouble.

This patch aims to solve the above issue only with little changes
as much as possible in order to make the fix backport easier.

kernel message is:
<4>[1051408.015930s]SLUB: Unable to allocate memory on node -1, gfp=0x2408040(GFP_NOFS|__GFP_ZERO)
<4>[1051408.015930s]  cache: erofs_compress, object size: 144, buffer size: 144, default order: 0, min order: 0
<4>[1051408.015930s]  node 0: slabs: 98, objs: 2744, free: 0
  * Cannot allocate the decompression work

<3>[1051408.015960s]erofs: z_erofs_vle_normalaccess_readpages, readahead error at page 1008 of nid 5391488
  * Note that the previous page was failed to read

<0>[1051408.015960s]Internal error: Accessing user space memory outside uaccess.h routines: 96000005 [#1] PREEMPT SMP
...
<4>[1051408.015991s]Hardware name: kirin710 (DT)
...
<4>[1051408.016021s]PC is at z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.016021s]LR is at z_erofs_do_read_page+0x12c/0xcf0
...
<4>[1051408.018096s][<ffffff80c6fb0fd4>] z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.018096s][<ffffff80c6fb3814>] z_erofs_vle_normalaccess_readpages+0x1a0/0x37c
<4>[1051408.018096s][<ffffff80c6d670b8>] read_pages+0x70/0x190
<4>[1051408.018127s][<ffffff80c6d6736c>] __do_page_cache_readahead+0x194/0x1a8
<4>[1051408.018127s][<ffffff80c6d59318>] filemap_fault+0x398/0x684
<4>[1051408.018127s][<ffffff80c6d8a9e0>] __do_fault+0x8c/0x138
<4>[1051408.018127s][<ffffff80c6d8f90c>] handle_pte_fault+0x730/0xb7c
<4>[1051408.018127s][<ffffff80c6d8fe04>] __handle_mm_fault+0xac/0xf4
<4>[1051408.018157s][<ffffff80c6d8fec8>] handle_mm_fault+0x7c/0x118
<4>[1051408.018157s][<ffffff80c8c52998>] do_page_fault+0x354/0x474
<4>[1051408.018157s][<ffffff80c8c52af8>] do_translation_fault+0x40/0x48
<4>[1051408.018157s][<ffffff80c6c002f4>] do_mem_abort+0x80/0x100
<4>[1051408.018310s]---[ end trace 9f4009a3283bd78b ]---

Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 15:41:57 +01:00
Gao Xiang af692e117c staging: erofs: compressed_pages should not be accessed again after freed
This patch resolves the following page use-after-free issue,
z_erofs_vle_unzip:
    ...
    for (i = 0; i < nr_pages; ++i) {
        ...
        z_erofs_onlinepage_endio(page);  (1)
    }

    for (i = 0; i < clusterpages; ++i) {
        page = compressed_pages[i];

        if (page->mapping == mngda)      (2)
            continue;
        /* recycle all individual staging pages */
        (void)z_erofs_gather_if_stagingpage(page_pool, page); (3)
        WRITE_ONCE(compressed_pages[i], NULL);
    }
    ...

After (1) is executed, page is freed and could be then reused, if
compressed_pages is scanned after that, it could fall info (2) or
(3) by mistake and that could finally be in a mess.

This patch aims to solve the above issue only with little changes
as much as possible in order to make the fix backport easier.

Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 15:41:57 +01:00
Gao Xiang bee1568293 staging: erofs: switch to ->iterate_shared()
After commit 6192269444 ("introduce a parallel variant of ->iterate()"),
readdir can be done without taking exclusive inode lock of course.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-26 11:52:46 +01:00
Gao Xiang b2bb112db1 staging: erofs: no need to take page lock in readdir
VFS will take inode_lock for readdir, therefore no need to
take page lock in readdir at all just as the majority of
other generic filesystems.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-26 11:52:45 +01:00
Gao Xiang 047d4abc4d staging: erofs: remove rcu_read_lock() in erofs_try_to_free_cached_page
page_private(page) cannot be changed if page lock is taken.

Besides, the corresponding workgroup won't be freed
if the page is already protected by page lock, therefore
no need to take rcu read lock.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 11:20:55 +01:00
Gao Xiang 62dc45979f staging: erofs: fix race of initializing xattrs of a inode at the same time
In real scenario, there could be several threads accessing xattrs
of the same xattr-uninitialized inode, and init_inode_xattrs()
almost at the same time.

That's actually an unexpected behavior, this patch closes the race.

Fixes: b17500a0fd ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 11:20:55 +01:00
Bhanusree Pola cbebe5d05d staging: erofs: match alignment with open parentheses
Align code with open parantheses to improve the readability.
Issue found using checkpatch.pl

Signed-off-by: Bhanusree Pola <bhanusreemahesh@gmail.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-19 15:18:49 +01:00
Ming Lei 6dc4f100c1 block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Gao Xiang 419d6efc50 staging: erofs: keep corrupted fs from crashing kernel in erofs_namei()
As Al pointed out, "
... and while we are at it, what happens to
	unsigned int nameoff = le16_to_cpu(de[mid].nameoff);
	unsigned int matched = min(startprfx, endprfx);

	struct qstr dname = QSTR_INIT(data + nameoff,
		unlikely(mid >= ndirents - 1) ?
			maxsize - nameoff :
			le16_to_cpu(de[mid + 1].nameoff) - nameoff);

	/* string comparison without already matched prefix */
	int ret = dirnamecmp(name, &dname, &matched);
if le16_to_cpu(de[...].nameoff) is not monotonically increasing?  I.e.
what's to prevent e.g. (unsigned)-1 ending up in dname.len?

Corrupted fs image shouldn't oops the kernel.. "

Revisit the related lookup flow to address the issue.

Fixes: d72d1ce601 ("staging: erofs: add namei functions")
Cc: <stable@vger.kernel.org> # 4.19+
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-14 10:47:21 +01:00
Sheng Yong 3b1b5291f7 staging: erofs: fix memleak of inode's shared xattr array
If it fails to read a shared xattr page, the inode's shared xattr array
is not freed. The next time the inode's xattr is accessed, the previously
allocated array is leaked.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Fixes: b17500a0fd ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-14 10:47:20 +01:00
Chengguang Xu 209312369e staging: erofs: remove redundant unlikely annotation in unzip_vle.c
unlikely has already included in IS_ERR(),
so just remove it.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 10:45:45 +01:00
Chengguang Xu 7fadcdce5d staging: erofs: remove redundant likely/unlikely annotation in namei.c
unlikely has already included in IS_ERR(),
so just remove redundant likely/unlikely
annotation.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12 10:45:45 +01:00
Masahiro Yamada 2fa495892b staging: prefix header search paths with $(srctree)/
Currently, the Kbuild core manipulates header search paths in a crazy
way [1].

To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
the search paths in the srctree. Some Makefiles are already written in
that way, but not all. The goal of this work is to make the notation
consistent, and finally get rid of the gross hacks.

Having whitespaces after -I does not matter since commit 48f6e3cf5b
("kbuild: do not drop -I without parameter").

[1]: https://patchwork.kernel.org/patch/9632347/

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-04 12:30:27 +01:00
Gao Xiang 516c115c91 staging: erofs: complete POSIX ACL support
Let's add .get_acl() to read the file's acl from its xattrs
to make POSIX ACL usable.

Here is the on-disk detail,
fullname: system.posix_acl_access
struct erofs_xattr_entry:
        .e_name_len = 0
        .e_name_index = EROFS_XATTR_INDEX_POSIX_ACL_ACCESS (2)

fullname: system.posix_acl_default
struct erofs_xattr_entry:
	.e_name_len = 0
	.e_name_index = EROFS_XATTR_INDEX_POSIX_ACL_DEFAULT (3)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30 15:38:50 +01:00
Gao Xiang a24df1f62f staging: erofs: use xattr_prefix to wrap up
Let's use xattr_prefix instead of open code.
No logic changes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30 15:38:50 +01:00
Chengguang Xu 94832d9399 staging: erofs: fix potential double iput in erofs_read_super()
Some error cases like failing from d_make_root() will
cause double iput because d_make_root() also does iput
in its error path.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-25 09:52:02 +01:00
Gao Xiang 2e1d66379e staging: erofs: drop the extern prefix for function definitions
Fix all `CHECK: extern prototypes should be avoided in .h files'
reported by checkpatch.pl.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 10:37:16 +01:00
Gao Xiang d55bc7ba6b staging: erofs: staticize erofs_shrink_count, erofs_shrink_scan
Move erofs_shrinker_info to utils.c and therefore
no need to globalize erofs_shrink_count and erofs_shrink_scan.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 10:37:16 +01:00
Gao Xiang 4501ca36bc staging: erofs: move shrink accounting inside the function
This patch moves the &erofs_global_shrink_cnt accounting
from the caller to erofs_workgroup_get().  It's cleaner and
it matches erofs_workgroup_put() better. No behavior change.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 10:37:16 +01:00
Gao Xiang d60eff4396 staging: erofs: localize erofs_workgroup_get
Staticize erofs_workgroup_get since no external user
out of utils.c directly calls erofs_workgroup_get.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 10:34:01 +01:00
Gao Xiang 61c9314fdd staging: erofs: sunset erofs_workstation_cleanup_all
There is only one user calling erofs_workstation_cleanup_all,
and it is no likely that more users will use in that way
in the future.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-18 10:34:01 +01:00
Chao Yu 3b423417d0 staging: erofs: clean up erofs_map_blocks_iter
This patch cleans up erofs_map_blocks* function and structure family,
just simply the code, no logic change.

Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-15 15:52:26 +01:00
Gao Xiang 6af7b48305 staging: erofs: move erofs_xattr_handlers to xattr.h
Let's move independent xattr-related stuffs to xattr.h.
No logic changes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-15 15:51:57 +01:00
Gao Xiang 609398266c staging: erofs: remove unneeded inode_operations
Currently, EROFS uses generic iops when xattr is off,
it seems unnecessary and a lot of extra code is there.
Let's follow what other filesystems do instead.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-15 15:51:57 +01:00
Gao Xiang 7077fffcb0 staging: erofs: fix fast symlink w/o xattr when fs xattr is on
Currently, this will hit a BUG_ON for these symlinks as follows:

- kernel message
------------[ cut here ]------------
kernel BUG at drivers/staging/erofs/xattr.c:59!
SMP PTI
CPU: 1 PID: 1170 Comm: getllxattr Not tainted 4.20.0-rc6+ #92
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
RIP: 0010:init_inode_xattrs+0x22b/0x270
Code: 48 0f 45 ea f0 ff 4d 34 74 0d 41 83 4c 24 e0 01 31 c0 e9 00 fe ff ff 48 89 ef e8 e0 31 9e ff eb e9 89 e8 e9 ef fd ff ff 0f 0$
 <0f> 0b 48 89 ef e8 fb f6 9c ff 48 8b 45 08 a8 01 75 24 f0 ff 4d 34
RSP: 0018:ffffa03ac026bdf8 EFLAGS: 00010246
------------[ cut here ]------------
...
Call Trace:
 erofs_listxattr+0x30/0x2c0
 ? selinux_inode_listxattr+0x5a/0x80
 ? kmem_cache_alloc+0x33/0x170
 ? security_inode_listxattr+0x27/0x40
 listxattr+0xaf/0xc0
 path_listxattr+0x5a/0xa0
 do_syscall_64+0x43/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
---[ end trace 3c24b49408dc0c72 ]---

Fix it by checking ->xattr_isize in init_inode_xattrs(),
and it also fixes improper return value -ENOTSUPP
(it should be -ENODATA if xattr is enabled) for those inodes.

Fixes: b17500a0fd ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Li Guifu <bluce.liguifu@huawei.com>
Tested-by: Li Guifu <bluce.liguifu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-15 15:51:57 +01:00
Gao Xiang fdb0536469 staging: erofs: add document
This documents key feature, usage, and on-disk design of erofs.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-15 15:51:22 +01:00
Sidong Yang 4b03f3f4cc staging: erofs: Add identifier for function definition arguments
Add identifier for function definition arguments in xattr_iter_handlers,
this change clears the checkpatch.pl issue and make code more explicit.

Signed-off-by: Sidong Yang <realwakka@gmail.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11 10:44:20 +01:00
Jeremy Sowden e7dfb1cff6 staging: erofs: fixed -Wmissing-prototype warnings by moving prototypes to header file.
Moved prototypes for two functions to a header file in order to fix the
following warnings:

  drivers/staging/erofs/unzip_vle.c:577:6: warning: no previous prototype for ‘erofs_workgroup_free_rcu’ [-Wmissing-prototypes]
   void erofs_workgroup_free_rcu(struct erofs_workgroup *grp)
        ^~~~~~~~~~~~~~~~~~~~~~~~

  drivers/staging/erofs/unzip_vle.c:1702:5: warning: no previous prototype for ‘z_erofs_map_blocks_iter’ [-Wmissing-prototypes]
   int z_erofs_map_blocks_iter(struct inode *inode,
       ^~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11 10:44:20 +01:00
Jeremy Sowden 0a64d62d53 staging: erofs: fixed -Wmissing-prototype warnings by making functions static.
Define three functions which are not used outside their own
translation-units as static in order to fix the following warnings:

  drivers/staging/erofs/utils.c:134:6: warning: no previous prototype for ‘erofs_try_to_release_workgroup’ [-Wmissing-prototypes]
   bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  drivers/staging/erofs/unzip_vle.c:592:6: warning: no previous prototype for ‘z_erofs_vle_work_release’ [-Wmissing-prototypes]
   void z_erofs_vle_work_release(struct z_erofs_vle_work *work)
        ^~~~~~~~~~~~~~~~~~~~~~~~

  drivers/staging/erofs/unzip_vle_lz4.c:16:5: warning: no previous prototype for ‘z_erofs_unzip_lz4’ [-Wmissing-prototypes]
   int z_erofs_unzip_lz4(void *in, void *out, size_t inlen, size_t outlen)
       ^~~~~~~~~~~~~~~~~

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-11 10:44:20 +01:00
Gao Xiang c706d4b744 staging: erofs: fix return type of erofs_workgroup_get
There exists a return type misuse (`int'->`bool') since all
users assume it fails if only return value != 0, let's fix
the return type to `int' instead of confusing `bool'.

No logic changes.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-07 08:56:07 +01:00
Aaron Strahlberger 019ec6c14f staging: erofs: Fix spelling issue
Changed "stoped" to "stopped".

Signed-off-by: Aaron Strahlberger <aaron.strahlberger@posteo.de>
Signed-off-by: Julius Wiedmann <julius.wiedmann@fau.de>
Signed-off-by: Dominik Huber <domi250@gmx.de>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-12 11:30:05 +01:00
Gao Xiang ccd9c19c7a staging: erofs: remove __EROFS_BIT
It's better to use pre-calculated values to make
on-disk definition more straight-forward and human-readable.

Since there is the only one user, let's remove
__EROFS_BIT entirely.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Gao Xiang <hsiangkao@aol.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-12 11:29:46 +01:00
Gao Xiang b8e076a6ef staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
remove all redundant BUG_ONs, and turn the rest
useful usages to DBG_BUGONs.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-12 10:56:34 +01:00
Gao Xiang 70b17991d8 staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
remove all redundant BUG_ONs, and turn the rest
useful usages to DBG_BUGONs.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-12 10:56:34 +01:00
Gao Xiang 7146a4f026 staging: erofs: simplify `z_erofs_vle_submit_all'
Previously, there are too many hacked stuffs such as `__FSIO_1',
`lstgrp_noio', `lstgrp_io' out there in `z_erofs_vle_submit_all'.

Revisit the whole process by properly introducing jobqueue to
represent each type of queued workgroups, furthermore hide all of
crazyness behind independent separated functions.

After this patch, 2 independent jobqueues exist if managed cache
is enabled, or 1 jobqueue if disabled.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 6afd227ca1 staging: erofs: redefine where `owned_workgrp_t' points
By design, workgroups are queued in the form of linked lists.

Previously, it points to the next `z_erofs_vle_workgroup',
which isn't flexible enough to simplify `z_erofs_vle_submit_all'.

Let's fix it by pointing to the next `owned_workgrp_t' and use
container_of to get its coresponding `z_erofs_vle_workgroup'.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 92e6efd566 staging: erofs: refine compressed pages preload flow
Currently, there are two kinds of compressed pages in erofs:
  1) file pages for the in-place decompression and
  2) managed pages for cached decompression.
Both are all stored in grp->compressed_pages[].

For managed pages, they could already exist or could be preloaded
in this round, including the following cases in detail:
  1) Already valid (loaded in some previous round);
  2) PAGE_UNALLOCATED, should be allocated at the time of submission;
  3) Just found in the managed cache, and with an extra page ref.
Currently, 1) and 3) can be distinguishable by lock_page and
checking its PG_private, which is guaranteed by the reclaim path,
but it's better to do a double check by using an extra tag.

This patch reworks the preload flow by introducing such the tag
by using tagged pointer, too many #ifdefs are removed as well.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 9248fce714 staging: erofs: revisit the page submission flow
Previously, the submission flow works with cached compressed pages
reclaim path in a tricky way, and it could be buggy if the reclaim
path changes later without such tricky restrictions. For example,
currently one PagePrivate(page) is evaluated without taking page
lock (it only follows a wait_for_page_locked which closes such race)
and no handling solves the potential page truncation case.

In addition, it's also full of #ifdefs in the function, which
is hard to understand and maintain. this patch fixes them all.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 672e547610 staging: erofs: localize UNALLOCATED_CACHED_PAGE placeholder
In practice, in order to do cached decompression rather than reuse
them for in-place decompression and make full use of pages in
page_pool instead of allocating as much as possible, an unallocated
placeholder was introduce to mark all in compressed_pages[] and
they will be replaced at the time of submission.

Previously EROFS_UNALLOCATED_CACHED_PAGE was included in internal.h,
which is unnecessary since it's only internally used in decompression
subsystem, move it to unzip_vle.c and rename it to PAGE_UNALLOCATED.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang c1448fa880 staging: erofs: introduce MNGD_MAPPING helper
This patch introduces MNGD_MAPPING to wrap up
sbi->managed_cache->i_mapping, which will be used
to solve too many #ifdefs in a single function.

No logic changes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 848bd9acdc staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
The root cause is the race as follows:
 Thread #0                         Thread #1

 z_erofs_vle_unzip_kickoff         z_erofs_submit_and_unzip

                                    struct z_erofs_vle_unzip_io io[]
   atomic_add_return()
                                    wait_event()
                                    [end of function]
   wake_up()

Fix it by taking the waitqueue lock between atomic_add_return and
wake_up to close such the race.

kernel message:

Unable to handle kernel paging request at virtual address 97f7052caa1303dc
...
Workqueue: kverityd verity_work
task: ffffffe32bcb8000 task.stack: ffffffe3298a0000
PC is at __wake_up_common+0x48/0xa8
LR is at __wake_up+0x3c/0x58
...
Call trace:
...
[<ffffff94a08ff648>] __wake_up_common+0x48/0xa8
[<ffffff94a08ff8b8>] __wake_up+0x3c/0x58
[<ffffff94a0c11b60>] z_erofs_vle_unzip_kickoff+0x40/0x64
[<ffffff94a0c118e4>] z_erofs_vle_read_endio+0x94/0x134
[<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8
[<ffffff94a1076540>] dec_pending+0x134/0x32c
[<ffffff94a1076f28>] clone_endio+0x90/0xf4
[<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8
[<ffffff94a1095024>] verity_work+0x210/0x368
[<ffffff94a08c4150>] process_one_work+0x188/0x4b4
[<ffffff94a08c45bc>] worker_thread+0x140/0x458
[<ffffff94a08cad48>] kthread+0xec/0x108
[<ffffff94a0883ab4>] ret_from_fork+0x10/0x1c
Code: d1006273 54000260 f9400804 b9400019 (b85fc081)
---[ end trace be9dde154f677cd1 ]---

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-07 17:10:48 +01:00
Gao Xiang 3c49898715 staging: erofs: update erofs-utils information in TODO
erofs-utils was released by the original author Li Guifu
weeks ago in the linux-erofs mailing list [1], update
information in TODO and let's wait the original author
finish all open source works.

[1] https://lists.ozlabs.org/pipermail/linux-erofs/2018-November/001004.html

Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Fang Wei <fangwei1@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-06 16:08:56 +01:00
Gao Xiang 8b987bca2d staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
remove all redundant BUG_ONs, and turn the rest
useful usages to DBG_BUGONs.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-06 16:08:56 +01:00
Gao Xiang f0c519fc26 staging: erofs: rename strange variable names in z_erofs_vle_frontend
Previously, 2 members called `initial' and `cachedzone_la' are used
for applying caching policy (whether the workgroup is at either end),
which are hard to understand, rename them to `backmost' and `headoffset'.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-23 10:53:08 +01:00
Gao Xiang 2d9b5dcd99 staging: erofs: decompress asynchronously if PG_readahead page at first
For the case of nr_to_read == lookahead_size, it is better to
decompress asynchronously as well since no page will be needed immediately.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-23 10:53:08 +01:00
Gao Xiang 23edf3abe7 staging: erofs: locked before registering for all new workgroups
Let's make sure that the one registering a workgroup will also
take the primary work lock at first for two reasons:
  1) There's no need to introduce such a race window (and consequently
     overhead) between registering and locking, other tasks could break
     in by chance, and the race seems unnecessary (no benefit at all);

  2) It's better to take the primary work when a workgroup
     is registered to apply the cache managed policy, for example,
     if some other tasks break in, it could turn into the in-place
     decompression rather than use as the cached decompression.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-23 10:53:08 +01:00
Gao Xiang 48d4bf3b05 staging: erofs: separate into init_once / always
`z_erofs_vle_workgroup' is heavily generated in the decompression,
for example, it resets 32 bytes redundantly for 64-bit platforms
even through Z_EROFS_VLE_INLINE_PAGEVECS + Z_EROFS_CLUSTER_MAX_PAGES,
default 4, pages are stored in `z_erofs_vle_workgroup'.

As an another example, `struct mutex' takes 72 bytes for our kirin
64-bit platforms, it's unnecessary to be reseted at first and
be initialized each time.

Let's avoid filling all `z_erofs_vle_workgroup' with 0 at first
since most fields are reinitialized to meaningful values later,
and pagevec is no need to initialized at all.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-23 10:53:08 +01:00
Gao Xiang 948bbdb181 staging: erofs: add a full barrier in erofs_workgroup_unfreeze
Just like other generic locks, insert a full barrier
in case of memory reorder.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-23 10:53:08 +01:00