linux

Commit Graph

Author	SHA1	Message	Date
Wang Shilong	736dedbb1a	ext4: mark block bitmap corrupted when found There are still some cases that we missed to set block bitmaps corrupted bit properly: 1) block bitmap number is wrong. 2) failed to read block bitmap due to disk errors. 3) double free block bitmaps.. 4) some mismatch check with bitmaps vs buddy information. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2018-05-12 12:37:58 -04:00
Wang Shilong	206f6d552d	ext4: mark inode bitmap corrupted when found There are still some cases that we missed to set block bitmaps corrupted bit properly: 1)inode bitmap number is wrong. 2)failed to read block bitmap due to disk errors. 3)double allocations from bitmap Also remove a duplicated call ext4_error() afer ext4_read_inode_bitmap(), as ext4_error() have been called inside ext4_read_inode_bitmap() properly. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2018-05-12 12:15:21 -04:00
Wang Shilong	db79e6d1fb	ext4: add new ext4_mark_group_bitmap_corrupted() helper Since there are many places to set inode/block bitmap corrupt bit, add a new helper for it, which will make codes more clear. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2018-05-12 11:39:40 -04:00
Wang Shilong	0db9fdeb34	ext4: fix wrong return value in ext4_read_inode_bitmap() The only reason that sb_getblk() could fail is out of memory, ext4 codes have returned -ENOMME for all other places except this one, let's fix it here too. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-05-12 11:35:01 -04:00
Al Viro	1e2e547a93	do d_instantiate/unlock_new_inode combinations safely For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-11 15:36:37 -04:00
Eryu Guan	e254d1afac	ext4: use raw i_version value for ea_inode Currently, creating large xattr (e.g. 2k) in ea_inode would cause ea_inode refcount corruption, e.g. Pass 4: Checking reference counts Extended attribute inode 13 ref count is 0, should be 1. Fix? no This is because that we save the lower 32bit of refcount in inode->i_version and store it in raw_inode->i_disk_version on disk. But since commit `ee73f9a52a` ("ext4: convert to new i_version API"), we load/store modified i_disk_version from/to disk instead of raw value, which causes on-disk ea_inode refcount corruption. Fix it by loading/storing raw i_version/i_disk_version, because it's a self-managed value in this case. Fixes: `ee73f9a52a` ("ext4: convert to new i_version API") Cc: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-05-10 11:55:31 -04:00
Eryu Guan	3f706c8c92	ext4: use XATTR_CREATE in ext4_initxattrs() I hit ENOSPC error when creating new file in a newly created ext4 with ea_inode feature enabled, if selinux is enabled and ext4 is mounted without any selinux context. e.g. mkfs -t ext4 -O ea_inode -F /dev/sda5 mount /dev/sda5 /mnt/ext4 touch /mnt/ext4/testfile # got ENOSPC here It turns out that we run out of journal credits in ext4_xattr_set_handle() when creating new selinux label for the newly created inode. This is because that in __ext4_new_inode() we use __ext4_xattr_set_credits() to calculate the reserved credits for new xattr, with the 'is_create' argument being true, which implies less credits in the ea_inode case. But we calculate the required credits in ext4_xattr_set_handle() with 'is_create' being false, which means we need more credits if ea_inode feature is enabled. So we don't have enough credits and error out with ENOSPC. Fix it by simply calling ext4_xattr_set_handle() with XATTR_CREATE flag in ext4_initxattrs(), so we end up with requiring less credits than reserved. The semantic of XATTR_CREATE is "Perform a pure create, which fails if the named attribute exists already." (from setxattr(2)), which is fine in this case, because we only call ext4_initxattrs() on newly created inode. Fixes: `af65207c76` ("ext4: fix __ext4_new_inode() journal credits calculation") Cc: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-05-10 11:52:14 -04:00
Mathieu Malaterre	472d8ea195	ext4: make function ‘ext4_getfsmap_find_fixed_metadata’ static Since function ‘ext4_getfsmap_find_fixed_metadata’ can be made static, make it so. Remove the following gcc warning (W=1): fs/ext4/fsmap.c:405:5: warning: no previous prototype for ‘ext4_getfsmap_find_fixed_metadata’ [-Wmissing-prototypes] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-05-10 11:50:04 -04:00
Eric Biggers	0cb8dae4a0	fscrypt: allow synchronous bio decryption Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a bio's pages asynchronously, then unlocks them afterwards. But, this assumes that decryption is the last "postprocessing step" for the bio, so it's incompatible with additional postprocessing steps such as authenticity verification after decryption. Therefore, rename the existing fscrypt_decrypt_bio_pages() to fscrypt_enqueue_decrypt_bio(). Then, add fscrypt_decrypt_bio() which decrypts the pages in the bio synchronously without unlocking the pages, nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which enqueues work on the fscrypt_read_workqueue. The new functions will be used by filesystems that support both fscrypt and fs-verity. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2018-05-02 14:30:57 -07:00
Linus Torvalds	cdface5209	Fix misc. bugs and a regression for ext4. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlrlM9IACgkQ8vlZVpUN gaOtrQf+OhNwH0vIfCQwQG36m7DM5DNJ1eiFV6Qp++7B+nzMO7nRaXrINTKty/iO vSK59a2lCa1Uufjii/vIiVoVjiaiRfVJJt4/WFV6jtESxnBAPTj//nFDGBoQANsR XxTNDO9Bkra9QlWgasSiqbkyyKd2KFHf23LP1fdfspXuRFrGhu6pYqaZpbx8V0/2 j+TeLS9V8vNx/rWNmvpMK+WopapvrGYoA0YESAZJBCLMNO5uZZ+2qteORPp+Y5oQ 0d0mCLPedYy5gagHUIN4EnpwP4zNh8efQhQA16teEqs+foIHnyp7VnYYG1lJ3z+Y bYQoQGmKKdnd6Hl+/5sLYct8yZEhXQ== =EF3H -----END PGP SIGNATURE----- Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix misc bugs and a regression for ext4" * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs ext4: fix bitmap position validation ext4: set h_journal if there is a failure starting a reserved handle ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS	2018-04-28 20:07:21 -07:00
Theodore Ts'o	7ef79ad521	ext4: add MODULE_SOFTDEP to ensure crc32c is included in the initramfs Fixes: `a45403b515` ("ext4: always initialize the crc32c checksum driver") Reported-by: François Valenduc <francoisvalenduc@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-04-26 00:44:46 -04:00
Lukas Czerner	22be37acce	ext4: fix bitmap position validation Currently in ext4_valid_block_bitmap() we expect the bitmap to be positioned anywhere between 0 and s_blocksize clusters, but that's wrong because the bitmap can be placed anywhere in the block group. This causes false positives when validating bitmaps on perfectly valid file system layouts. Fix it by checking whether the bitmap is within the group boundary. The problem can be reproduced using the following mkfs -t ext3 -E stride=256 /dev/vdb1 mount /dev/vdb1 /mnt/test cd /mnt/test wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz tar xf linux-4.16.3.tar.xz This will result in the warnings in the logs EXT4-fs error (device vdb1): ext4_validate_block_bitmap:399: comm tar: bg 84: block 2774529: invalid block bitmap [ Changed slightly for clarity and to not drop a overflow test -- TYT ] Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: `7dac4a1726` ("ext4: add validity checks for bitmap block numbers") Cc: stable@vger.kernel.org	2018-04-24 11:31:44 -04:00
Eric Biggers	349fa7d6e1	ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS During the "insert range" fallocate operation, extents starting at the range offset are shifted "right" (to a higher file offset) by the range length. But, as shown by syzbot, it's not validated that this doesn't cause extents to be shifted beyond EXT_MAX_BLOCKS. In that case ->ee_block can wrap around, corrupting the extent tree. Fix it by returning an error if the space between the end of the last extent and EXT4_MAX_BLOCKS is smaller than the range being inserted. This bug can be reproduced by running the following commands when the current directory is on an ext4 filesystem with a 4k block size: fallocate -l 8192 file fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file fallocate --insert-range -l 8192 file Then after unmounting the filesystem, e2fsck reports corruption. Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com Fixes: `331573febb` ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-04-12 11:48:09 -04:00
Linus Torvalds	9f3a0941fb	libnvdimm for 4.17 * A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. * The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. * Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. * The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. * Miscellaneous cleanups and updates to unit test infrastructure. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m 9eGJG1H+JspcbtLF9iHn =acZQ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit `d2c997c0f1` ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...	2018-04-10 10:25:57 -07:00
Linus Torvalds	9022ca6b11	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, including Christoph's I_DIRTY patches" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: move I_DIRTY_INODE to fs.h ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) call ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) call gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC \| I_DIRTY_DATASYNC) calls fs: fold open_check_o_direct into do_dentry_open vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents vfs: make sure struct filename->iname is word-aligned get rid of pointless includes of fs_struct.h [poll] annotate SAA6588_CMD_POLL users	2018-04-06 11:07:08 -07:00
Theodore Ts'o	e40ff21389	ext4: force revalidation of directory pointer after seekdir(2) A malicious user could force the directory pointer to be in an invalid spot by using seekdir(2). Use the mechanism we already have to notice if the directory has changed since the last time we called ext4_readdir() to force a revalidation of the pointer. Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-04-01 23:21:03 -04:00
Theodore Ts'o	54dd0e0a1b	ext4: add extra checks to ext4_xattr_block_get() Add explicit checks in ext4_xattr_block_get() just in case the e_value_offs and e_value_size fields in the the xattr block are corrupted in memory after the buffer_verified bit is set on the xattr block. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2018-03-30 20:04:11 -04:00
Theodore Ts'o	9496005d6c	ext4: add bounds checking to ext4_xattr_find_entry() Add some paranoia checks to make sure we don't stray beyond the end of the valid memory region containing ext4 xattr entries while we are scanning for a match. Also rename the function to xattr_find_entry() since it is static and thus only used in fs/ext4/xattr.c Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2018-03-30 20:00:56 -04:00
Theodore Ts'o	de05ca8526	ext4: move call to ext4_error() into ext4_xattr_check_block() Refactor the call to EXT4_ERROR_INODE() into ext4_xattr_check_block(). This simplifies the code, and fixes a problem where not all callers of ext4_xattr_check_block() were not resulting in ext4_error() getting called when the xattr block is corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-30 15:42:25 -04:00
Dan Williams	5f0663bb4a	ext4, dax: introduce ext4_dax_aops In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Otherwise, direct-I/O triggers incorrect page cache assumptions and warnings. Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-30 11:34:55 -07:00
Tyson Nottingham	27f394a771	ext4: don't show data=<mode> option if defaulted Previously, mount -l would show data=<mode> even if the ext4 default journaling mode was being used. Change this to be consistent with the rest of the options. Ext4 already did the right thing when the journaling mode being used matched the one specified in the superblock's default mount options. The reason it failed to do the right thing for the ext4 defaults is that, when set, they were never included in sbi->s_def_mount_opt (unlike the superblock's defaults, which were). Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:56:10 -04:00
Tyson Nottingham	ceec03764a	ext4: omit init_itable=n in procfs when disabled Don't show init_itable=n in /proc/fs/ext4/<dev>/options when filesystem is mounted with noinit_itable. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:53:33 -04:00
Tyson Nottingham	68afa7e083	ext4: show more binary mount options in procfs Previously, /proc/fs/ext4/<dev>/options would only show binary options if they were set (1 in the options bit mask). E.g. it would show "grpid" if it was set, but it would not show "nogrpid" if grpid was not set. This seems sensible, but when an option is absent from the file, it can be hard for the unfamiliar to know what is being used. E.g. if there isn't a (no)grpid entry, nogrpid is in effect. But if there isn't a (no)auto_da_alloc entry, auto_da_alloc is in effect. If there isn't a (minixdf\|bsddf) entry, it turns out bsddf is in effect. It all depends on how the option is implemented. It's clearer to be explicit, so print the corresponding option regardless of whether it means a 1 or a 0 in the bit mask. Note that options which do not have an explicit disable option aren't indicated as being disabled even with this change (e.g. dax). Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:51:10 -04:00
Tyson Nottingham	bc1420ae56	ext4: simplify kobject usage Replace kset with generic kobject provided by kobject_create_and_add(), since the latter is sufficient. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:41:34 -04:00
Tyson Nottingham	6ca06829fb	ext4: remove unused parameters in sysfs code Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:13:10 -04:00
Tyson Nottingham	c2e5df7626	ext4: null out kobject* during sysfs cleanup Make cleanup of ext4_feat kobject consistent with similar objects. Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-30 00:03:38 -04:00
Theodore Ts'o	18db4b4e6f	ext4: don't allow r/w mounts if metadata blocks overlap the superblock If some metadata block, such as an allocation bitmap, overlaps the superblock, it's very likely that if the file system is mounted read/write, the results will not be pretty. So disallow r/w mounts for file systems corrupted in this particular way. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-29 22:10:35 -04:00
Theodore Ts'o	a45403b515	ext4: always initialize the crc32c checksum driver The extended attribute code now uses the crc32c checksum for hashing purposes, so we should just always always initialize it. We also want to prevent NULL pointer dereferences if one of the metadata checksum features is enabled after the file sytsem is originally mounted. This issue has been assigned CVE-2018-1094. https://bugzilla.kernel.org/show_bug.cgi?id=199183 https://bugzilla.redhat.com/show_bug.cgi?id=1560788 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-29 22:10:31 -04:00
Theodore Ts'o	8e4b5eae5d	ext4: fail ext4_iget for root directory if unallocated If the root directory has an i_links_count of zero, then when the file system is mounted, then when ext4_fill_super() notices the problem and tries to call iput() the root directory in the error return path, ext4_evict_inode() will try to free the inode on disk, before all of the file system structures are set up, and this will result in an OOPS caused by a NULL pointer dereference. This issue has been assigned CVE-2018-1092. https://bugzilla.kernel.org/show_bug.cgi?id=199179 https://bugzilla.redhat.com/show_bug.cgi?id=1560777 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-29 21:56:09 -04:00
Eric Biggers	ce3fd194fc	ext4: limit xattr size to INT_MAX ext4 isn't validating the sizes of xattrs where the value of the xattr is stored in an external inode. This is problematic because ->e_value_size is a u32, but ext4_xattr_get() returns an int. A very large size is misinterpreted as an error code, which ext4_get_acl() translates into a bogus ERR_PTR() for which IS_ERR() returns false, causing a crash. Fix this by validating that all xattrs are <= INT_MAX bytes. This issue has been assigned CVE-2018-1095. https://bugzilla.kernel.org/show_bug.cgi?id=199185 https://bugzilla.redhat.com/show_bug.cgi?id=1560793 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Fixes: `e50e5129f3` ("ext4: xattr-in-inode support")	2018-03-29 14:31:42 -04:00
Christoph Hellwig	0e11f6443f	fs: move I_DIRTY_INODE to fs.h And use it in a few more places rather than opencoding the values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-03-28 01:39:02 -04:00
Theodore Ts'o	7dac4a1726	ext4: add validity checks for bitmap block numbers An privileged attacker can cause a crash by mounting a crafted ext4 image which triggers a out-of-bounds read in the function ext4_valid_block_bitmap() in fs/ext4/balloc.c. This issue has been assigned CVE-2018-1093. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181 BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-26 23:54:10 -04:00
zhenwei.pi	dcae058a8d	ext4: fix comments in ext4_swap_extents() "mark_unwritten" in comment and "unwritten" in the function arguments is mismatched. Signed-off-by: zhenwei.pi <zhenwei.pi@youruncloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-26 01:44:03 -04:00
Goldwyn Rodrigues	043d20d159	ext4: use generic_writepages instead of __writepage/write_cache_pages Code cleanup. Instead of writing an internal static function, use the available generic_writepages(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-26 01:32:50 -04:00
Eric Sandeen	0d9366d67b	ext4: don't complain about incorrect features when probing If mount is auto-probing for filesystem type, it will try various filesystems in order, with the MS_SILENT flag set. We get that flag as the silent arg to ext4_fill_super. If we're probing (silent==1) then don't complain about feature incompatibilities that are found if it looks like it's actually a different valid extN type - failed probes should be silent in this case. If the on-disk features are unknown even to ext4, then complain. Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Tested-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-03-22 11:59:00 -04:00
Nikolay Borisov	1d39834fba	ext4: remove EXT4_STATE_DIOREAD_LOCK flag Commit `16c5468859` ("ext4: Allow parallel DIO reads") reworked the way locking happens around parallel dio reads. This resulted in obviating the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic. Currently this amounts to dead code so let's remove it. No functional changes Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-03-22 11:52:10 -04:00
Jiri Slaby	fe23cb65c2	ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin() ext4_iomap_begin() has a bug where offset returned in the iomap structure will be truncated to unsigned long size. On 64-bit architectures this is fine but on 32-bit architectures obviously not. Not many places actually use the offset stored in the iomap structure but one of visible failures is in SEEK_HOLE / SEEK_DATA implementation. If we create a file like: dd if=/dev/urandom of=file bs=1k seek=8m count=1 then lseek64("file", 0x100000000ULL, SEEK_DATA) wrongly returns 0x100000000 on unfixed kernel while it should return 0x200000000. Avoid the overflow by proper type cast. Fixes: `545052e9e3` ("ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org # v4.15	2018-03-22 11:50:26 -04:00
Eryu Guan	45d8ec4d9f	ext4: update i_disksize if direct write past ondisk size Currently in ext4 direct write path, we update i_disksize only when new eof is greater than i_size, and don't update it even when new eof is greater than i_disksize but less than i_size. This doesn't work well with delalloc buffer write, which updates i_size and i_disksize only when delalloc blocks are resolved (at writeback time), the i_disksize from direct write can be lost if a previous buffer write succeeded at write time but failed at writeback time, then results in corrupted ondisk inode size. Consider this case, first buffer write 4k data to a new file at offset 16k with delayed allocation, then direct write 4k data to the same file at offset 4k before delalloc blocks are resolved, which doesn't update i_disksize because it writes within i_size(20k), but the extent tree metadata has been committed in journal. Then writeback of the delalloc blocks fails (due to device error etc.), and i_size/i_disksize from buffer write can't be written to disk (still zero). A subsequent umount/mount cycle recovers journal and writes extent tree metadata from direct write to disk, but with i_disksize being zero. Fix it by updating i_disksize too in direct write path when new eof is greater than i_disksize but less than i_size, so i_disksize is always consistent with direct write. This fixes occasional i_size corruption in fstests generic/475. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-03-22 11:44:59 -04:00
Eryu Guan	73fdad00b2	ext4: protect i_disksize update by i_data_sem in direct write path i_disksize update should be protected by i_data_sem, by either taking the lock explicitly or by using ext4_update_i_disksize() helper. But the i_disksize updates in ext4_direct_IO_write() are not protected at all, which may be racing with i_disksize updates in writeback path in delalloc buffer write path. This is found by code inspection, and I didn't hit any i_disksize corruption due to this bug. Thanks to Jan Kara for catching this bug and suggesting the fix! Reported-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-03-22 11:41:25 -04:00
Theodore Ts'o	044e6e3d74	ext4: don't update checksum of new initialized bitmaps When reading the inode or block allocation bitmap, if the bitmap needs to be initialized, do not update the checksum in the block group descriptor. That's because we're not set up to journal those changes. Instead, just set the verified bit on the bitmap block, so that it's not necessary to validate the checksum. When a block or inode allocation actually happens, at that point the checksum will be calculated, and update of the bg descriptor block will be properly journalled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-02-19 14:16:47 -05:00
Theodore Ts'o	fb7c02445c	ext4: pass -ESHUTDOWN code to jbd2 layer Previously the jbd2 layer assumed that a file system check would be required after a journal abort. In the case of the deliberate file system shutdown, this should not be necessary. Allow the jbd2 layer to distinguish between these two cases by using the ESHUTDOWN errno. Also add proper locking to __journal_abort_soft(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-02-18 23:45:18 -05:00
Theodore Ts'o	a6d9946bb9	ext4: eliminate sleep from shutdown ioctl The msleep() when processing EXT4_GOING_FLAGS_NOLOGFLUSH was a hack to avoid some races (that are now fixed), but in fact it introduced its own race. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-02-18 23:16:28 -05:00
Theodore Ts'o	576d18ed60	ext4: shutdown should not prevent get_write_access The ext4 forced shutdown flag needs to prevent new handles from being started, but it needs to allow existing handles to complete. So the forced shutdown flag should not force ext4_journal_get_write_access to fail. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-02-18 22:07:36 -05:00
Theodore Ts'o	ccf0f32acd	ext4: add tracepoints for shutdown and file system errors Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-02-18 20:53:23 -05:00
Linus Torvalds	6fbac201f9	iversion.h related cleanup for v4.16 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJae0mSAAoJEAAOaEEZVoIVs98P+wSbwfgLeyTufmrRYrD9kxfh EQXfuvnJqPzRHLJIUXfwzTN3IV9RZ1434ci31lZvQE3PKrgb90QuBLiR6OIKULef UqpYRmjsg7BfFBdAnyUR8xSmmeN94PjXQk7tG+YQn096HJVZ6cG5qCA8RjJ9dFoq 2haDcOfDU+3e8mbtrrF4doP6jGrVwV+okqRsshFBclQv62Kk3m7L5AjQINyZpTM5 ZKX5JIMOAmlJcHsz/2J1qLAIRQKsvEUbRLV43bzp3E03PuVFPhig3dVtpGPUe+Yi OW0JX49hIoTCrQ4KZk6uweLG7ZpaSoppXggEi2ERNCUkCf3nhejLlScfye+yLx7f sItgPkOYU0VVF70Y72XH1DbOekZr/XCLZdEEUNCS/P68hnyK0gBNC9zPGetlxMMi wjjQ9Qe45vD2JFlrvhHrdUdCnxnE05zC9ckBrmM94uRwIfDR0WVgo6pfebfRkAJd Wp4/PfbaySY7vk4oyaXlNxcDIH2NvWwYkioI/K9rRGbB2KjTdXonQojBy+rT0LeS f3mufyZYyCxdwu3Wf8WO36H23L+4fseMthKIIPA0aL4wasB9LgD8gDnkyKx28DT4 S32tdK4UALC8SAVsPr+vSaMVzKOZmuNHac+XB2i+5lHl8G/n4M2a+JFTeR4CnKJ/ 9LsBEBL5Oj7ZXL7lfFIO =iEKM -----END PGP SIGNATURE----- Merge tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version cleanup from Jeff Layton: "Goffredo went ahead and sent a patch to rename this function, and reverse its sense, as we discussed last week. The patch is very straightforward and I figure it's probably best to go ahead and merge this to get the API as settled as possible" * tag 'iversion-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}	2018-02-07 14:25:22 -08:00
Linus Torvalds	3ff1b28caa	libnvdimm for 4.16 * Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). * Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. * Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. * Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. * Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaeOg0AAoJEJ/BjXdf9fLBAFoQAI/IgcgJ2h9lfEpgjBRTC44t 2p8dxwT1Ofw3Y1aR/tI8nYRXjRtAGuP4UIeRVnb1CL/N7PagJyoMGU+6hmzg+ptY c7cEDvw6nZOhrFwXx/xn7R53sYG8zH+UE6+jTR/PP/G4mQJfFCg4iF9R72Y7z0n7 aurf82Kz137NPUy6dNr4V9bmPMJWAaOci9WOj5SKddR5ZSNbjoxylTwQRvre5y4r 7HQTScEkirABOdSf1JoXTSUXCH/RC9UFFXR03ScHstGb1HjCj3KdcicVc50Q++Ub qsEudhE6i44PEW1Hh4Qkg6hjHMEa8qHP+ShBuRuVaUmlghYTQn66niJAYLZilwdz EVjE7vR+toHA5g3YCalEmYVutUEhIDkh/xfpd7vM6ZorUGJy95a2elEJs2fHBffC gEhnCip7FROPcK5RDNUM8hBgnG/q5wwWPQMKY+6rKDZQx3mXssCrKp2Vlx7kBwMG rpblkEpYjPonbLEHxsSU8yTg9Uq55ciIWgnOToffcjZvjbihi8WUVlHcwHUMPf/o DWElg+4qmG0Sdd4S2NeAGwTl1Ewrf2RrtUGMjHtH4OUFs1wo6ZmfrxFzzMfoZ1Od ko/s65v4uwtTzECh2o+XQaNsReR5YETXxmA40N/Jpo7/7twABIoZ/ASvj/3ZBYj+ sie+u2rTod8/gQWSfHpJ =MIMX -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...	2018-02-06 10:41:33 -08:00
Linus Torvalds	3462ac5703	Refactor support for encrypted symlinks to move common code to fscrypt. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp2R3AACgkQ8vlZVpUN gaOIdAgApEdlFR2Gf93z2hMj5HxVL5rjkuPJVtVkKu0eH2HMQJyxNmjymrRfuFmM 8W1CrEvVKi5Aj6r8q4KHIdVV247Ya0SVEhLwKM0LX4CvlZUXmwgCmZ/MPDTXA1eq C4vPVuJAuSNGNVYDlDs3+NiMHINGNVnBVQQFSPBP9P+iNWPD7o486712qaF8maVn RbfbQ2rWtOIRdlAOD1U5WqgQku59lOsmHk2pc0+X4LHCZFpMoaO80JVjENPAw+BF daRt6TX+WljMyx6DRIaszqau876CJhe/tqlZcCLOkpXZP0jJS13yodp26dVQmjCh w8YdiY7uHK2D+S/8eyj7h7DIwzu3vg== =ZjQP -----END PGP SIGNATURE----- Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Refactor support for encrypted symlinks to move common code to fscrypt" Ted also points out about the merge: "This makes the f2fs symlink code use the fscrypt_encrypt_symlink() from the fscrypt tree. This will end up dropping the kzalloc() -> f2fs_kzalloc() change, which means the fscrypt-specific allocation won't get tested by f2fs's kmalloc error injection system; which is fine" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits) fscrypt: fix build with pre-4.6 gcc versions fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info() fscrypt: document symlink length restriction fscrypt: fix up fscrypt_fname_encrypted_size() for internal use fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names fscrypt: calculate NUL-padding length in one place only fscrypt: move fscrypt_symlink_data to fscrypt_private.h fscrypt: remove fscrypt_fname_usr_to_disk() ubifs: switch to fscrypt_get_symlink() ubifs: switch to fscrypt ->symlink() helper functions ubifs: free the encrypted symlink target f2fs: switch to fscrypt_get_symlink() f2fs: switch to fscrypt ->symlink() helper functions ext4: switch to fscrypt_get_symlink() ext4: switch to fscrypt ->symlink() helper functions fscrypt: new helper function - fscrypt_get_symlink() fscrypt: new helper functions for ->symlink() fscrypt: trim down fscrypt.h includes fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h ...	2018-02-04 10:43:12 -08:00
Linus Torvalds	617aebe6a9	Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Kees Cook <kees@outflux.net> iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4 43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q 9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1 gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc JgOmUnQNJWCTwUUw5AS1 =tzmJ -----END PGP SIGNATURE----- Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardened usercopy whitelisting from Kees Cook: "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage" * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits) lkdtm: Update usercopy tests for whitelisting usercopy: Restrict non-usercopy caches to size 0 kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl kvm: whitelist struct kvm_vcpu_arch arm: Implement thread_struct whitelist for hardened usercopy arm64: Implement thread_struct whitelist for hardened usercopy x86: Implement thread_struct whitelist for hardened usercopy fork: Provide usercopy whitelisting for task_struct fork: Define usercopy region in thread_stack slab caches fork: Define usercopy region in mm_struct slab caches net: Restrict unwhitelisted proto caches to size 0 sctp: Copy struct sctp_sock.autoclose to userspace using put_user() sctp: Define usercopy region in SCTP proto slab cache caif: Define usercopy region in caif proto slab cache ip: Define usercopy region in IP proto slab cache net: Define usercopy region in struct proto slab cache scsi: Define usercopy region in scsi_sense_cache slab cache cifs: Define usercopy region in cifs_request slab cache vxfs: Define usercopy region in vxfs_inode slab cache ufs: Define usercopy region in ufs_inode_cache slab cache ...	2018-02-03 16:25:42 -08:00
Linus Torvalds	23aedc4b9b	Only miscellaneous cleanups and bug fixes for ext4 this cycle. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlp16xMACgkQ8vlZVpUN gaP1IAf8C48AKVnqy6ftFphzV1CdeGHDwJLL63lChs97fNr1mxo5TZE/6vdYB55j k7C7huQ582cEiGWQJ0U4/+En0hF85zkAk5mTfnSao5BqxLr9ANsAocwBUNBXdFSp B7IyMo4Dct7NCkwfmKLPRcEqZ49vwyv99TqM/9wUkgUStkTjPT7bhHgarB6VPbhp BxoXVnFYgU0sZN0y71IBt8ngWqCK6j7fjw3gsl37oEenG3/h3SO0H9ih1FrysX8S VOwwLJq6vfAgEwQvZACnBwWKDYsZpH7akNp9WGeDMByo28t514RNRjIi0mvLHEZa h72I8Sb3bwHO9MJNvHFe/0b1Say4vw== =dxAX -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Only miscellaneous cleanups and bug fixes for ext4 this cycle" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: create ext4_kset dynamically ext4: create ext4_feat kobject dynamically ext4: release kobject/kset even when init/register fail ext4: fix incorrect indentation of if statement ext4: correct documentation for grpid mount option ext4: use 'sbi' instead of 'EXT4_SB(sb)' ext4: save error to disk in __ext4_grp_locked_error() jbd2: fix sphinx kernel-doc build warnings ext4: fix a race in the ext4 shutdown path mbcache: make sure c_entry_count is not decremented past zero ext4: no need flush workqueue before destroying it ext4: fixed alignment and minor code cleanup in ext4.h ext4: fix ENOSPC handling in DAX page fault handler dax: pass detailed error code from dax_iomap_fault() mbcache: revert "fs/mbcache.c: make count_objects() more robust" mbcache: initialize entry->e_referenced in mb_cache_entry_create() ext4: fix up remaining files with SPDX cleanups	2018-02-03 13:49:22 -08:00
Goffredo Baroncelli	c472c07bfe	iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2018-02-01 08:15:25 -05:00
Linus Torvalds	255442c938	Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJab11TAAoJEI3ONVYwIuV6i1UP/1LgGPHW9Ygq5qaLFbReZd/u Mx/orrhHX0PdkbCCE+CbL8Vm1m4UKFDTBdlpk3s542zxeeG0ZBXuTnvq4Kyk+cTN p4/vsIEzk/Ih13/glGE5MlV+EjiEK+8hK69TIUj7bAyuHmpzofjRz9/1M6RLDGDC HY6UI58AXG0yOQWMWCGRMYpQAFUGij2equ7Doe1ugXRq14dx7V4RsOhI140iRk7t bquAq1rS2fXniiuPFmLBUe4dWW28isVa/Vl/aXcaWQDKMyT0OLhjOMW36wWKqtPi WdVCpHv1NLZNyZZr9S3kvfOwW+BUqpEzfVwssyBLW4h0tsnIx0U0HVhSTY8/TvFZ QD9yCSana4LB/e5CHXIX5lBHbjHxf+rETXqVV4MgwDaMvM3mCo4X6WUTJDmZADo6 vQISEKeb4su5uWAbc9T9xwRSLhZnFVdJ/QuYdNQ5+EpFJYLhzQ9eBvEz6JstSIXL p9ASBiPNY3ulpVZ8q0JOHJRBhq5mHJH6Dy8achzbILy2l/ZI4b8lJ53mw9II04cp puF96E6HpvuZ8Tgjjrg9U3ZdxXNrUgc/tjk2ZDkyTglk1XF2jKSq2tiNSZ3oLrJm XqJPnpCeyJM5UDvwkIBzgC41WEHwe8uvoNbUnc4X7UJSZegFzcSLQXf5qaprHS5k XeQ7sbd+S+jzVVjFi0W5 =Z15Z -----END PGP SIGNATURE----- Merge tag 'docs-4.16' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...	2018-01-31 19:25:25 -08:00
Linus Torvalds	a4b7fd7d34	inode->i_version rework for v4.16 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJabwjlAAoJEAAOaEEZVoIVeEEP/R84kZJjlZV/vNmFFvY46jM+ 0hpMHXRNym+nW1Du1CKNkesEUAY8ACAQIyzJh63Q72341QTDdz3+asHwPYRNOqdC PgryidPieojkNKQg+h7dmoKYlYh1xiCicvn66Q5PFb9B0lH36twekOK4X1qqJj8Z breRmRoFLka9looMSuYgwbErts023fmASalvGum6T0ZM/7F9hUj4O3OsQtKTLUNM VQ+gLJTQrUqrgzvWUwq3WTMa9YAaKP4oad8nsglNSpiVLG7WtURr5HokW9hAziqL k99Y+K2ni1wZJlNGJAyV7PyEG2ieI5Xn+LzM2RM+SndD1QHF2QXACmSTDYfL51k5 G2RsKeTZvQPtX4qx9+vnCp/4oV6JduvCaq2Mt8SQb9nYZxKjs85TNLrARJv+85eQ zP0OTxlH1Gfu3j36n3cny4XemyMYYF4hCFYfRPqTGst37fgLBtfIfUSQ6jedoCK2 Xcyb6ukGXMh6If/A7DSy91hvSSPrWSH7TPPsbfLy6o+wUOtpAGR4eXVlEuAiXrzc gnoAz85oIMUQae66LrdrPk1NyE59qOb24g/yU5gyRBSpi2+/aoboNCKaD73tgs/C XIMwGXLYmqkcud7IBQF0tHHiM+jsEkbSM4LUqRXSnqMdwNnS18Z4Q+JKqpdP0cii eRdenDvUfu8Gu1Y9vWBv =iihN -----END PGP SIGNATURE----- Merge tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull inode->i_version rework from Jeff Layton: "This pile of patches is a rework of the inode->i_version field. We have traditionally incremented that field on every inode data or metadata change. Typically this increment needs to be logged on disk even when nothing else has changed, which is rather expensive. It turns out though that none of the consumers of that field actually require this behavior. The only real requirement for all of them is that it be different iff the inode has changed since the last time the field was checked. Given that, we can optimize away most of the i_version increments and avoid dirtying inode metadata when the only change is to the i_version and no one is querying it. Queries of the i_version field are rather rare, so we can help write performance under many common workloads. This patch series converts existing accesses of the i_version field to a new API, and then converts all of the in-kernel filesystems to use it. The last patch in the series then converts the backend implementation to a scheme that optimizes away a large portion of the metadata updates when no one is looking at it. In my own testing this series significantly helps performance with small I/O sizes. I also got this email for Christmas this year from the kernel test robot (a 244% r/w bandwidth improvement with XFS over DAX, with 4k writes): https://lkml.org/lkml/2017/12/25/8 A few of the earlier patches in this pile are also flowing to you via other trees (mm, integrity, and nfsd trees in particular)". * tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: (22 commits) fs: handle inode->i_version more efficiently btrfs: only dirty the inode in btrfs_update_time if something was changed xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing fs: only set S_VERSION when updating times if necessary IMA: switch IMA over to new i_version API xfs: convert to new i_version API ufs: use new i_version API ocfs2: convert to new i_version API nfsd: convert to new i_version API nfs: convert to new i_version API ext4: convert to new i_version API ext2: convert to new i_version API exofs: switch to new i_version API btrfs: convert to new i_version API afs: convert to new i_version API affs: convert to new i_version API fat: convert to new i_version API fs: don't take the i_lock in inode_inc_iversion fs: new API for handling inode->i_version ntfs: remove i_version handling ...	2018-01-29 13:33:53 -08:00
Jeff Layton	ee73f9a52a	ext4: convert to new i_version API Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Theodore Ts'o <tytso@mit.edu>	2018-01-29 06:42:21 -05:00
Jeff Layton	ae5e165d85	fs: new API for handling inode->i_version Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-29 06:41:30 -05:00
Dan Williams	24f3478d66	ext4: auto disable dax instead of failing mount Bring the ext4 filesystem in line with xfs that only warns and continues when the "-o dax" option is specified to mount and the backing device does not support dax. This is in preparation for removing dax support from devices that do not enable get_user_pages() operations on dax mappings. In other words 'gup' support is required and configurations that were using so called 'page-less' dax will be converted back to using the page cache. Removing the broken 'page-less' dax support is a pre-requisite for removing the "EXPERIMENTAL" warning when mounting a filesystem in dax mode. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-01-19 16:50:53 -08:00
David Windsor	f8dd7c7086	ext4: Define usercopy region in ext4_inode_cache slab cache The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data and therefore contained in the ext4_inode_cache slab cache, need to be copied to/from userspace. cache object allocation: fs/ext4/super.c: ext4_alloc_inode(...): struct ext4_inode_info ei; ... ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS); ... return &ei->vfs_inode; include/trace/events/ext4.h: #define EXT4_I(inode) \ (container_of(inode, struct ext4_inode_info, vfs_inode)) fs/ext4/namei.c: ext4_symlink(...): ... inode->i_link = (char )&EXT4_I(inode)->i_data; example usage trace: readlink_copy+0x43/0x70 vfs_readlink+0x62/0x110 SyS_readlinkat+0x100/0x130 fs/namei.c: readlink_copy(..., link): ... copy_to_user(..., link, len) (inlined into vfs_readlink) generic_readlink(dentry, ...): struct inode inode = d_inode(dentry); const char link = inode->i_link; ... readlink_copy(..., link); In support of usercopy hardening, this patch defines a region in the ext4_inode_cache slab cache in which userspace copy operations are allowed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log, provide usage trace] Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>	2018-01-15 12:07:52 -08:00
Eric Biggers	3d204e24d4	fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info() fscrypt_put_encryption_info() is only called when evicting an inode, so the 'struct fscrypt_info *ci' parameter is always NULL, and there cannot be races with other threads. This was cruft left over from the broken key revocation code. Remove the unused parameter and the cmpxchg(). Also remove the #ifdefs around the fscrypt_put_encryption_info() calls, since fscrypt_notsupp.h defines a no-op stub for it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 23:30:13 -05:00
Eric Biggers	6a9269c838	ext4: switch to fscrypt_get_symlink() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 22:10:40 -05:00
Eric Biggers	78e1060c94	ext4: switch to fscrypt ->symlink() helper functions Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 22:10:40 -05:00
Riccardo Schirone	5dc397113d	ext4: create ext4_kset dynamically ksets contain a kobject and they should always be allocated dynamically, because it is unknown to whoever creates them when ksets can be released. Signed-off-by: Riccardo Schirone <sirmy15@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 15:34:04 -05:00
Riccardo Schirone	b99fee58a2	ext4: create ext4_feat kobject dynamically kobjects should always be allocated dynamically, because it is unknown to whoever creates them when kobjects can be released. Signed-off-by: Riccardo Schirone <sirmy15@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 15:11:32 -05:00
Riccardo Schirone	95c4df0293	ext4: release kobject/kset even when init/register fail Even when kobject_init_and_add/kset_register fail, the kobject has been already initialized and the refcount set to 1. Thus it is necessary to release the kobject/kset, to avoid the memory associated with it hanging around forever. Signed-off-by: Riccardo Schirone <sirmy15@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-11 14:28:13 -05:00
Colin Ian King	a794df0ecd	ext4: fix incorrect indentation of if statement The indentation is incorrect and spaces need replacing with a tab on the if statement. Cleans up smatch warning: fs/ext4/namei.c:3220 ext4_link() warn: inconsistent indenting Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-11 14:17:30 -05:00
Jun Piao	49598e04b5	ext4: use 'sbi' instead of 'EXT4_SB(sb)' We could use 'sbi' instead of 'EXT4_SB(sb)' to make code more elegant. Signed-off-by: Jun Piao <piaojun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-11 13:17:49 -05:00
Zhouyi Zhou	06f29cc81f	ext4: save error to disk in __ext4_grp_locked_error() In the function __ext4_grp_locked_error(), __save_error_info() is called to save error info in super block block, but does not sync that information to disk to info the subsequence fsck after reboot. This patch writes the error information to disk. After this patch, I think there is no obvious EXT4 error handle branches which leads to "Remounting filesystem read-only" will leave the disk partition miss the subsequence fsck. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-01-10 00:34:19 -05:00
Harshad Shirwadkar	abbc3f9395	ext4: fix a race in the ext4 shutdown path This patch fixes a race between the shutdown path and bio completion handling. In the ext4 direct io path with async io, after submitting a bio to the block layer, if journal starting fails, ext4_direct_IO_write() would bail out pretending that the IO failed. The caller would have had no way of knowing whether or not the IO was successfully submitted. So instead, we return -EIOCBQUEUED in this case. Now, the caller knows that the IO was submitted. The bio completion handler takes care of the error. Tested: Ran the shutdown xfstest test 461 in loop for over 2 hours across 4 machines resulting in over 400 runs. Verified that the race didn't occur. Usually the race was seen in about 20-30 iterations. Signed-off-by: Harshad Shirwadkar <harshads@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2018-01-10 00:13:13 -05:00
piaojun	a90ac0f5dc	ext4: no need flush workqueue before destroying it destroy_workqueue() will do flushing work for us. Signed-off-by: Jun Piao <piaojun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-09 21:32:41 -05:00
Petros Koutoupis	e7093f0d63	ext4: fixed alignment and minor code cleanup in ext4.h Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-07 23:36:19 -05:00
Jan Kara	2244642310	ext4: fix ENOSPC handling in DAX page fault handler When allocation of underlying block for a page fault fails, we fail the fault with SIGBUS. However we may well hit ENOSPC just due to lots of free blocks being held by the running / committing transaction. So propagate the error from ext4_iomap_begin() and implement do standard allocation retry loop in ext4_dax_huge_fault(). Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-07 16:41:01 -05:00
Jan Kara	c0b2462597	dax: pass detailed error code from dax_iomap_fault() Ext4 needs to pass through error from its iomap handler to the page fault handler so that it can properly detect ENOSPC and force transaction commit and retry the fault (and block allocation). Add argument to dax_iomap_fault() for passing such error. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2018-01-07 16:38:43 -05:00
Adam Borowski	91581e4c60	fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at This link is replicated in most filesystems' config stanzas. Referring to an archived version of that site is pointless as it mostly deals with patches; user documentation is available elsewhere. Signed-off-by: Adam Borowski <kilobyte@angband.pl> CC: Alexander Viro <viro@zeniv.linux.org.uk> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: David Sterba <dsterba@suse.com> Acked-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Chao Yu <yuchao0@huawei.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Steve French <smfrench@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2018-01-01 12:45:37 -07:00
Theodore Ts'o	f516676857	ext4: fix up remaining files with SPDX cleanups A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-12-17 22:00:59 -05:00
Chandan Rajendra	9d5afec6b8	ext4: fix crash when a directory's i_size is too small On a ppc64 machine, when mounting a fuzzed ext2 image (generated by fsfuzzer) the following call trace is seen, VFS: brelse: Trying to free free buffer WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40 .__brelse.part.6+0x20/0x40 (unreliable) .ext4_find_entry+0x384/0x4f0 .ext4_lookup+0x84/0x250 .lookup_slow+0xdc/0x230 .walk_component+0x268/0x400 .path_lookupat+0xec/0x2d0 .filename_lookup+0x9c/0x1d0 .vfs_statx+0x98/0x140 .SyS_newfstatat+0x48/0x80 system_call+0x58/0x6c This happens because the directory that ext4_find_entry() looks up has inode->i_size that is less than the block size of the filesystem. This causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not reading any of the directory file's blocks. This renders the entries in bh_use[] array to continue to have garbage data. buffer_uptodate() on bh_use[0] can then return a zero value upon which brelse() function is invoked. This commit fixes the bug by returning -ENOENT when the directory file has no associated blocks. Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Cc: stable@vger.kernel.org	2017-12-11 15:00:57 -05:00
Theodore Ts'o	996fc4477a	ext4: add missing error check in __ext4_new_inode() It's possible for ext4_get_acl() to return an ERR_PTR. So we need to add a check for this case in __ext4_new_inode(). Otherwise on an error we can end up oops the kernel. This was getting triggered by xfstests generic/388, which is a test which exercises the shutdown code path. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-12-10 23:44:11 -05:00
Eryu Guan	c894aa9757	ext4: fix fdatasync(2) after fallocate(2) operation Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2) then crash, we'll see wrong allocated block number (stat -c %b), the blocks allocated beyond EOF are all lost. fstests generic/468 exposes this bug. Commit `67a7d5f561` ("ext4: fix fdatasync(2) after extent manipulation operations") fixed all the other extent manipulation operation paths such as hole punch, zero range, collapse range etc., but forgot the fallocate case. So similarly, fix it by recording the correct journal tid in ext4 inode in fallocate(2) path, so that ext4_sync_file() will wait for the right tid to be committed on fdatasync(2). This addresses the test failure in xfstests test generic/468. Signed-off-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2017-12-03 22:52:51 -05:00
Andi Kleen	fc82228a5e	ext4: support fast symlinks from ext3 file systems `407cd7fb83` (ext4: change fast symlink test to not rely on i_blocks) broke ~10 years old ext3 file systems created by 2.6.17. Any ELF executable fails because the /lib/ld-linux.so.2 fast symlink cannot be read anymore. The patch assumed fast symlinks were created in a specific way, but that's not true on these really old file systems. The new behavior is apparently needed only with the large EA inode feature. Revert to the old behavior if the large EA inode feature is not set. This makes my old VM boot again. Fixes: `407cd7fb83` (ext4: change fast symlink test to not rely on i_blocks) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org	2017-12-03 20:38:01 -05:00
Linus Torvalds	1751e8a6cb	Rename superblock flags (MS_xyz -> SB_xyz) This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_, with the names and the values for the moment mirroring the MS_ flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/: it generally should not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done\| sort\|uniq\|grep -v '^fs/namespace.c'\|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-27 13:05:09 -08:00
Linus Torvalds	93f30c73ec	Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat and uaccess updates from Al Viro: - {get,put}_compat_sigset() series - assorted compat ioctl stuff - more set_fs() elimination - a few more timespec64 conversions - several removals of pointless access_ok() in places where it was followed only by non-__ variants of primitives * 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits) coredump: call do_unlinkat directly instead of sys_unlink fs: expose do_unlinkat for built-in callers ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs() ipmi: get rid of pointless access_ok() pi433: sanitize ioctl cxlflash: get rid of pointless access_ok() mtdchar: get rid of pointless access_ok() r128: switch compat ioctls to drm_ioctl_kernel() selection: get rid of field-by-field copyin VT_RESIZEX: get rid of field-by-field copyin i2c compat ioctls: move to ->compat_ioctl() sched_rr_get_interval(): move compat to native, get rid of set_fs() mips: switch to {get,put}_compat_sigset() sparc: switch to {get,put}_compat_sigset() s390: switch to {get,put}_compat_sigset() ppc: switch to {get,put}_compat_sigset() parisc: switch to {get,put}_compat_sigset() get_compat_sigset() get rid of {get,put}_compat_itimerspec() io_getevents: Use timespec64 to represent timeouts ...	2017-11-17 11:54:55 -08:00
Linus Torvalds	a3841f94c7	libnvdimm for 4.15 * Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. * Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. * Add support for the ACPI 6.2 NVDIMM media error injection methods. * Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. * Cleanup physical address information disclosures to be root-only. * Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. * Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: `957ac8c421` dax: fix PMD faults on zero-length files Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> `a39e596baa` xfs: support for synchronous DAX faults Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> `7b565c9f96` xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaDfvcAAoJEB7SkWpmfYgCk7sP/2qJhBH+VTTdg2osDnhAdAhI co/AGEmsHFlUCMBb/Ek7UnMAmhBYiJU2q4ywPsNFBpusXpMlqNy5Iwo7k4/wQHE/ SJcIM0g4zg0ViFuUhwV+C2T0R5UzFR8JLd9EYWj/YS6aJpurtotm5l4UStaM0Hzo AhxSXJLrBDuqCpbOxbctfiGEmdRL7aRfBEAARTNRKBn/iXxJUcYHlp62rtXQS+t4 I6LC/URCWTNTTMGmzW6TRsgSD9WMfd19xKcGzN3qL6ee0KFccxN4ctFqHA/sFGOh iYLeR0XJUjJxyp+PkWGteXPVZL0Kj3bD/lSTG+Co5bm/ra8a/sh3TSFfgFyoBZD1 EqMN8Ryf80hGp3FabeH2Iw2SviYPZpHSWgjddjxLD0RA6OmpzINc+Wm8eqApjMME sbZDTOijiab4QMQ0XamF4GuDHyQtawv5Y/w2Ehhl1tmiqW+5tKhsKqxkQt+/V3Yt RTVSRe2Pkway66b+cD64IdQ6L2tyonPnmi5IzgkKOhlOEGomy+4/U2Jt2bMbhzq6 ymszKmXp2XI8P06wU8sHrIUeXO5I9qoKn/fZA73Eb8aIzgJe3tBE/5+Ab7RG6HB9 1OVfcMWoXU1gNgNktTs63X1Lsg4aW9kt/K4fPHHcqUcaliEJpJTlAbg9GLF2buoW nQ+0fTRgMRihE3ZA0Fs3 =h2vZ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - `957ac8c421` ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - `a39e596baa` ("xfs: support for synchronous DAX faults") and `7b565c9f96` ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...	2017-11-17 09:51:57 -08:00
Mel Gorman	8667982014	mm, pagevec: remove cold parameter for pagevecs Every pagevec_init user claims the pages being released are hot even in cases where it is unlikely the pages are hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:06 -08:00
Jan Kara	67fd707f46	mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() All users of pagevec_lookup() and pagevec_lookup_range() now pass PAGEVEC_SIZE as a desired number of pages. Just drop the argument. Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:04 -08:00
Jan Kara	dc7f3e868a	ext4: use pagevec_lookup_range_tag() We want only pages from given range in ext4_writepages(). Use pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove unnecessary code. Link: http://lkml.kernel.org/r/20171009151359.31984-5-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-15 18:21:03 -08:00
Linus Torvalds	ae9a8c4bdc	Add support for online resizing of file systems with bigalloc. Fix a two data corruption bugs involving DAX, as well as a corruption bug after a crash during a racing fallocate and delayed allocation. Finally, a number of cleanups and optimizations. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAloJCiEACgkQ8vlZVpUN gaOahAgAhcgdPagn/B5w+6vKFdH+hOJLKyGI0adGDyWD9YBXN0wFQvliVgXrTKei hxW2GdQGc6yHv9mOjvD+4Fn2AnTZk8F3GtG6zdqRM08JGF/IN2Jax2boczG/XnUz rT9cd3ic2Ff0KaUX+Yos55QwomTh5CAeRPgvB69o9D6L4VJzTlsWKSOBR19FmrSG NDmzZibgWmHcqzW9Bq8ZrXXx+KB42kUlc8tYYm2n6MTaE0LMvp3d9XcFcnm/I7Bk MGa2d3/3FArGD6Rkl/E82MXMSElOHJnY6jGYSDaadUeMI5FXkA6tECOSJYXqShdb ZJwkOBwfv2lbYZJxIBJTy/iA6zdsoQ== =ZzaJ -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Add support for online resizing of file systems with bigalloc - Fix a two data corruption bugs involving DAX, as well as a corruption bug after a crash during a racing fallocate and delayed allocation. - Finally, a number of cleanups and optimizations. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: improve smp scalability for inode generation ext4: add support for online resizing with bigalloc ext4: mention noload when recovering on read-only device Documentation: fix little inconsistencies ext4: convert timers to use timer_setup() jbd2: convert timers to use timer_setup() ext4: remove duplicate extended attributes defs ext4: add ext4_should_use_dax() ext4: add sanity check for encryption + DAX ext4: prevent data corruption with journaling + DAX ext4: prevent data corruption with inline data + DAX ext4: fix interaction between i_size, fallocate, and delalloc after a crash ext4: retry allocations conservatively ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA ext4: Add iomap support for inline data iomap: Add IOMAP_F_DATA_INLINE flag iomap: Switch from blkno to disk offset	2017-11-14 12:59:42 -08:00
Linus Torvalds	32190f0afb	fscrypt: lots of cleanups, mostly courtesy by Eric Biggers -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAloI8AUACgkQ8vlZVpUN gaMdjgf8CCW7UhPjoZYwF8sUNtAaX9+JZT1maOcXUhpJ3vRQiRn+AzRH6yBYMm79 +NZBwVlk4dlEe55Wh4yFIStMAstqzCrke4C9CSbExjgHNsJdU4znyYuLRMbLfyO0 6c4NObiAIKJdW1/te1aN90keGC6min8pBZot+FqZsRr+Kq2+IOtM43JAv7efOLev v3LCjUf9JKxatoB8tgw4AJRa1p18p7D2APWTG05VlFq63TjhVIYNvvwcQlizLwGY cuEq3X59FbFdX06fJnucujU3WP3ES4/3rhufBK4NNaec5e5dbnH2KlAx7J5SyMIZ 0qUFB/dmXDSb3gsfScSGo1F71Ad0CA== =asAm -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Lots of cleanups, mostly courtesy by Eric Biggers" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: lock mutex before checking for bounce page pool fscrypt: add a documentation file for filesystem-level encryption ext4: switch to fscrypt_prepare_setattr() ext4: switch to fscrypt_prepare_lookup() ext4: switch to fscrypt_prepare_rename() ext4: switch to fscrypt_prepare_link() ext4: switch to fscrypt_file_open() fscrypt: new helper function - fscrypt_prepare_setattr() fscrypt: new helper function - fscrypt_prepare_lookup() fscrypt: new helper function - fscrypt_prepare_rename() fscrypt: new helper function - fscrypt_prepare_link() fscrypt: new helper function - fscrypt_file_open() fscrypt: new helper function - fscrypt_require_key() fscrypt: remove unneeded empty fscrypt_operations structs fscrypt: remove ->is_encrypted() fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED() fs, fscrypt: add an S_ENCRYPTED inode flag fscrypt: clean up include file mess	2017-11-14 11:35:15 -08:00
Dan Williams	aaa422c4c3	fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core While reviewing whether MAP_SYNC should strengthen its current guarantee of syncing writes from the initiating process to also include third-party readers observing dirty metadata, Dave pointed out that the check of IOMAP_WRITE is misplaced. The policy of what to with IOMAP_F_DIRTY should be separated from the generic filesystem mechanism of reporting dirty metadata. Move this policy to the fs-dax core to simplify the per-filesystem iomap handlers, and further centralize code that implements the MAP_SYNC policy. This otherwise should not change behavior, it just makes it easier to change behavior in the future. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-13 16:38:44 -08:00
Al Viro	e145b35bb9	ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-11-10 08:48:44 -05:00
Theodore Ts'o	2325306802	ext4: improve smp scalability for inode generation ->s_next_generation is protected by s_next_gen_lock but its usage pattern is very primitive. We don't actually need sequentially increasing new generation numbers, so let's use prandom_u32() instead. Reported-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-11-08 22:23:20 -05:00
Jan Kara	b8a6176c21	ext4: Support for synchronous DAX faults We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to prepare blocks for writing and the inode has some uncommitted metadata changes. In the fault handler ext4_dax_fault() we then detect this case (through VM_FAULT_NEEDDSYNC return value) and call helper dax_finish_sync_fault() to flush metadata changes and insert page table entry. Note that this will also dirty corresponding radix tree entry which is what we want - fsync(2) will still provide data integrity guarantees for applications not using userspace flushing. And applications using userspace flushing can avoid calling fsync(2) and thus avoid the performance overhead. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-03 06:26:26 -07:00
Jan Kara	497f6926d8	ext4: Simplify error handling in ext4_dax_huge_fault() If transaction starting fails, just bail out of the function immediately instead of checking for that condition throughout the function. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-03 06:26:26 -07:00
Jan Kara	9a0dd42251	dax: Allow dax_iomap_fault() to return pfn For synchronous page fault dax_iomap_fault() will need to return PFN which will then need to be inserted into page tables after fsync() completes. Add necessary parameter to dax_iomap_fault(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-03 06:26:24 -07:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
harshads	d77147ff44	ext4: add support for online resizing with bigalloc This patch adds support for online resizing on bigalloc file system by implementing EXT4_IOC_RESIZE_FS ioctl. Old resize interfaces (add block groups and extend last block group) are left untouched. Tests performed with cluster sizes of 1, 2, 4 and 8 blocks (of size 4k) per cluster. I will add these tests to xfstests. Signed-off-by: Harshad Shirwadkar <harshads@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-29 09:38:46 -04:00
Eric Biggers	3ce2b8ddd8	ext4: switch to fscrypt_prepare_setattr() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 20:21:58 -04:00
Eric Biggers	8990427501	ext4: switch to fscrypt_prepare_lookup() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 20:21:58 -04:00
Eric Biggers	07543d164b	ext4: switch to fscrypt_prepare_rename() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 20:21:57 -04:00
Eric Biggers	697251816d	ext4: switch to fscrypt_prepare_link() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 20:21:57 -04:00
Eric Biggers	09a5c31c91	ext4: switch to fscrypt_file_open() Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 20:21:57 -04:00
Eric Biggers	ffcc41829a	fscrypt: remove unneeded empty fscrypt_operations structs In the case where a filesystem has been configured without encryption support, there is no longer any need to initialize ->s_cop at all, since none of the methods are ever called. Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 19:52:37 -04:00
Eric Biggers	f7293e48bb	fscrypt: remove ->is_encrypted() Now that all callers of fscrypt_operations.is_encrypted() have been switched to IS_ENCRYPTED(), remove ->is_encrypted(). Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 19:52:37 -04:00
Eric Biggers	2ee6a576be	fs, fscrypt: add an S_ENCRYPTED inode flag Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate that the inode is encrypted using the fscrypt (fs/crypto/) mechanism. Checking this flag will give the same information that inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more efficient. This will be useful for adding higher-level helper functions for filesystems to use. For example we'll be able to replace this: if (ext4_encrypted_inode(inode)) { ret = fscrypt_get_encryption_info(inode); if (ret) return ret; if (!fscrypt_has_encryption_key(inode)) return -ENOKEY; } with this: ret = fscrypt_require_key(inode); if (ret) return ret; ... since we'll be able to retain the fast path for unencrypted files as a single flag check, using an inline function. This wasn't possible before because we'd have had to frequently call through the ->i_sb->s_cop->is_encrypted function pointer, even when the encryption support was disabled or not being used. Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is disabled because we want to continue to return an error if an encrypted file is accessed without encryption support, rather than pretending that it is unencrypted. Reviewed-by: Chao Yu <yuchao0@huawei.com> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-10-18 19:52:36 -04:00

1 2 3 4 5 ...

3267 Commits