mirror of https://gitee.com/openkylin/linux.git
187 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Allison Henderson | d29f781c32 |
xfs: Remove all strlen in all xfs_attr_* functions for attr names.
This helps to pre-simplify the extra handling of the null terminator in delayed operations which use memcpy rather than strlen. Later when we introduce parent pointers, attribute names will become binary, so strlen will not work at all. Removing uses of strlen now will help reduce complexities later Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | de7a866fd4 |
xfs: merge the projid fields in struct xfs_icdinode
There is no point in splitting the fields like this in an purely in-memory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 8d2d878db8 |
xfs: use a struct timespec64 for the in-core crtime
struct xfs_icdinode is purely an in-memory data structure, so don't use a log on-disk structure for it. This simplifies the code a bit, and also reduces our include hell slightly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a minor indenting problem in xfs_trans_ichgtime] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Darrick J. Wong | a71895c5da |
xfs: convert open coded corruption check to use XFS_IS_CORRUPT
Convert the last of the open coded corruption check and report idioms to use the XFS_IS_CORRUPT macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
Christoph Hellwig | 957ee13e20 |
xfs: remove the now unused dir ops infrastructure
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 3b34441309 |
xfs: move the node header size to struct xfs_da_geometry
Move the node header size field to struct xfs_da_geometry, and remove the now unused non-directory dir ops infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Darrick J. Wong | a5155b870d |
xfs: always log corruption errors
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up the call stack. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
Christoph Hellwig | 7c6b94b1b5 |
xfs: reverse the polarity of XFS_MOUNT_COMPAT_IOSIZE
Replace XFS_MOUNT_COMPAT_IOSIZE with an inverted XFS_MOUNT_LARGEIO flag that makes the usage more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 3274d00801 |
xfs: rename the XFS_MOUNT_DFLT_IOSIZE option to
Make the flag match the mount option and usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 5da8a07c79 |
xfs: rename the m_writeio_* fields in struct xfs_mount
Use the allocsize name to match the mount option and usage instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 3cd1d18b0d |
xfs: remove the m_readio_* fields in struct xfs_mount
m_readio_blocks is entirely unused, and m_readio_blocks is only used in xfs_stat_blksize in a max statements that is a no-op as it always has the same value as m_writeio_log. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | dd2d535e3f |
xfs: cleanup calculating the stat optimal I/O size
Move xfs_preferred_iosize to xfs_iops.c, unobsfucate it and also handle the realtime special case in the helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 30fa529e3b |
xfs: add a xfs_inode_buftarg helper
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a given inode. Replace the existing xfs_find_bdev_for_inode and xfs_find_daxdev_for_inode helpers with this new general one and cleanup some of the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | f150b42343 |
xfs: split the iomap ops for buffered vs direct writes
Instead of lots of magic conditionals in the main write_begin handler this make the intent very clear. Thing will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 690c2a3887 |
xfs: split out a new set of read-only iomap ops
Start untangling xfs_file_iomap_begin by splitting out the read-only case into its own set of iomap_ops with a very simply iomap_begin helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Darrick J. Wong | 1fb254aa98 |
xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
fails on account of being out of disk quota. I ran his reproducer
script:
# adduser dummy
# adduser dummy plugdev
# dd if=/dev/zero bs=1M count=100 of=test.img
# mkfs.xfs test.img
# mount -t xfs -o gquota test.img /mnt
# mkdir -p /mnt/dummy
# chown -c dummy /mnt/dummy
# xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
(and then as user dummy)
$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
$ chgrp plugdev /mnt/dummy/foo
and saw:
================================================
WARNING: lock held when returning to user space!
5.3.0-rc5 #rc5 Tainted: G W
------------------------------------------------
chgrp/47006 is leaving the kernel with locks still held!
1 lock held by chgrp/47006:
#0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
...which is clearly caused by xfs_setattr_nonsize failing to unlock the
ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
unlock.
Reported-by: benjamin.moody@gmail.com
Fixes:
|
|
Eric Sandeen | 250d4b4c40 |
xfs: remove unused header files
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Luis R. Rodriguez | 1b9598c8fb |
xfs: fix reporting supported extra file attributes for statx()
statx(2) notes that any attribute that is not indicated as supported by stx_attributes_mask has no usable value. Commit |
|
Darrick J. Wong | c4a6bf7f6c |
xfs: don't ever put nlink > 0 inodes on the unlinked list
When XFS creates an O_TMPFILE file, the inode is created with nlink = 1, put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile. If we crash before anything logs the inode (it's dirty incore but the vfs doesn't tell us it's dirty so we never log that change), the iunlink processing part of recovery will then explode with a pile of: XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file: fs/xfs/xfs_log_recover.c, line: 5072 Worse yet, since nlink is nonzero, the inodes also don't get cleaned up and they just leak until the next xfs_repair run. Therefore, change xfs_iunlink to require that inodes being put on the unlinked list have nlink == 0, change the tmpfile callers to instantiate nodes that way, and set the nlink to 1 just prior to calling d_tmpfile. Fix the comment for xfs_iunlink while we're at it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
Darrick J. Wong | ae29478766 |
xfs: don't crash the vfs on a garbage inline symlink
The VFS routine that calls ->get_link blindly copies whatever's returned into the user's buffer. If we return a NULL pointer, the vfs will crash on the null pointer. Therefore, return -EFSCORRUPTED instead of blowing up the kernel. [dgc: clean up with hch's suggestions] Reported-by: wen.xu@gatech.edu Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com> |
|
Linus Torvalds | 781fca5b10 |
Changes for 4.19:
- Use extent maps to track pagecache page status instead of bufferhead state. - Refactor pagecache read and write paths to use the new iomap library functions, which enable us to drop the old bufferhead code for pagesize == blocksize filesystems. - Set up parallel per-block-per-page metadata to track subpage information that was tracked by buffer heads, which enables us to drop the old bufferhead code for pagesize > blocksize filesystems. - Tie a deferred ops control structure to a transaction so that we can take advantage of an upper-level dfops without having to plumb pointer passing through the code. - Refactor the deferred ops code to track deferred ops as part of the transaction structure (instead of as a separate data structure) so that we can simplify the scoping rules around defer_ops. - Refactor twisty delwri buffer submission code to avoid deadlocks. - Shorten and fix indenting problems in the scrub code. - Detect obviously bad summary counts at mount and fix them. - Directly associate deferred ops control structure with a transaction so that callers no longer have to manage it themselves. - Remove a couple of IRIX-era inode macros. - Remove the long-deprecated 'barrier' and 'nobarrier' mount options. - Clean up the inode fork structure a bit. - Check for bad fs summary counter values in the superblock. - Reduce COW fork lookups during writeback. - Refactor the deferred ops control structures into the transaction structure, thereby eliminating the need for transaction users to handle the deferred ops as a separate data structure. - Add the ability to repair AG headers online. - Fix a crash due to insufficient return value checking. - Various fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAltwVGoACgkQ+H93GTRK tOt3bw//WaG1mR44Oo/mhf27MaEJK74LXeViqCH4Sdk10gujClTVnl6h33ChAyEi 7BT4x1JtwM6xOh7nPsXQy/besVxWadjQcTtAz/3U2wJoFyOX2+I27SAawrmX6jfR Hi1DxXFFK7z/8YvuZqYl3vTgxMNb7bLAUybe2sYX8q+vrQaUvl9eLQlHSaT3sxrc /lBkog1dYmbw3yjLnWYpQtC0I6Pa3ZuG/S2vpeJ2H5MADtzrRNjuC9MHZJW7tIGm +rCLm0agk8yFkEA84VvS5Afee3TppY/JBaYlsvG1rp3bs0fELAJFnzS4g/QDbbsX HAKPcMICJksF4C9y0Xb7wXPz/4PKur5/OSuGXN4QtOivOEoAdWfh2PLInqAjo/Le mO92PdkBucfVqJzfEC2q2QAnGIaJlG8txhAz87wZ1YfZDQQlJDy385Z9GQXfUpy5 /1xH7V0cze1ZBSxWSddSFg0gCtaWSerfp0CmAG3A+HWKIN6c/ZNSCrqdq0DBC99D qOn6ThjckZWGvz/KV5xBr/KvUYOpSeEyREtgcAN008TiUaNy4nOhWV2xgLGuPY/J ed4V2B9qVbq+l+sZyzukB8cmOXmcCey6omwJ7LqZzoTWTAtTQtM2MwhaQFUWtQG8 mCqPXJp1XyL24sn0bI1t2NuKgQcs6QEQWX3zN4DA6I+N9+sTDqo= =2G+i -----END PGP SIGNATURE----- Merge tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "This is the second part of the XFS changes for 4.19. The biggest changes are the removal of buffer heads frm XFS, a massive reworking of the deferred transaction operations handling code, the removal of the long defunct barrier/nobarrier mount options, and the addition of a few more online repair functions. Summary: - Use extent maps to track pagecache page status instead of bufferhead state. - Refactor pagecache read and write paths to use the new iomap library functions, which enable us to drop the old bufferhead code for pagesize == blocksize filesystems. - Set up parallel per-block-per-page metadata to track subpage information that was tracked by buffer heads, which enables us to drop the old bufferhead code for pagesize > blocksize filesystems. - Tie a deferred ops control structure to a transaction so that we can take advantage of an upper-level dfops without having to plumb pointer passing through the code. - Refactor the deferred ops code to track deferred ops as part of the transaction structure (instead of as a separate data structure) so that we can simplify the scoping rules around defer_ops. - Refactor twisty delwri buffer submission code to avoid deadlocks. - Shorten and fix indenting problems in the scrub code. - Detect obviously bad summary counts at mount and fix them. - Directly associate deferred ops control structure with a transaction so that callers no longer have to manage it themselves. - Remove a couple of IRIX-era inode macros. - Remove the long-deprecated 'barrier' and 'nobarrier' mount options. - Clean up the inode fork structure a bit. - Check for bad fs summary counter values in the superblock. - Reduce COW fork lookups during writeback. - Refactor the deferred ops control structures into the transaction structure, thereby eliminating the need for transaction users to handle the deferred ops as a separate data structure. - Add the ability to repair AG headers online. - Fix a crash due to insufficient return value checking. - Various fixes and cleanups" * tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits) xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree xfs: remove b_last_holder & associated macros iomap: Switch to offset_in_page for clarity xfs: Close race between direct IO and xfs_break_layouts() xfs: repair the AGI xfs: repair the AGFL xfs: repair the AGF xfs: remove dead error handling code in xfs_dquot_disk_alloc() xfs: use WRITE_ONCE to update if_seq xfs: fix a comment in xfs_log_reserve xfs: only validate summary counts on primary superblock xfs: substitute spaces with tabs xfs: fold dfops into the transaction xfs: always defer agfl block frees xfs: pass transaction to xfs_defer_add() xfs: replace xfs_defer_ops ->dop_pending with on-stack list xfs: cancel dfops on xfs_defer_finish() error xfs: clean out superfluous dfops dop params/vars xfs: drop dop param from xfs_defer_op_type ->finish_item() callback xfs: automatic dfops inode relogging ... |
|
Al Viro | 5bef915104 |
new helper: inode_fake_hash()
open-coded in a quite a few places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Darrick J. Wong | 44a8736bd2 |
xfs: clean up IRELE/iput callsites
Replace the IRELE macro with a proper function so that we can do proper typechecking and so that we can stop open-coding iput in scrub, which means that we'll be able to ftrace inode lifetimes going through scrub correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> |
|
Brian Foster | c8eac49ef7 |
xfs: remove all boilerplate defer init/finish code
At this point, the transaction subsystem completely manages deferred items internally such that the common and boilerplate xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() -> xfs_trans_commit() sequence can be replaced with a simple transaction allocation and commit. Remove all such boilerplate deferred ops code. In doing so, we change each case over to use the dfops in the transaction and specifically eliminate: - The on-stack dfops and associated xfs_defer_init() call, as the internal dfops is initialized on transaction allocation. - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of a transaction. - xfs_defer_cancel() calls in error handlers that precede a transaction cancel. The only deferred ops calls that remain are those that are non-deterministic with respect to the final commit of the associated transaction or are open-coded due to special handling. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Brian Foster | 02dff7bf81 |
xfs: pull up dfops from xfs_itruncate_extents()
xfs_itruncate_extents[_flags]() uses a local dfops with a transaction provided by the caller. It uses hacky ->t_dfops replacement logic to avoid stomping over an already populated ->t_dfops. The latter never occurs for current callers and the logic itself is not really appropriate. Clean this up by updating all callers to initialize a dfops and to use that down in xfs_itruncate_extents(). This more closely resembles the upcoming logic where dfops will be embedded within the transaction. We can also replace the xfs_defer_init() in the xfs_itruncate_extents_flags() loop with an assert. Both dfops and firstblock should be in a valid state after xfs_defer_finish() and the inode joined to the dfops is fixed throughout the loop. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Linus Torvalds | 7a932516f5 |
vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. There were no conflicts between this and the contents of linux-next until just before the merge window, when we saw multiple problems: - A minor conflict with my own y2038 fixes, which I could address by adding another patch on top here. - One semantic conflict with late changes to the NFS tree. I addressed this by merging Deepa's original branch on top of the changes that now got merged into mainline and making sure the merge commit includes the necessary changes as produced by coccinelle. - A trivial conflict against the removal of staging/lustre. - Multiple conflicts against the VFS changes in the overlayfs tree. These are still part of linux-next, but apparently this is no longer intended for 4.18 [1], so I am ignoring that part. As Deepa writes: The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions. Thomas Gleixner adds: I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window. [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+ Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/ WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy A3HkjIBrKW5AgQDxfgvm =CZX2 -----END PGP SIGNATURE----- Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate() |
|
Linus Torvalds | a205f0c974 |
Changes since last update:
- Strengthen metadata checking to avoid ASSERTing on bad disk contents - Validate btree records that are being retrieved for clients - Strengthen root inode verification - Convert license blurbs to SPDX tags - Enable changing DAX flag on directories - Fix some writeback deadlocks in reflink - Refactor out some old xfs helpers - Move type verifiers to a separate file - Fix some fuzzer crashes - Various other bug fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsfUegACgkQ+H93GTRK tOv0sw/9HW63zhuzGs9uihmyGvtqNeeUBfKc+3ovJ80wnhYOa0n8yYCBORS1EaMS YPk74IbD0yAak0H9ePdOpont43gGVTDox6/K8+6rPFnWtX30Z2/ckb6BWE4UfoW8 QeojpB2+aS6fqfO1wcSb3i//XRu4h90ORQY0xNkHYcN4GWwIDwPCyBf+AT9HH1E+ GFHtB3QWANZg6LRT7X0GVgz5r68lzyxX1WisJ4uAm0NwKR5zVb9NWFCcOszQ45Ky +YFw4kfgithbIHlwTpo3LrvQk7+cBhlSpWuASZOYjugxcQ2d85B/+9mF/QDnLOey ddbO6WK+wo0KZImpFvOOQZY07cO7vtWwkWHraz0PkUdaEab5rcnooLoJg9UTMZa4 WT8wM8CrX1kkFvJQCuAMV9jblovjETeYhHfG8ak8Z/lWc3WEnEBUFQiO9ZVQdiAv B02xMmpOkfi0fqRCg6li9u3CJtN+2vxPiNEME3lz5zdY5aE2aXSmCspvP3aPVZMt y1fZ90u5NONz6Q9WrIh0plEru4oynhwVuqRrnVRDPCT4X64IZXuf/fBmYqrfZGmJ K45P/LQDvfcHj3xBLhfkKv5OpXtyYgDtLSBNqYAYrcGS4sW7Z4Ts8ohqcOhF1OqR g3mFp75aO4Ekw6hFbg9CRX13G4mu80BmnRKDVwFjThkl6d0Xyxw= =SD3u -----END PGP SIGNATURE----- Merge tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more xfs updates from Darrick Wong: "Here's the second round of patches for XFS for 4.18. Most of the commits are small cleanups, bug fixes, and continued strengthening of metadata verifiers; the bulk of the diff is the conversion of the fs/xfs/ tree to use SPDX tags. This series has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Strengthen metadata checking to avoid ASSERTing on bad disk contents - Validate btree records that are being retrieved for clients - Strengthen root inode verification - Convert license blurbs to SPDX tags - Enable changing DAX flag on directories - Fix some writeback deadlocks in reflink - Refactor out some old xfs helpers - Move type verifiers to a separate file - Fix some fuzzer crashes - Various other bug fixes" * tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits) xfs: update incore per-AG inode count xfs: replace do_mod with native operations xfs: don't call xfs_da_shrink_inode with NULL bp xfs: clean up MIN/MAX xfs: move various type verifiers to common file xfs: xfs_reflink_convert_cow() memory allocation deadlock xfs: setup VFS i_rwsem lockdep state correctly xfs: fix string handling in label get/set functions xfs: convert to SPDX license tags xfs: validate btree records on retrieval xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() xfs: verify root inode more thoroughly xfs: verify COW extent size hint is valid in inode verifier xfs: verify extent size hint is valid in inode verifier xfs: catch bad stripe alignment configurations iomap: fsync swap files before iterating mappings xfs: use xfs_trans_getsb in xfs_sync_sb_buf xfs: don't assert on corrupted unlinked inode list xfs: explicitly pass buffer size to xfs_corruption_error xfs: don't assert when on-disk btree pointers are garbage ... |
|
Linus Torvalds | 7d3bf613e9 |
libnvdimm for 4.18
* DAX broke a fundamental assumption of truncate of file mapped pages. The truncate path assumed that it is safe to disconnect a pinned page from a file and let the filesystem reclaim the physical block. With DAX the page is equivalent to the filesystem block. Introduce dax_layout_busy_page() to enable filesystems to wait for pinned DAX pages to be released. Without this wait a filesystem could allocate blocks under active device-DMA to a new file. * DAX arranges for the block layer to be bypassed and uses dax_direct_access() + copy_to_iter() to satisfy read(2) calls. However, the memcpy_mcsafe() facility is available through the pmem block driver. In order to safely handle media errors, via the DAX block-layer bypass, introduce copy_to_iter_mcsafe(). * Fix cache management policy relative to the ACPI NFIT Platform Capabilities Structure to properly elide cache flushes when they are not necessary. The table indicates whether CPU caches are power-fail protected. Clarify that a deep flush is always performed on REQ_{FUA,PREFLUSH} requests. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbGxI7AAoJEB7SkWpmfYgCDjsP/2Lcibu9Kf4tKIzuInsle6iE 6qP29qlkpHVTpDKbhvIxTYTYL9sMU0DNUrpPCJR/EYdeyztLWDFC5EAT1wF240vf maV37s/uP331jSC/2VJnKWzBs2ztQxmKLEIQCxh6aT0qs9cbaOvJgB/WlVu+qtsl aGJFLmb6vdQacp31noU5plKrMgMA1pADyF5qx9I9K2HwowHE7T368ZEFS/3S//c3 LXmpx/Nfq52sGu/qbRbu6B1CTJhIGhmarObyQnvBYoKntK1Ov4e8DS95wD3EhNDe FuRkOCUKhjl6cFy7QVWh1ct1bFm84ny+b4/AtbpOmv9l/+0mveJ7e+5mu8HQTifT wYiEe2xzXJ+OG/xntv8SvlZKMpjP3BqI0jYsTutsjT4oHrciiXdXM186cyS+BiGp KtFmWyncQJgfiTq6+Hj5XpP9BapNS+OYdYgUagw9ZwzdzptuGFYUMSVOBrYrn6c/ fwqtxjubykJoW0P3pkIoT91arFSea7nxOKnGwft06imQ7TwR4ARsI308feQ9itJq 2P2e7/20nYMsw2aRaUDDA70Yu+Lagn1m8WL87IybUGeUDLb1BAkjphAlWa6COJ+u PhvAD2tvyM9m0c7O5Mytvz7iWKG6SVgatoAyOPkaeplQK8khZ+wEpuK58sO6C1w8 4GBvt9ri9i/Ww/A+ppWs =4bfw -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This adds a user for the new 'bytes-remaining' updates to memcpy_mcsafe() that you already received through Ingo via the x86-dax- for-linus pull. Not included here, but still targeting this cycle, is support for handling memory media errors (poison) consumed via userspace dax mappings. Summary: - DAX broke a fundamental assumption of truncate of file mapped pages. The truncate path assumed that it is safe to disconnect a pinned page from a file and let the filesystem reclaim the physical block. With DAX the page is equivalent to the filesystem block. Introduce dax_layout_busy_page() to enable filesystems to wait for pinned DAX pages to be released. Without this wait a filesystem could allocate blocks under active device-DMA to a new file. - DAX arranges for the block layer to be bypassed and uses dax_direct_access() + copy_to_iter() to satisfy read(2) calls. However, the memcpy_mcsafe() facility is available through the pmem block driver. In order to safely handle media errors, via the DAX block-layer bypass, introduce copy_to_iter_mcsafe(). - Fix cache management policy relative to the ACPI NFIT Platform Capabilities Structure to properly elide cache flushes when they are not necessary. The table indicates whether CPU caches are power-fail protected. Clarify that a deep flush is always performed on REQ_{FUA,PREFLUSH} requests" * tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) dax: Use dax_write_cache* helpers libnvdimm, pmem: Do not flush power-fail protected CPU caches libnvdimm, pmem: Unconditionally deep flush on *sync libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH acpi, nfit: Remove ecc_unit_size dax: dax_insert_mapping_entry always succeeds libnvdimm, e820: Register all pmem resources libnvdimm: Debug probe times linvdimm, pmem: Preserve read-only setting for pmem devices x86, nfit_test: Add unit test for memcpy_mcsafe() pmem: Switch to copy_to_iter_mcsafe() dax: Report bytes remaining in dax_iomap_actor() dax: Introduce a ->copy_to_iter dax operation uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation xfs, dax: introduce xfs_break_dax_layouts() xfs: prepare xfs_break_layouts() for another layout type xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL mm, fs, dax: handle layout changes to pinned dax mappings mm: fix __gup_device_huge vs unmap mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS ... |
|
Dave Chinner | ef215e394e |
xfs: setup VFS i_rwsem lockdep state correctly
When lockdep is enabled, it changes the type of the inode i_rwsem
semaphore before unlocking a newly instantiated inode. THere is the
possibility that there is already a waiter on that inode lock by the
time we unlock the new inode, so having lockdep re-initialise the
lock is a vector for trouble.
Avoid this whole situation by setting up the i_rwsem lockdep class
at the same time we set up the XFS inode i_ilock classes and so the
VFS doesn't have to change the lock class itself when it is
potentially unsafe.
This change is necessary because the equivalent fixes to the VFS code
made in commit
|
|
Dave Chinner | 0b61f8a407 |
xfs: convert to SPDX license tags
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Deepa Dinamani | 95582b0083 |
vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk> |
|
Linus Torvalds | 6567af78ac |
Changes for 4.18:
- Strengthen inode number and structure validation when allocating inodes. - Reduce pointless buffer allocations during cache miss - Use FUA for pure data O_DSYNC directio writes - Various iomap refactorings - Strengthen quota metadata verification to avoid unfixable broken quota - Make AGFL block freeing a deferred operation to avoid blowing out transaction reservations when running complex operations - Get rid of the log item descriptors to reduce log overhead - Fix various reflink bugs where inodes were double-joined to transactions - Don't issue discards when trimming unwritten extents - Refactor incore dquot initialization and retrieval interfaces - Fix some locking problmes in the quota scrub code - Strengthen btree structure checks in scrub code - Rewrite swapfile activation to use iomap and support unwritten extents - Make scrub exit to userspace sooner when corruptions or cross-referencing problems are found - Make scrub invoke the data fork scrubber directly on metadata inodes - Don't do background reclamation of post-eof and cow blocks when the fs is suspended - Fix secondary superblock buffer lifespan hinting - Refactor growfs to use table-dispatched functions instead of long stringy functions - Move growfs code to libxfs - Implement online fs label getting and setting - Introduce online filesystem repair (in a very limited capacity) - Fix unit conversion problems in the realtime freemap iteration functions - Various refactorings and cleanups in preparation to remove buffer heads in a future release - Reimplement the old bmap call with iomap - Remove direct buffer head accesses from seek hole/data - Various bug fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsR9dEACgkQ+H93GTRK tOv0dw//cBwRgY4jhC6b9oMk2DNRWUiTt1F2yoqr28661GPo124iXAMLIwJe1DiV W/qpN3HUz7P46xKOVY+MXaj0JIDFxJ8c5tHAQMH/TkDc49S+mkcGyaoPJ39hnc6u yikG+Hq4m0YWhHaeUhKTe8pnhXBaziz5A2NtKtwh6lPOIW+Wds51T77DJnViqADq tZzmAq8fS9/ELpxe0Th/2D7iTWCr2c3FLsW2KgbbNvQ4e34zVE1ix1eBtEzQE+Mm GUjdQhYVS1oCzqZfCxJkzR4R/1TAFyS0FXOW7PHo8FAX/kas9aQbRlnHSAQ/08EE 8Z2p3GsFip7dgmd6O6nAmFAStW6GRvgyycJ7Y+Y0IsJj6aDp9OxhRExyF+uocJR9 b9ChOH6PMEtRB/RRlBg66pbS61abvNGutzl61ZQZGBHEvL3VqDcd68IomdD5bNSB pXo6mOJIcKuXsghZszsHAV9uuMe4zQAMbLy7QH6V8LyWeSAG9hTXOT9EA4MWktEJ SCQFf7RRPgU5pEAgOS8LgKrawqnBaqFcFvkvWsQhyiltTFz29cwxH7tjSXYMAOFE W+RMp8kbkPnGOaJJeKxT+/RGRB534URk0jIEKtRb679xkEF3HE58exXEVrnojJq6 0m712+EYuZSYhFBwrvEnQjNHr0x2r/A/iBJZ6HhyV0aO1RWm4n4= =11pr -----END PGP SIGNATURE----- Merge tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "New features this cycle include the ability to relabel mounted filesystems, support for fallocated swapfiles, and using FUA for pure data O_DSYNC directio writes. With this cycle we begin to integrate online filesystem repair and refactor the growfs code in preparation for eventual subvolume support, though the road ahead for both features is quite long. There are also numerous refactorings of the iomap code to remove unnecessary log overhead, to disentangle some of the quota code, and to prepare for buffer head removal in a future upstream kernel. Metadata validation continues to improve, both in the hot path veifiers and the online filesystem check code. I anticipate sending a second pull request in a few days with more metadata validation improvements. This series has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Strengthen inode number and structure validation when allocating inodes. - Reduce pointless buffer allocations during cache miss - Use FUA for pure data O_DSYNC directio writes - Various iomap refactorings - Strengthen quota metadata verification to avoid unfixable broken quota - Make AGFL block freeing a deferred operation to avoid blowing out transaction reservations when running complex operations - Get rid of the log item descriptors to reduce log overhead - Fix various reflink bugs where inodes were double-joined to transactions - Don't issue discards when trimming unwritten extents - Refactor incore dquot initialization and retrieval interfaces - Fix some locking problmes in the quota scrub code - Strengthen btree structure checks in scrub code - Rewrite swapfile activation to use iomap and support unwritten extents - Make scrub exit to userspace sooner when corruptions or cross-referencing problems are found - Make scrub invoke the data fork scrubber directly on metadata inodes - Don't do background reclamation of post-eof and cow blocks when the fs is suspended - Fix secondary superblock buffer lifespan hinting - Refactor growfs to use table-dispatched functions instead of long stringy functions - Move growfs code to libxfs - Implement online fs label getting and setting - Introduce online filesystem repair (in a very limited capacity) - Fix unit conversion problems in the realtime freemap iteration functions - Various refactorings and cleanups in preparation to remove buffer heads in a future release - Reimplement the old bmap call with iomap - Remove direct buffer head accesses from seek hole/data - Various bug fixes" * tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits) fs: use ->is_partially_uptodate in page_cache_seek_hole_data fs: remove the buffer_unwritten check in page_seek_hole_data fs: move page_cache_seek_hole_data to iomap.c xfs: use iomap_bmap iomap: add an iomap-based bmap implementation iomap: add a iomap_sector helper iomap: use __bio_add_page in iomap_dio_zero iomap: move IOMAP_F_BOUNDARY to gfs2 iomap: fix the comment describing IOMAP_NOWAIT iomap: inline data should be an iomap type, not a flag mm: split ->readpages calls to avoid non-contiguous pages lists mm: return an unsigned int from __do_page_cache_readahead mm: give the 'ret' variable a better name __do_page_cache_readahead block: add a lower-level bio_add_page interface xfs: fix error handling in xfs_refcount_insert() xfs: fix xfs_rtalloc_rec units xfs: strengthen rtalloc query range checks xfs: xfs_rtbuf_get should check the bmapi_read results xfs: xfs_rtword_t should be unsigned, not signed dax: change bdev_dax_supported() to support boolean returns ... |
|
Darrick J. Wong | ba23cba9b3 |
fs: allow per-device dax status checking for filesystems
Change bdev_dax_supported so it takes a bdev parameter. This enables multi-device filesystems like xfs to check that a dax device can work for the particular filesystem. Once that's in place, actually fix all the parts of XFS where we need to be able to distinguish between datadev and rtdev. This patch fixes the problem where we screw up the dax support checking in xfs if the datadev and rtdev have different dax capabilities. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases] Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> |
|
Al Viro | b113a6d3cf |
xfs_vn_lookup: simplify a bit
have all post-xfs_lookup() branches converge on d_splice_alias() Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
Dan Williams | 69eb5fa10e |
xfs: prepare xfs_break_layouts() for another layout type
When xfs is operating as the back-end of a pNFS block server, it prevents collisions between local and remote operations by requiring a lease to be held for remotely accessed blocks. Local filesystem operations break those leases before writing or mutating the extent map of the file. A similar mechanism is needed to prevent operations on pinned dax mappings, like device-DMA, from colliding with extent unmap operations. BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of layout breaking. Layouts are broken in the BREAK_WRITE case to ensure that layout-holders do not collide with local writes. Additionally, layouts are broken in the BREAK_UNMAP case to make sure the layout-holder has a consistent view of the file's extent map. While BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL. After this refactoring xfs_break_layouts() becomes the entry point for coordinating both types of breaks. Finally, xfs_break_leased_layouts() becomes just the BREAK_WRITE handler. Note that the unlock tracking is needed in a follow on change. That will coordinate retrying either break handler until both successfully test for a lease break while maintaining the lock state. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: Dave Chinner <david@fromorbit.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
|
Dan Williams | c63a8eae63 |
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
In preparation for adding coordination between extent unmap operations and busy dax-pages, update xfs_break_layouts() to permit it to be called with the mmap lock held. This lock scheme will be required for coordinating the break of 'dax layouts' (non-idle dax (ZONE_DEVICE) pages mapped into the file's address space). Breaking dax layouts will be added to xfs_break_layouts() in a future patch, for now this preps the unmap call sites to take and hold XFS_MMAPLOCK_EXCL over the call to xfs_break_layouts(). Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <darrick.wong@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
|
Darrick J. Wong | c14cfccabe |
xfs: remove unnecessary xfs_qm_dqattach parameter
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
Linus Torvalds | 80aa76bcd3 |
Changes since last update:
- Cleanup unnecessary function call parameters - Fix a use-after-free bug when aborting logging intents - Refactor filestreams state data to avoid use-after-free bug - Fix incorrect removal of cow extents when truncating extended attributes. - Refactor open-coded __set_page_dirty in favor of using vfs function. - Fix a deadlock when fstrim and fs shutdown race. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCgAGBQJazZ/HAAoJEPh/dxk0SrTrGeYP/Asis7MZ3TWfzsPHwK6EoH+w q1VnKVMqjSRnE8DYmx8w3tMqny2qg9klLkT1SRRz9Htr5CER5XqIX6ZYlQpVKx2R ycS+I1l/V/kqEmAgDgSK3DZ5uMgKHfo0w7GbKFDg69YDztN3yNBpDfUZ4VqA/Ua2 ADaeMYkJh6oBB5yZAWGyAJEM5TieqQppqyc+WLhbORyjEFreUTTmLqbzLPnceSih rQLg0noDwZK0XqbwndYXGTNKKoQtiJalnZP18DhH4zOr+FH03i/gmlU3w+ANl3eX IEE0TR2EkNHLZeduz7xT7ZHUOo0TaGObBK8CJFSojibQ/HooVjLcFQResd3q0coN WTkbTuxHwMqk2IujKRTqli/saENhvFrOrm/nYTFPw9+3GpRt0iLrXPSBbeMjmsZG XntdimPEHywjYrdW10VRH+6E6tvQiC/tl3abBuXdaEOJs1KZmPYNt42EF0ZQ5Xs5 IeDOhPLuuyUwRf12RVA9WS6xGdMR0+foMqncXZNcAzxQeAVfUNEtASNMTNIOO2H5 xD34/1ooFJnwT755VLT9U/qUd5CHtWO3AkH+9RVCpKWTaEOSY+gV6ZgmINv9npur Vw2xnZwxysegVlu76uct/b/sy/8J3OKYIoSYwbt6Zxi0M1WF9zanJ9pKCsBYdL6A xWoMr0X1yFGcYJozqwJU =uGz0 -----END PGP SIGNATURE----- Merge tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more xfs updates from Darrick Wong: "Most of these are code cleanups, but there are a couple of notable use-after-free bug fixes. This series has been run through a full xfstests run over the week and through a quick xfstests run against this morning's master, with no major failures reported. - clean up unnecessary function call parameters - fix a use-after-free bug when aborting logging intents - refactor filestreams state data to avoid use-after-free bug - fix incorrect removal of cow extents when truncating extended attributes. - refactor open-coded __set_page_dirty in favor of using vfs function. - fix a deadlock when fstrim and fs shutdown race" * tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Force log to disk before reading the AGF during a fstrim Export __set_page_dirty xfs: only cancel cow blocks when truncating the data fork xfs: non-scrub - remove unused function parameters xfs: remove filestream item xfs_inode reference xfs: fix intent use-after-free on abort xfs: Remove "committed" argument of xfs_dir_ialloc |
|
Linus Torvalds | 9f3a0941fb |
libnvdimm for 4.17
* A rework of the filesytem-dax implementation provides for detection of
unmap operations (truncate / hole punch) colliding with in-progress
device-DMA. A fix for these collisions remains a work-in-progress
pending resolution of truncate latency and starvation regressions.
* The of_pmem driver expands the users of libnvdimm outside of x86 and
ACPI to describe an implementation of persistent memory on PowerPC with
Open Firmware / Device tree.
* Address Range Scrub (ARS) handling is completely rewritten to account for
the fact that ARS may run for 100s of seconds and there is no platform
defined way to cancel it. ARS will now no longer block namespace
initialization.
* The NVDIMM Namespace Label implementation is updated to handle label
areas as small as 1K, down from 128K.
* Miscellaneous cleanups and updates to unit test infrastructure.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B
lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc
1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL
/NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr
EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg
xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga
9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57
qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf
l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA
ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ
3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m
9eGJG1H+JspcbtLF9iHn
=acZQ
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This cycle was was not something I ever want to repeat as there were
several late changes that have only now just settled.
Half of the branch up to commit
|
|
Eric Sandeen | a1f69417c6 |
xfs: non-scrub - remove unused function parameters
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Dan Williams | 6e2608dfd9 |
xfs, dax: introduce xfs_dax_aops
In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Otherwise, direct-I/O triggers incorrect page cache assumptions and warnings like the following: WARNING: CPU: 27 PID: 1783 at fs/xfs/xfs_aops.c:1468 xfs_vm_set_page_dirty+0xf3/0x1b0 [xfs] [..] CPU: 27 PID: 1783 Comm: dma-collision Tainted: G O 4.15.0-rc2+ #984 [..] Call Trace: set_page_dirty_lock+0x40/0x60 bio_set_pages_dirty+0x37/0x50 iomap_dio_actor+0x2b7/0x3b0 ? iomap_dio_zero+0x110/0x110 iomap_apply+0xa4/0x110 iomap_dio_rw+0x29e/0x3b0 ? iomap_dio_zero+0x110/0x110 ? xfs_file_dio_aio_read+0x7c/0x1a0 [xfs] xfs_file_dio_aio_read+0x7c/0x1a0 [xfs] xfs_file_read_iter+0xa0/0xc0 [xfs] __vfs_read+0xf9/0x170 vfs_read+0xa6/0x150 SyS_pread64+0x93/0xb0 entry_SYSCALL_64_fastpath+0x1f/0x96 ...where the default set_page_dirty() handler assumes that dirty state is being tracked in 'struct page' flags. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> |
|
Christoph Hellwig | f5c54717bf |
xfs: remove xfs_zero_range
This helper doesn't add any real value over just calling iomap_zero_range directly, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | c3b1b13190 |
xfs: implement the lazytime mount option
Use the VFS dirty inode tracking for lazytime inodes only, and just log them in ->dirty_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Eryu Guan | 350976ae21 |
xfs: truncate pagecache before writeback in xfs_setattr_size()
On truncate down, if new size is not block size aligned, we zero the rest of block to avoid exposing stale data to user, and iomap_truncate_page() skips zeroing if the range is already in unwritten state or a hole. Then we writeback from on-disk i_size to the new size if this range hasn't been written to disk yet, and truncate page cache beyond new EOF and set in-core i_size. The problem is that we could write data between di_size and newsize before removing the page cache beyond newsize, as the extents may still be in unwritten state right after a buffer write. As such, the page of data that newsize lies in has not been zeroed by page cache invalidation before it is written, and xfs_do_writepage() hasn't triggered it's "zero data beyond EOF" case because we haven't updated in-core i_size yet. Then a subsequent mmap read could see non-zeros past EOF. I occasionally see this in fsx runs in fstests generic/112, a simplified fsx operation sequence is like (assuming 4k block size xfs): fallocate 0x0 0x1000 0x0 keep_size write 0x0 0x1000 0x0 truncate 0x0 0x800 0x1000 punch_hole 0x0 0x800 0x800 mapread 0x0 0x800 0x800 where fallocate allocates unwritten extent but doesn't update i_size, buffer write populates the page cache and extent is still unwritten, truncate skips zeroing page past new EOF and writes the page to disk, punch_hole invalidates the page cache, at last mapread reads the block back and sees non-zero beyond EOF. Fix it by moving truncate_setsize() to before writeback so the page cache invalidation zeros the partial page at the new EOF. This also triggers "zero data beyond EOF" in xfs_do_writepage() at writeback time, because newsize has been set and page straddles the newsize. Also fixed the wrong 'end' param of filemap_write_and_wait_range() call while we're at it, the 'end' is inclusive and should be 'newsize - 1'. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eryu Guan <eguan@redhat.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Christoph Hellwig | 66f364649d |
xfs: remove if_rdev
We can simply use the i_rdev field in the Linux inode and just convert to and from the XFS dev_t when reading or logging/writing the inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> |
|
Darrick J. Wong | 7bf7a193a9 |
xfs: fix compiler warnings
Fix up all the compiler warnings that have crept in. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
Darrick J. Wong | 6eb0b8df9f |
xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
XFS has a maximum symlink target length of 1024 bytes; this is a holdover from the Irix days. Unfortunately, the constant establishing this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN, which is 4096. The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means that xfs_repair doesn't complain about oversized symlinks. Since this is an on-disk format constraint, put the define in the XFS namespace and move everything over to use the new name. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> |
|
Jan Kara | 8ba358756a |
xfs: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.
Fix the problem by calling __xfs_set_acl() instead of xfs_set_acl() when
setting up inode in xfs_generic_create(). That prevents SGID bit
clearing and mode is properly set by posix_acl_create() anyway. We also
reorder arguments of __xfs_set_acl() to match the ordering of
xfs_set_acl() to make things consistent.
Fixes:
|
|
Darrick J. Wong | 5f955f26f3 |
xfs: report crtime and attribute flags to statx
statx has the ability to report inode creation times and inode flags, so hook up di_crtime and di_flags to that functionality. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
David Howells | a528d35e8b |
statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char *filename, unsigned int flags, unsigned int mask, struct statx *buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares*[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS_*_FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_*, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |