more likely to be updated as we add new features to ext4. Add 64-bit
timestamp support to ext4's superblock fields. And the usual bug
fixes and cleanups, including a Spectre gadget fixup and some
hardening against maliciously corrupted file systems.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAltx3CAACgkQ8vlZVpUN
gaOfkAgApkNU46yPWWFqJpeTR76Go4zW00x9Piaita+KpQcTCUokK0BDXXeydOEo
zhtNjCRl00vbhnYJ90T75Y3ce9oGQDkWpXHFZBdl+kmMBaIHQJYEECCLR2zrvxVk
LhsixvLozQd/VFenwzmXLqYshiZjhjBZOKo7/xsoy4waer1vNidA9/SR3hGWTMOk
9IsEVqw1h6pZt7ZB/leZmZ348H9QZZbYOUWl2ArvdmNeg2t8pWH0MD0bos+mj5lW
xihyJEIS/OjWZRJhM2WEPMO7U4jD/aX3zmDMGePp/vC5KSqA4FGwk671xIyD+LXR
Hd8Rn8chRe4/iyhBqHpJNkqHQRUV6g==
=SA3p
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- Convert content from the ext4 wiki to Documentation rst files so it
is more likely to be updated as we add new features to ext4.
- Add 64-bit timestamp support to ext4's superblock fields.
- ... and the usual bug fixes and cleanups, including a Spectre gadget
fixup and some hardening against maliciously corrupted file systems.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (34 commits)
ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa()
ext4: improve code readability in ext4_iget()
ext4: fix spectre gadget in ext4_mb_regular_allocator()
ext4: check for NUL characters in extended attribute's name
ext4: use ext4_warning() for sb_getblk failure
ext4: fix race when setting the bitmap corrupted flag
ext4: reset error code in ext4_find_entry in fallback
ext4: handle layout changes to pinned DAX mappings
dax: dax_layout_busy_page() warn on !exceptional
docs: fix up the obviously obsolete bits in the new ext4 documentation
docs: add new ext4 superblock time extension fields
docs: create filesystem internal section
ext4: use swap macro in mext_page_double_lock
ext4: check allocation failure when duplicating "data" in ext4_remount()
ext4: fix warning message in ext4_enable_quotas()
ext4: super: extend timestamps to 40 bits
jbd2: replace current_kernel_time64 with ktime equivalent
ext4: use timespec64 for all inode times
ext4: use ktime_get_real_seconds for i_dtime
ext4: use 64-bit timestamps for mmp_time
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAltwwVkACgkQiiy9cAdy
T1H06gv/d+b52w2X05YlLBB1GboJjv1dl3sVrFisJrMrCCr6Hotk+4+RtC8CJHh6
/9Joq6iY8B/XLl7HrWr61vJhn8vlWOGhOQaUmHCBGeiIMX93RbaP8KvkPfHAKQiJ
T6+7v70z/WL7mUQA2TEQ3Iz6kpCuP2y/XT6eQglTFXasqsWHy08TbsbnmP9yX2i4
E1B6zMwpn6m4PFxEURg14eBJTrQomb/UBSLHNwxwwOQjnKzsmeH3pyiFCJSPJNFF
Xv/lyBibgvQAsuWtLTi82fqei+c60as2LVrsFLdoWXeD6JMlx1O+SHSz0NNJL/Lk
oxMNx9NI+LsYmekIBHgoX01PzVgCpcn6ThfoQv3JLHtqiZuk3Wlai7bPvJMIApoM
8fgHGR9xvhRCfHgYvxx4MmP3e7xewajSe7fh8/y8BqreO+fLeoyqho5G9zDpn6bU
wGFPtVYW/fgTEM5E3NOjsuC1zE35bjwU7U40jqjGHP1VsTs3hkLFy9fZNNq4EP1M
ytfJPlTG
=gzZK
-----END PGP SIGNATURE-----
Merge tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"smb3/cifs fixes (including 8 for stable).
Other improvements include:
- improved tracing, improved stats
- snapshots (previous version mounts work now over SMB3)
- performance (compounding enabled for statfs, ~40% faster).
- security (make it possible to build cifs.ko with insecure vers=1.0
disabled in Kconfig)"
* tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6: (43 commits)
smb3: create smb3 equivalent alias for cifs pseudo-xattrs
smb3: allow previous versions to be mounted with snapshot= mount parm
cifs: don't show domain= in mount output when domain is empty
cifs: add missing support for ACLs in SMB 3.11
smb3: enumerating snapshots was leaving part of the data off end
cifs: update smb2_queryfs() to use compounding
cifs: update receive_encrypted_standard to handle compounded responses
cifs: create SMB2_open_init()/SMB2_open_free() helpers.
cifs: add SMB2_query_info_[init|free]()
cifs: add SMB2_close_init()/SMB2_close_free()
smb3: display stats counters for number of slow commands
CIFS: fix uninitialized ptr deref in smb2 signing
smb3: Do not send SMB3 SET_INFO if nothing changed
smb3: fix minor debug output for CONFIG_CIFS_STATS
smb3: add tracepoint for slow responses
cifs: add compound_send_recv()
cifs: make smb_send_rqst take an array of requests
cifs: update init_sg, crypt_message to take an array of rqst
smb3: update readme to correct information about /proc/fs/cifs/Stats
smb3: fix reset of bytes read and written stats
...
Pull fs iomap refactoring from Darrick Wong:
"This is the first part of the XFS changes for 4.19.
Christoph and Andreas coordinated some refactoring work on the iomap
code in preparation for removing buffer heads from XFS and porting
gfs2 to iomap. I'm sending this small pull request ahead of the main
XFS merge to avoid holding up gfs2 unnecessarily"
* 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: add inline data support to iomap_readpage_actor
iomap: support direct I/O to inline data
iomap: refactor iomap_dio_actor
iomap: add initial support for writes without buffer heads
iomap: add an iomap-based readpage and readpages implementation
iomap: add private pointer to struct iomap
iomap: add a page_done callback
iomap: generic inline data handling
iomap: complete partial direct I/O writes synchronously
iomap: mark newly allocated buffer heads as new
fs: factor out a __generic_write_end helper
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltxe7QACgkQxWXV+ddt
WDswMA//QlRO+Ln5CH+RlT4fyf1RQUQZblWss2zxrmlo1GRI3Ljf2DNsBE3rD7P4
NSiXfHmgkdjcQP6poPLJwHxwkNd4NFXglYg64wWO10RjHGhKglmH6ztU88wsPfr2
2RZv271/NvYIEkEi6kdyy8ilKeWMshOfyj3+PaeapQn67uJfimyiUvDgUgbvwH3c
yj0nVRLP1C7snNj4Atti/rjXMhG+m1UWfjRkZsmqlBp52k2UAcrtiwQK+DS5b9mL
aWLSaGmIcJtSMkNJPQBST9GTWbJfKTpceoCzkT0o3irvQpN2e2flAJ4ireL8q4mN
MvqJ7giPBFHNDcHEzN6VERvsaA1Rx9Vq20ieQl8JAMd4p/bi5ehN3ww+9vau5zCw
Pc8WeKEILKrLYEAgHOnUO1wxHw994Iv5CA26roTQ0HNXQJjyEZ4m40Ch6LzmfKPm
WKcHX14Uw22GKaFEXHTOpRZ0U0d1cMTcn5zaAajGsB9LwcaiLM+OiFSPtDkwUOB9
QGJHklZVXAD1IH9HFPuq85uUtXTLXbxsw1g8phEJGbmaVxxCOAUAXwEk3qxuZNbz
CHL3G5+l3JEXxfoJSbDW60kr8xic7teqQDszqqP2qlqtP15ty2xc9d5Q8MZajSTZ
H1z9+0gfjYYHrGuAp69MtCbdQhhDSqLyivjJJm0HBaKfVNGW2Xg=
=jBaz
-----END PGP SIGNATURE-----
Merge tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly fixes and cleanups, nothing big, though the notable thing is
the inserted/deleted lines delta -1124.
User visible changes:
- allow defrag on opened read-only files that have rw permissions;
similar to what dedupe will allow on such files
Core changes:
- tree checker improvements, reported by fuzzing:
* more checks for: block group items, essential trees
* chunk type validation
* mount time cross-checks that physical and logical chunks match
* switch more error codes to EUCLEAN aka EFSCORRUPTED
Fixes:
- fsync corner case fixes
- fix send failure when root has deleted files still open
- send, fix incorrect file layout after hole punching beyond eof
- fix races between mount and deice scan ioctl, found by fuzzing
- fix deadlock when delayed iput is called from writeback on the same
inode; rare but has been observed in practice, also removes code
- fix pinned byte accounting, using the right percpu helpers; this
should avoid some write IO inefficiency during low space conditions
- don't remove block group that still has pinned bytes
- reset on-disk device stats value after replace, otherwise this
would report stale values for the new device
Cleanups:
- time64_t/timespec64 cleanups
- remove remaining dead code in scrub handling NOCOW extents after
disabling it in previous cycle
- simplify fsync regarding ordered extents logic and remove all the
related code
- remove redundant arguments in order to reduce stack space
consumption
- remove support for V0 type of extents, not in use since 2.6.30
- remove several unused structure members
- fewer indirect function calls by inlining some callbacks
- qgroup rescan timing fixes
- vfs: iget cleanups"
* tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (182 commits)
btrfs: revert fs_devices state on error of btrfs_init_new_device
btrfs: Exit gracefully when chunk map cannot be inserted to the tree
btrfs: Introduce mount time chunk <-> dev extent mapping check
btrfs: Verify that every chunk has corresponding block group at mount time
btrfs: Check that each block group has corresponding chunk at mount time
Btrfs: send, fix incorrect file layout after hole punching beyond eof
btrfs: Use wrapper macro for rcu string to remove duplicate code
btrfs: simplify btrfs_iget
btrfs: lift make_bad_inode into btrfs_iget
btrfs: simplify IS_ERR/PTR_ERR checks
btrfs: btrfs_iget never returns an is_bad_inode inode
btrfs: replace: Reset on-disk dev stats value after replace
btrfs: extent-tree: Remove unused __btrfs_free_block_rsv
btrfs: backref: Use ERR_CAST to return error code
btrfs: Remove redundant btrfs_release_path from btrfs_unlink_subvol
btrfs: Remove root parameter from btrfs_unlink_subvol
btrfs: Remove fs_info from btrfs_add_root_ref
btrfs: Remove fs_info from btrfs_del_root_ref
btrfs: Remove fs_info from btrfs_del_root
btrfs: Remove fs_info from btrfs_delete_delayed_dir_index
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbcDltAAoJEAAOaEEZVoIV9YUP/ioYw3+kIXwRa05ec8Twbn6m
O+1HOX9UAuDWX+97P2kCJSqi9a/TkpfLQxoGxZowDv97ZkVeHdzVaJVN/0sbH1uP
Oj86b2Y5zGyrLk3D4c5VcyHkvogr16DUvhYeRtcjPyufSaKqefftCeetTntRvr4Z
rZ8HLKYxa4AZlP/FkgVZ/S6PmvG7tH8rVLMFCfY+rASbSU4gBxmrS0nB2UgU4ycb
b2SVZo4Zxycw0KiP14T5AgN4puid7VEofFezBpMKjNPjCUk1wJ3mtUyLeS8jRUEx
M6qnl7DU4BWPRtGxiXNebN85M8iaJnlaeQglowklrubrFtqnVeDQM0rz42NiY2/S
2jk6q8b9XKvoDEev2VeKfmFztu4gGXypNR6xAolvgBnUZziKKFUJutZwJe95pZV1
kO6CvdchuCQdLMLdHPji2kk2AA8Go5webONfioFhoWYfeSvgjRELmOFTDjP22eNv
s1/vfCsjuU6vdyaXXXZpN/7VMenoNegSQUXFoxhmHO7BtNuTqk+ZVFNgaWsOrXcx
I0ZIjCAmOsU+DKLn+DTT8kDtU3ihZz/OCXXZJi9JPeLFZ6gSghDRnFThwYqcZvA+
syIn1XGGtn15Q7A5FZNvwGqYir1GtyOslrxxIKxGlhleovhT94HuKXkL1T+k74zl
FXItKgRs0TPQfTsI8q+C
=IMcz
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
"Just a couple of patches from Konstantin to fix /proc/locks when the
process that set the lock has exited, and a new tracepoint for the
flock() codepath. Also threw in mailmap entries for my addresses and a
comment cleanup"
* tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: remove misleading obsolete comment
mailmap: remap some of my email addresses to kernel.org address
locks: add tracepoint in flock codepath
fs/lock: show locks taken by processes from another pidns
fs/lock: skip lock owner pid translation in case we are in init_pid_ns
Pull misc vfs updates from Al Viro:
"Misc cleanups from various folks all over the place
I expected more fs/dcache.c cleanups this cycle, so that went into a
separate branch. Said cleanups have missed the window, so in the
hindsight it could've gone into work.misc instead. Decided not to
cherry-pick, thus the 'work.dcache' branch"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: dcache: Use true and false for boolean values
fold generic_readlink() into its only caller
fs: shave 8 bytes off of struct inode
fs: Add more kernel-doc to the produced documentation
fs: Fix attr.c kernel-doc
removed extra extern file_fdatawait_range
* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill dentry_update_name_case()
Pull vfs aio updates from Al Viro:
"Christoph's aio poll, saner this time around.
This time it's pretty much local to fs/aio.c. Hopefully race-free..."
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: allow direct aio poll comletions for keyed wakeups
aio: implement IOCB_CMD_POLL
aio: add a iocb refcount
timerfd: add support for keyed wakeups
Pull vfs lookup() updates from Al Viro:
"More conversions of ->lookup() to d_splice_alias().
Should be reasonably complete now - the only leftovers are in ceph"
* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT)
afs_lookup(): switch to d_splice_alias()
afs: switch dynroot lookups to d_splice_alias()
hpfs: fix an inode leak in lookup, switch to d_splice_alias()
hostfs_lookup: switch to d_splice_alias()
Pull vfs icache updates from Al Viro:
- NFS mkdir/open_by_handle race fix
- analogous solution for FUSE, replacing the one currently in mainline
- new primitive to be used when discarding halfway set up inodes on
failed object creation; gives sane warranties re icache lookups not
returning such doomed by still not freed inodes. A bunch of
filesystems switched to that animal.
- Miklos' fix for last cycle regression in iget5_locked(); -stable will
need a slightly different variant, unfortunately.
- misc bits and pieces around things icache-related (in adfs and jfs).
* 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
jfs: don't bother with make_bad_inode() in ialloc()
adfs: don't put inodes into icache
new helper: inode_fake_hash()
vfs: don't evict uninitialized inode
jfs: switch to discard_new_inode()
ext2: make sure that partially set up inodes won't be returned by ext2_iget()
udf: switch to discard_new_inode()
ufs: switch to discard_new_inode()
btrfs: switch to discard_new_inode()
new primitive: discard_new_inode()
kill d_instantiate_no_diralias()
nfs_instantiate(): prevent multiple aliases for directory inode
Pull vfs open-related updates from Al Viro:
- "do we need fput() or put_filp()" rules are gone - it's always fput()
now. We keep track of that state where it belongs - in ->f_mode.
- int *opened mess killed - in finish_open(), in ->atomic_open()
instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().
- alloc_file() wrappers with saner calling conventions are introduced
(alloc_file_clone() and alloc_file_pseudo()); callers converted, with
much simplification.
- while we are at it, saner calling conventions for path_init() and
link_path_walk(), simplifying things inside fs/namei.c (both on
open-related paths and elsewhere).
* 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
few more cleanups of link_path_walk() callers
allow link_path_walk() to take ERR_PTR()
make path_init() unconditionally paired with terminate_walk()
document alloc_file() changes
make alloc_file() static
do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
new helper: alloc_file_clone()
create_pipe_files(): switch the first allocation to alloc_file_pseudo()
anon_inode_getfile(): switch to alloc_file_pseudo()
hugetlb_file_setup(): switch to alloc_file_pseudo()
ocxlflash_getfile(): switch to alloc_file_pseudo()
cxl_getfile(): switch to alloc_file_pseudo()
... and switch shmem_file_setup() to alloc_file_pseudo()
__shmem_file_setup(): reorder allocations
new wrapper: alloc_file_pseudo()
kill FILE_{CREATED,OPENED}
switch atomic_open() and lookup_open() to returning 0 in all success cases
document ->atomic_open() changes
->atomic_open(): return 0 in all success cases
get rid of 'opened' in path_openat() and the helpers downstream
...
An overview of the general architecture changes:
- Massive DMA ops refactoring from Christoph Hellwig (huzzah for
deleting crufty code!).
- We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding
regsets to expose DSP ASE & floating point mode state respectively,
both for live debugging & core dumps.
- We better optimize our code by hard-coding cpu_has_* macros at
compile time where their values are known due to the ISA revision
that the kernel build is targeting.
- The EJTAG exception handler now better handles SMP systems, where it
was previously possible for CPUs to clobber a register value saved
by another CPU.
- Our implementation of memset() gained a couple of fixes for MIPSr6
systems to return correct values in some cases where stores fault.
- We now implement ioremap_wc() using the uncached-accelerated cache
coherency attribute where supported, which is detected during boot,
and fall back to plain uncached access where necessary. The
MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
ioremap_cacheable_cow() are removed.
- The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
systems by reworking the way we ensure remote CPUs that may be
running threads within the affected process switch mode.
- Systems using the MIPS Coherence Manager will now set the
MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
maintenance overhead when flushing the icache.
- A few fixes were made for building with clang/LLVM, which
now sucessfully builds kernels for many of our platforms.
- Miscellaneous cleanups all over.
And some platform-specific changes:
- ar7 gained stubs for a few clock API functions to fix build failures
for some drivers.
- ath79 gained support for a few new SoCs, a few fixes & better
gpio-keys support.
- Ci20 now exposes its SPI bus using the spi-gpio driver.
- The generic platform can now auto-detect a suitable value for
PHYS_OFFSET based upon the memory map described by the device tree,
allowing us to avoid wasting memory on page book-keeping for systems
where RAM starts at a non-zero physical address.
- Ingenic systems using the jz4740 platform code now link their
vmlinuz higher to allow for kernels of a realistic size.
- Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2
to avoid CPU errata.
- Loongson64 gains a couple of fixes, a workaround for a write
buffering issue & support for the Loongson 3A R3.1 CPU.
- Malta now uses the piix4-poweroff driver to handle powering down.
- Microsemi Ocelot gained support for its SPI bus & NOR flash, its
second MDIO bus and can now be supported by a FIT/.itb image.
- Octeon saw a bunch of header cleanups which remove a lot of
duplicate or unused code.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt
/2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL
=cUiF
-----END PGP SIGNATURE-----
Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Paul Burton:
"Here are the main MIPS changes for 4.19.
An overview of the general architecture changes:
- Massive DMA ops refactoring from Christoph Hellwig (huzzah for
deleting crufty code!).
- We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes &
corresponding regsets to expose DSP ASE & floating point mode state
respectively, both for live debugging & core dumps.
- We better optimize our code by hard-coding cpu_has_* macros at
compile time where their values are known due to the ISA revision
that the kernel build is targeting.
- The EJTAG exception handler now better handles SMP systems, where
it was previously possible for CPUs to clobber a register value
saved by another CPU.
- Our implementation of memset() gained a couple of fixes for MIPSr6
systems to return correct values in some cases where stores fault.
- We now implement ioremap_wc() using the uncached-accelerated cache
coherency attribute where supported, which is detected during boot,
and fall back to plain uncached access where necessary. The
MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
ioremap_cacheable_cow() are removed.
- The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
systems by reworking the way we ensure remote CPUs that may be
running threads within the affected process switch mode.
- Systems using the MIPS Coherence Manager will now set the
MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
maintenance overhead when flushing the icache.
- A few fixes were made for building with clang/LLVM, which now
sucessfully builds kernels for many of our platforms.
- Miscellaneous cleanups all over.
And some platform-specific changes:
- ar7 gained stubs for a few clock API functions to fix build
failures for some drivers.
- ath79 gained support for a few new SoCs, a few fixes & better
gpio-keys support.
- Ci20 now exposes its SPI bus using the spi-gpio driver.
- The generic platform can now auto-detect a suitable value for
PHYS_OFFSET based upon the memory map described by the device tree,
allowing us to avoid wasting memory on page book-keeping for
systems where RAM starts at a non-zero physical address.
- Ingenic systems using the jz4740 platform code now link their
vmlinuz higher to allow for kernels of a realistic size.
- Loongson32 now builds the kernel targeting MIPSr1 rather than
MIPSr2 to avoid CPU errata.
- Loongson64 gains a couple of fixes, a workaround for a write
buffering issue & support for the Loongson 3A R3.1 CPU.
- Malta now uses the piix4-poweroff driver to handle powering down.
- Microsemi Ocelot gained support for its SPI bus & NOR flash, its
second MDIO bus and can now be supported by a FIT/.itb image.
- Octeon saw a bunch of header cleanups which remove a lot of
duplicate or unused code"
* tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits)
MIPS: Remove remnants of UASM_ISA
MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()
MIPS: VDSO: Force link endianness
MIPS: Always specify -EB or -EL when using clang
MIPS: Use dins to simplify __write_64bit_c0_split()
MIPS: Use read-write output operand in __write_64bit_c0_split()
MIPS: Avoid using array as parameter to write_c0_kpgd()
MIPS: vdso: Allow clang's --target flag in VDSO cflags
MIPS: genvdso: Remove GOT checks
MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
MIPS: generic: Remove input symbols from defconfig
MIPS: Delete unused code in linux32.c
MIPS: Remove unused sys_32_mmap2
MIPS: Remove nabi_no_regargs
mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
mips: dts: mscc: Add spi on Ocelot
MIPS: Loongson: Merge load addresses
MIPS: Loongson: Set Loongson32 to MIPS32R1
MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
...
Pull timer updates from Thomas Gleixner:
"The timers departement more or less proudly presents:
- More Y2038 timekeeping work mostly in the core code. The work is
slowly, but steadily targeting the actuall syscalls.
- Enhanced timekeeping suspend/resume support by utilizing
clocksources which do not stop during suspend, but are otherwise
not the main timekeeping clocksources.
- Make NTP adjustmets more accurate and immediate when the frequency
is set directly and not incrementally.
- Sanitize the overrung handing of posix timers
- A new timer driver for Mediatek SoCs
- The usual pile of fixes and updates all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
clockevents: Warn if cpu_all_mask is used as cpumask
tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer
clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage
clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag
timers: Clear timer_base::must_forward_clk with timer_base::lock held
clocksource/drivers/sprd: Register one always-on timer to compensate suspend time
clocksource/drivers/timer-mediatek: Add support for system timer
clocksource/drivers/timer-mediatek: Convert the driver to timer-of
clocksource/drivers/timer-mediatek: Use specific prefix for GPT
clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek
clocksource/drivers/timer-mediatek: Add system timer bindings
clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask
time: Introduce one suspend clocksource to compensate the suspend time
time: Fix extra sleeptime injection when suspend fails
timekeeping/ntp: Constify some function arguments
ntp: Use kstrtos64 for s64 variable
ntp: Remove redundant arguments
timer: Fix coding style
ktime: Provide typesafe ktime_to_ns()
hrtimer: Improve kernel message printing
...
Pull locking/atomics update from Thomas Gleixner:
"The locking, atomics and memory model brains delivered:
- A larger update to the atomics code which reworks the ordering
barriers, consolidates the atomic primitives, provides the new
atomic64_fetch_add_unless() primitive and cleans up the include
hell.
- Simplify cmpxchg() instrumentation and add instrumentation for
xchg() and cmpxchg_double().
- Updates to the memory model and documentation"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/atomics: Rework ordering barriers
locking/atomics: Instrument cmpxchg_double*()
locking/atomics: Instrument xchg()
locking/atomics: Simplify cmpxchg() instrumentation
locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
tools/memory-model: Rename litmus tests to comply to norm7
tools/memory-model/Documentation: Fix typo, smb->smp
sched/Documentation: Update wake_up() & co. memory-barrier guarantees
locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
sched/core: Use smp_mb() in wake_woken_function()
tools/memory-model: Add informal LKMM documentation to MAINTAINERS
locking/atomics/Documentation: Describe atomic_set() as a write operation
tools/memory-model: Make scripts executable
tools/memory-model: Remove ACCESS_ONCE() from model
tools/memory-model: Remove ACCESS_ONCE() from recipes
locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
tools/memory-model: Add litmus test for full multicopy atomicity
locking/refcount: Always allow checked forms
...
Pull EFI updates from Thomas Gleixner:
"The EFI pile:
- Make mixed mode UEFI runtime service invocations mutually
exclusive, as mandated by the UEFI spec
- Perform UEFI runtime services calls from a work queue so the calls
into the firmware occur from a kernel thread
- Honor the UEFI memory map attributes for live memory regions
configured by UEFI as a framebuffer. This works around a coherency
problem with KVM guests running on ARM.
- Cleanups, improvements and fixes all over the place"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Call guid_parse() against guid_t type of variable
efi/cper: Use consistent types for UUIDs
efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
efi: Deduplicate efi_open_volume()
efi/x86: Add missing NULL initialization in UGA draw protocol discovery
efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
efi/x86: Align efi_uga_draw_protocol typedef names to convention
efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
efi/x86: Prevent reentrant firmware calls in mixed mode
efi/esrt: Only call efi_mem_reserve() for boot services memory
fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
efi: Drop type and attribute checks in efi_mem_desc_lookup()
efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
efi: Remove the declaration of efi_late_init() as the function is unused
efi/cper: Avoid using get_seconds()
efi: Use a work queue to invoke EFI Runtime Services
efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
efi/x86: Clean up the eboot code
The code of ceph_unreserve_caps() and error handling in
ceph_reserve_caps() are duplicated, so introduce a helper
__ceph_unreserve_caps() to reduce duplicated code.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
We do not check return code for __do_request() in all callers,
so change to void return type.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In ceph_llseek(), we compare fsc->max_file_size and inode->i_size to
choose max file size limit.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ceph_setattr() finally calls vfs function inode_newsize_ok()
to do offset validation and that is based on sb->s_maxbytes.
Because we set sb->s_maxbytes to MAX_LFS_FILESIZE to through
VFS check and do proper offset validation in cephfs level,
we need adding proper offset validation before calling
inode_newsize_ok().
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.
p9_client_clunk (in 9p)
p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
v9fs_clunk (in qemu)
put_fid
free_fid
v9fs_xattr_fid_clunk
v9fs_co_lsetxattr
s->ops->lsetxattr
ext4_xattr_user_set (in host ext4 filesystem)
Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
fix spelling mistake in pr_info message text
Link: http://lkml.kernel.org/r/20180526150650.10562-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Use new return type vm_fault_t for page_mkwrite handler.
See 1c8f422059 ("mm: change return type to vm_fault_t") for reference.
Link: http://lkml.kernel.org/r/20180702154928.GA3964@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Pull vfs fixes from Al Viro:
"A bunch of race fixes, mostly around lazy pathwalk.
All of it is -stable fodder, a large part going back to 2013"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make sure that __dentry_kill() always invalidates d_seq, unhashed or not
fix __legitimize_mnt()/mntput() race
fix mntput/mntput race
root dentries need RCU-delayed freeing
Fuzzing tool reports a write to null pointer error in the
xfs_bmap_extents_to_btree, fix it by bailing out on encountering
a null pointer.
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The old lock tracking infrastructure in xfs using the b_last_holder
field seems to only be useful if you can get into the system with a
debugger; it seems that the existing tracepoints would be the way to
go these days, and this old infrastructure can be removed.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use
the offset_in_page macro.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This patch is the duplicate of ross's fix for ext4 for xfs.
If the refcount of a page is lowered between the time that it is returned
by dax_busy_page() and when the refcount is again checked in
xfs_break_layouts() => ___wait_var_event(), the waiting function
xfs_wait_dax_page() will never be called. This means that
xfs_break_layouts() will still have 'retry' set to false, so we'll stop
looping and never check the refcount of other pages in this inode.
Instead, always continue looping as long as dax_layout_busy_page() gives us
a page which it found with an elevated refcount.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
- Add the SPI-NAND framework.
- Create a helper to find the best ECC configuration.
- Create NAND controller operations.
- Allocate dynamically ONFI parameters structure.
- Add defines for ONFI version bits.
- Add manufacturer fixup for ONFI parameter page.
- Add an option to specify NAND chip as a boot device.
- Add Reed-Solomon error correction algorithm.
- Better name for the controller structure.
- Remove unused caller_is_module() definition.
- Make subop helpers return unsigned values.
- Expose _notsupp() helpers for raw page accessors.
- Add default values for dynamic timings.
- Kill the chip->scan_bbt() hook.
- Rename nand_default_bbt() into nand_create_bbt().
- Start to clean the nand_chip structure.
- Remove stale prototype from rawnand.h.
Raw NAND controllers drivers changes:
- Qcom: structuring cleanup.
- Denali: use core helper to find the best ECC configuration.
- Possible build of almost all drivers by adding a dependency on
COMPILE_TEST for almost all of them in Kconfig, implies various
fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
changes in sparc64 and ia64 architectures.
- Clean the ->probe() functions error path of a lot of drivers.
- Migrate all drivers to use nand_scan() instead of
nand_scan_ident()/nand_scan_tail() pair.
- Use mtd_device_register() where applicable to simplify the code.
- Marvell:
* Handle on-die ECC.
* Better clocks handling.
* Remove bogus comment.
* Add suspend and resume support.
- Tegra: add NAND controller driver.
- Atmel:
* Add module param to avoid using dma.
* Drop Wenyou Yang from MAINTAINERS.
- Denali: optimize timings handling.
- FSMC: Stop using chip->read_buf().
- FSL:
* Switch to SPDX license tag identifiers.
* Fix qualifiers in MXC init functions.
Raw NAND chip drivers changes:
- Micron:
* Add fixup for ONFI revision.
* Update ecc_stats.corrected.
* Make ECC activation stateful.
* Avoid enabling/disabling ECC when it can't be disabled.
* Get the actual number of bitflips.
* Allow forced on-die ECC.
* Support 8/512 on-die ECC.
* Fix on-die ECC detection logic.
- Hynix:
* Fix decoding the OOB size on H27UCG8T2BTR.
* Use ->exec_op() in hynix_nand_reg_write_op().
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJbYYGVAAoJECVq6hhHvVaE0poIAJy+VpZl0jTPQ/oO8TQui9hE
IZbc8LwohCvegYYhiY1cNESyMYamDfoK6M93i/0zTJF2AJAxPl25ldT8N5Wr16DO
5Vfsdjv75V8l0JEY2SvWYmC6glOAYs0UEDdcFNJRMPqUnQz+VvBIafJOCQqzo4ZH
SDnLx3XzOxO4PAPnztWEg50WvaqMPt7ThcqoxThHMcQaLrNjgJUsV0mN+vNEv16Q
6gH6hl1C019k+Kj2Zu0vAifHw1K7gIYT4HvqKwstQ6HYUX2IzIzuEpRIcIze0S5z
XKzZ57USItb3l+Y3YwFBLjgP4N+VTT5X59LxdtCOXJ+YvzgxwtKElRvalNcryYI=
=zkEf
-----END PGP SIGNATURE-----
Merge tag 'nand/for-4.19' of git://git.infradead.org/linux-mtd into mtd/next
Pull NAND updates from Miquel Raynal:
"
NAND core changes:
- Add the SPI-NAND framework.
- Create a helper to find the best ECC configuration.
- Create NAND controller operations.
- Allocate dynamically ONFI parameters structure.
- Add defines for ONFI version bits.
- Add manufacturer fixup for ONFI parameter page.
- Add an option to specify NAND chip as a boot device.
- Add Reed-Solomon error correction algorithm.
- Better name for the controller structure.
- Remove unused caller_is_module() definition.
- Make subop helpers return unsigned values.
- Expose _notsupp() helpers for raw page accessors.
- Add default values for dynamic timings.
- Kill the chip->scan_bbt() hook.
- Rename nand_default_bbt() into nand_create_bbt().
- Start to clean the nand_chip structure.
- Remove stale prototype from rawnand.h.
Raw NAND controllers drivers changes:
- Qcom: structuring cleanup.
- Denali: use core helper to find the best ECC configuration.
- Possible build of almost all drivers by adding a dependency on
COMPILE_TEST for almost all of them in Kconfig, implies various
fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
changes in sparc64 and ia64 architectures.
- Clean the ->probe() functions error path of a lot of drivers.
- Migrate all drivers to use nand_scan() instead of
nand_scan_ident()/nand_scan_tail() pair.
- Use mtd_device_register() where applicable to simplify the code.
- Marvell:
* Handle on-die ECC.
* Better clocks handling.
* Remove bogus comment.
* Add suspend and resume support.
- Tegra: add NAND controller driver.
- Atmel:
* Add module param to avoid using dma.
* Drop Wenyou Yang from MAINTAINERS.
- Denali: optimize timings handling.
- FSMC: Stop using chip->read_buf().
- FSL:
* Switch to SPDX license tag identifiers.
* Fix qualifiers in MXC init functions.
Raw NAND chip drivers changes:
- Micron:
* Add fixup for ONFI revision.
* Update ecc_stats.corrected.
* Make ECC activation stateful.
* Avoid enabling/disabling ECC when it can't be disabled.
* Get the actual number of bitflips.
* Allow forced on-die ECC.
* Support 8/512 on-die ECC.
* Fix on-die ECC detection logic.
- Hynix:
* Fix decoding the OOB size on H27UCG8T2BTR.
* Use ->exec_op() in hynix_nand_reg_write_op().
"
We really, really don't want to be encouraging people to use
cifs (the dialect) since it is insecure, so to avoid confusion
we want to move them to names which include 'smb3' instead of
'cifs' - so this simply creates an alias for the pseudo-xattrs
e.g. can now do:
getfattr -n user.smb3.creationtime /mnt1/file
and
getfattr -n user.smb3.dosattrib /mnt1/file
and
getfattr -n system.smb3_acl /mnt1/file
instead of forcing you to use the string 'cifs' in
these (e.g. getfattr -n system.cifs_acl /mnt1/file)
Signed-off-by: Steve French <stfrench@microsoft.com>
Rebuild the AGI header items with some help from the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
mounting with the "snapshots=" mount parm allows a read-only
view of a previous version of a file system (see MS-SMB2
and "timewarp" tokens, section 2.2.13.2.6) based on the timestamp
passed in on the snapshots mount parm.
Add processing to optionally send this create context.
Example output:
/mnt1 is mounted with "snapshots=..." and will see an earlier
version of the directory, with three fewer files than /mnt2
the current version of the directory.
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# cat /proc/mounts | grep cifs
//172.22.149.186/public /mnt1 cifs
ro,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=131748608570000000,actimeo=1
//172.22.149.186/public /mnt2 cifs
rw,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1
EmptyDir newerdir
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1/newerdir
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2
EmptyDir file newerdir newestdir timestamp-trace.cap
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2/newerdir
new-file-not-in-snapshot
Snapshots are extremely useful for comparing previous versions of files or directories,
and recovering from data corruptions or mistakes.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
We were missing the methods for get_acl and friends for the 3.11
dialect.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
When enumerating snapshots, the last few bytes of the final
snapshot could be left off since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY)
See MS-SMB2 section 2.2.32.2. In addition fixup the length used
to allow smaller buffer to be passed in, in order to allow
returning the size of the whole snapshot array more easily.
Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.
~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...
size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102
Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq. That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all. Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.
We could try and be careful in the (few) places where that could
happen. Or we could just make the final dput() invalidate the damn
thing, unhashed or not. The latter is much simpler and easier to
backport, so let's do it that way.
Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped. Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there. The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...
Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test. Consider the following situation:
* mnt is a lazy-umounted mount, kept alive by two opened files. One
of those files gets closed. Total refcount of mnt is 2. On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69. There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
* On CPU 42 our first mntput() finishes counting. It observes the
decrement of component #69, but not the increment of component #0. As the
result, the total it gets is not 1 as it should've been - it's 0. At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down. However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.
It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
The spinlock handling in this file has changed significantly since this
comment was written, and the file_lock_lock is no more. In addition,
this overall comment no longer applies. Deleting an entry now requires
both locks.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Function update_rgrp_lvb_unlinked used to do the same thing as
be32_add_cpu. This patch removes it in favor of using be32_add_cpu
directly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
When CONFIG_CIFS_STATS2 is enabled keep counters for slow
commands (ie server took longer than 1 second to respond)
by SMB2/SMB3 command code. This can help in diagnosing
whether performance problems are on server (instead of
client) and which commands are causing the problem.
Sample output (the new lines contain words "slow responses ...")
$ cat /proc/fs/cifs/Stats
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 10 Small 490 Allocations
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 67 maximum at one time: 2
4 slow responses from localhost for command 5
1 slow responses from localhost for command 6
1 slow responses from localhost for command 14
1 slow responses from localhost for command 16
1) \\localhost\test
SMBs: 243
Bytes read: 1024000 Bytes written: 104857600
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 40 total 0 failed
Closes: 39 total 0 failed
...
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
server->secmech.sdeschmacsha256 is not properly initialized before
smb2_shash_allocate(), set shash after that call.
also fix typo in error message
Fixes: 8de8c4608f ("cifs: Fix validation of signed data in smb2")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
An earlier commit had a typo which prevented the
optimization from working:
commit 18dd8e1a65 ("Do not send SMB3 SET_INFO request if nothing is changing")
Thank you to Metze for noticing this. Also clear a
reserved field in the FILE_BASIC_INFO struct we send
that should be zero (all the other fields in that
struct were set or cleared explicitly already in
cifs_set_file_info).
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.9.x+
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
CONFIG_CIFS_STATS is now always enabled (to simplify the
code and since the STATS are important for some common
customer use cases and also debugging), but needed one
minor change so that STATS shows as enabled in the debug
output in /proc/fs/cifs/DebugData, otherwise it could
get confusing with STATS no longer showing up in the
"Features" list in /proc/fs/cifs/DebugData when basic
stats were in fact available.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
If responses take longer than one second from the server,
we can optionally log them to dmesg in current cifs.ko code
(CONFIG_CIFS_STATS2 must be configured and a
/proc/fs/cifs/cifsFYI flag must be set), but can be more useful
to log these via ftrace (tracepoint is smb3_slow_rsp) which
is easier and more granular (still requires CONFIG_CIFS_STATS2
to be configured in the build though).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
These are used for SMB3 encryption and compounded requests.
Update these functions and the other functions related to SMB3 encryption to
take an array of requests.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
echo 0 > /proc/fs/cifs/Stats is supposed to reset the stats
but there were four (see example below) that were not reset
(bytes read and witten, total vfs ops and max ops
at one time).
...
0 session 0 share reconnects
Total vfs operations: 100 maximum at one time: 2
1) \\localhost\test
SMBs: 0
Bytes read: 502092 Bytes written: 31457286
TreeConnects: 0 total 0 failed
TreeDisconnects: 0 total 0 failed
...
This patch fixes cifs_stats_proc_write to properly reset
those four.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
We were only displaying bytes_read and bytes_written in cifs
stats, fix smb3 stats to also display them. Sample output
with this patch:
cat /proc/fs/cifs/Stats:
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 94 maximum at one time: 2
1) \\localhost\test
SMBs: 214
Bytes read: 502092 Bytes written: 31457286
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 52 total 3 failed
Closes: 48 total 0 failed
Flushes: 0 total 0 failed
Reads: 17 total 0 failed
Writes: 31 total 0 failed
...
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CONFIG_CIFS_STATS should always be enabled as Pavel recently
noted. Simple statistics are not a significant performance hit,
and removing the ifdef simplifies the code slightly.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: <stable@vger.kernel.org>
Add tracepoints for reconnecting an smb3 session
Example output (from trace-cmd) with the patch
(showing the session marked for reconnect, the stat failing, and then
the subsequent SMB3 commands after the server comes back up).
The "smb3_reconnect" event is the new one.
cifsd-25993 [000] .... 29635.368265: smb3_reconnect: server=localhost current_mid=0x1e
stat-26200 [001] .... 29638.516403: smb3_enter: cifs_revalidate_dentry_attr: xid=22
stat-26200 [001] .... 29648.723296: smb3_exit_err: cifs_revalidate_dentry_attr: xid=22 rc=-112
kworker/0:1-22830 [000] .... 29653.850947: smb3_cmd_done: sid=0x0 tid=0x0 cmd=0 mid=0
kworker/0:1-22830 [000] .... 29653.851191: smb3_cmd_err: sid=0x8ae4683c tid=0x0 cmd=1 mid=1 status=0xc0000016 rc=-5
kworker/0:1-22830 [000] .... 29653.855254: smb3_cmd_done: sid=0x8ae4683c tid=0x0 cmd=1 mid=2
kworker/0:1-22830 [000] .... 29653.855482: smb3_cmd_done: sid=0x8ae4683c tid=0x8084f30d cmd=3 mid=3
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
In debugging reconnection problems, want to be able to more easily
trace cases in which the server has marked the SMB3 session
expired or deleted (to distinguish from timeout cases).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
These timers were a good idea but weren't used in current code,
and the idea was cifs specific. Future patch will add similar timers
for SMB2/SMB3, but no sense using memory for cifs timers that
aren't used in current code.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Fixes problem pointed out by Pavel in discussions about commit
729c0c9dd5
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org> # 3.18.x+
Remove counters from the per-tree connection /proc/fs/cifs/Stats
output that will always be zero (since they are not per-tcon ops)
ie SMB3 Negotiate, SessionSetup, Logoff, Echo, Cancel.
Also clarify "sent" to be "total" per-Pavel's suggestion
(since this "total" includes total for all operations that we try to
send whether or not succesffully sent). Sample output below:
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
1 session 2 share reconnects
Total vfs operations: 23 maximum at one time: 2
1) \\localhost\test
SMBs: 45
TreeConnects: 2 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 13 total 2 failed
Closes: 9 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 1 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 3 total 1 failed
QueryDirectories: 4 total 2 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 10 total 0 failed
SetInfos: 3 total 0 failed
OplockBreaks: 0 sent 0 failed
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
For SMB2/SMB3 the number of requests sent was not displayed
in /proc/fs/cifs/Stats unless CONFIG_CIFS_STATS2 was
enabled (only number of failed requests displayed). As
with earlier dialects, we should be displaying these
counters if CONFIG_CIFS_STATS is enabled. They
are important for debugging.
e.g. when you cat /proc/fs/cifs/Stats (before the patch)
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 690 maximum at one time: 2
1) \\localhost\test
SMBs: 975
Negotiates: 0 sent 0 failed
SessionSetups: 0 sent 0 failed
Logoffs: 0 sent 0 failed
TreeConnects: 0 sent 0 failed
TreeDisconnects: 0 sent 0 failed
Creates: 0 sent 2 failed
Closes: 0 sent 0 failed
Flushes: 0 sent 0 failed
Reads: 0 sent 0 failed
Writes: 0 sent 0 failed
Locks: 0 sent 0 failed
IOCTLs: 0 sent 1 failed
Cancels: 0 sent 0 failed
Echos: 0 sent 0 failed
QueryDirectories: 0 sent 63 failed
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
snapshot mounts were not marked as read-only and did not display the snapshot
time (in /proc/mounts) specified on mount
With this patch - note that can not write to the snapshot mount (see "ro" in
/proc/mounts line) and also the missing snapshot timewarp token time is
dumped. Sample line from /proc/mounts with the patch:
//127.0.0.1/scratch /mnt2 smb3 ro,relatime,vers=default,cache=strict,username=testuser,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=127.0.0.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,noperm,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=1234567,actimeo=1 0 0
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Some servers, like Samba, don't support the fsctl for
query_network_interface_info so don't log a noisy warning
message on mount for this by default unless the error is more serious.
Lower the error to an FYI level so it does not get logged by
default.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We really, really want to be encouraging use of secure dialects,
and SMB3.1.1 offers useful security features, and will soon
be the recommended dialect for many use cases. Simplify the code
by removing the CONFIG_CIFS_SMB311 ifdef so users don't disable
it in the build, and create compatibility and/or security issues
with modern servers - many of which have been supporting this
dialect for multiple years.
Also clarify some of the Kconfig text for cifs.ko about
SMB3.1.1 and current supported features in the module.
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
/proc/fs/cifs/DebugData displays the features (Kconfig options)
used to build cifs.ko but it was missing some, and needed comma
separator. These can be useful in debugging certain problems
so we know which optional features were enabled in the user's build.
Also clarify them, by making them more closely match the
corresponding CONFIG_CIFS_* parm.
Old format:
Features: dfs fscache posix spnego xattr acl
New format:
Features: DFS,FSCACHE,SMB_DIRECT,STATS,DEBUG2,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
CC: Stable <stable@vger.kernel.org>
Output now matches expected stat -f output for all fields
except for Namelen and ID which were addressed in a companion
patch (which retrieves them from existing SMB3 mechanisms
and works whether POSIX enabled or not)
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Fil in the correct namelen (typically 255 not 4096) in the
statfs response and also fill in a reasonably unique fsid
(in this case taken from the volume id, and the creation time
of the volume).
In the case of the POSIX statfs all fields are now filled in,
and in the case of non-POSIX mounts, all fields are filled
in which can be.
Signed-off-by: Steve French <stfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Check if every data page is signed correctly in sigining helper.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
also fixes error code in smb311_posix_mkdir() (where
the error assignment needs to go before the goto)
a typo that Dan Carpenter and Paulo and Gustavo
pointed out.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
config for the build of cifs.ko if want to always prevent mounting
with these less secure dialects.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
If user specifies "posix" on an SMB3.11 mount, then fail the mount
if server does not return the POSIX negotiate context indicating
support for posix.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
In the fscache, we just need the timestamps as cookies to check for
changes, so we don't really care about the overflow, but it's better
to stop using the deprecated timespec so we don't have to go through
explicit conversion functions.
To avoid comparing uninitialized padding values that are copied
while assigning the timespec values, this rearranges the members of
cifs_fscache_inode_auxdata to avoid padding, and assigns them
individually.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
In cifs, the timestamps are stored in memory in the cifs_fattr structure,
which uses the deprecated 'timespec' structure. Now that the VFS code
has moved on to 'timespec64', the next step is to change over the fattr
as well.
This also makes 32-bit and 64-bit systems behave the same way, and
no longer overflow the 32-bit time_t in year 2038.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is not really a runtime issue but Smatch complains that:
fs/cifs/smb2ops.c:1740 smb2_query_symlink()
error: uninitialized symbol 'resp_buftype'.
The warning is right that it can be uninitialized... Also "err_buf"
would be NULL at this point and we're not supposed to pass NULLs to
free_rsp_buf() or it might trigger some extra output if we turn on
debugging.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Colin Ian King reports that commit 82ff27bc52 ("xfs: automatic dfops
buffer relogging") leaves around some dead error handling code in
xfs_dquot_disk_alloc(). This was discovered via Coverity scan.
Since the associated commit eliminates the act of joining a buffer
to a dfops, this intermediate error state is no longer possible and
the error handling code can be removed. Since the caller cancels the
transaction on error, which cancels the dfops, eliminate the
unnecessary xfs_defer_cancel() call and error handling labels.
Fixes: 82ff27bc52 ("xfs: automatic dfops buffer relogging")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This adds ordering of the updates and makes sure we always see the if_seq
update before the extent tree is modified.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Function gfs2_testbit is called in three places. Two of those places,
gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
by the block reservations scheme to determine the length of an extent of
free blocks. Before this patch, it wasn't using the clone bitmap, which
means recently-freed blocks were treated as free blocks for the purposes
of an allocation.
This patch adds a new parameter to gfs2_testbit to indicate whether or
not the clone bitmaps should be used (if available).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.
It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock(). If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When btrfs hits error after modifying fs_devices in
btrfs_init_new_device() (such as btrfs_add_dev_item() returns error), it
leaves everything as is, but frees allocated btrfs_device. As a result,
fs_devices->devices and fs_devices->alloc_list contain already freed
btrfs_device, leading to later use-after-free bug.
Error path also messes the things like ->num_devices. While they go back
to the original value by unscanning btrfs devices, it is safe to revert
them here.
Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's entirely possible that a crafted btrfs image contains overlapping
chunks.
Although we can't detect such problem by tree-checker, it's not a
catastrophic problem, current extent map can already detect such problem
and return -EEXIST.
We just only need to exit gracefully and fail the mount.
Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will introduce chunk <-> dev extent mapping check, to protect
us against invalid dev extents or chunks.
Since chunk mapping is the fundamental infrastructure of btrfs, extra
check at mount time could prevent a lot of unexpected behavior (BUG_ON).
Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If a crafted image has missing block group items, it could cause
unexpected behavior and breaks the assumption of 1:1 chunk<->block group
mapping.
Although we have the block group -> chunk mapping check, we still need
chunk -> block group mapping check.
This patch will do extra check to ensure each chunk has its
corresponding block group.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A crafted btrfs image with incorrect chunk<->block group mapping will
trigger a lot of unexpected things as the mapping is essential.
Although the problem can be caught by block group item checker
added in "btrfs: tree-checker: Verify block_group_item", it's still not
sufficient. A sufficiently valid block group item can pass the check
added by the mentioned patch but could fail to match the existing chunk.
This patch will add extra block group -> chunk mapping check, to ensure
we have a completely matching (start, len, flags) chunk for each block
group at mount time.
Here we reuse the original helper find_first_block_group(), which is
already doing the basic bg -> chunk checks, adding further checks of the
start/len and type flags.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing an incremental send, if we have a file in the parent snapshot
that has prealloc extents beyond EOF and in the send snapshot it got a
hole punch that partially covers the prealloc extents, the send stream,
when replayed by a receiver, can result in a file that has a size bigger
than it should and filled with zeroes past the correct EOF.
For example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "falloc -k 0 4M" /mnt/foobar
$ xfs_io -c "pwrite -S 0xea 0 1M" /mnt/foobar
$ btrfs subvolume snapshot -r /mnt /mnt/snap1
$ btrfs send -f /tmp/1.send /mnt/snap1
$ xfs_io -c "fpunch 1M 2M" /mnt/foobar
$ btrfs subvolume snapshot -r /mnt /mnt/snap2
$ btrfs send -f /tmp/2.send -p /mnt/snap1 /mnt/snap2
$ stat --format %s /mnt/snap2/foobar
1048576
$ md5sum /mnt/snap2/foobar
d31659e82e87798acd4669a1e0a19d4f /mnt/snap2/foobar
$ umount /mnt
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ btrfs receive -f /mnt/1.snap /mnt
$ btrfs receive -f /mnt/2.snap /mnt
$ stat --format %s /mnt/snap2/foobar
3145728
# --> should be 1Mb and not 3Mb (which was the end offset of hole
# punch operation)
$ md5sum /mnt/snap2/foobar
117baf295297c2a995f92da725b0b651 /mnt/snap2/foobar
# --> should be d31659e82e87798acd4669a1e0a19d4f as in the original fs
This issue actually happens only since commit ffa7c4296e ("Btrfs: send,
do not issue unnecessary truncate operations"), but before that commit we
were issuing a write operation full of zeroes (to "punch" a hole) which
was extending the file size beyond the correct value and then immediately
issue a truncate operation to the correct size and undoing the previous
write operation. Since the send protocol does not support fallocate, for
extent preallocation and hole punching, fix this by not even attempting
to send a "hole" (regular write full of zeroes) if it starts at an offset
greater then or equals to the file's size. This approach, besides being
much more simple then making send issue the truncate operation, adds the
benefit of avoiding the useless pair of write of zeroes and truncate
operations, saving time and IO at the receiver and reducing the size of
the send stream.
A test case for fstests follows soon.
Fixes: ffa7c4296e ("Btrfs: send, do not issue unnecessary truncate operations")
CC: stable@vger.kernel.org # 4.17+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Don't open-code iget_failed(), don't bother with btrfs_free_path(NULL),
move handling of positive return values of btrfs_lookup_inode() from
btrfs_read_locked_inode() to btrfs_iget() and kill now obviously
pointless ASSERT() in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to check is_bad_inode() after the call of
btrfs_read_locked_inode() - it's exactly the same as checking return
value for being non-zero.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
IS_ERR(p) && PTR_ERR(p) == n is a weird way to spell p == ERR_PTR(n).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Just get rid of pointless checks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
on-disk devs stats value is updated in btrfs_run_dev_stats(),
which is called during commit transaction, if device->dev_stats_ccnt
is not zero.
Since current replace operation does not touch dev_stats_ccnt,
on-disk dev stats value is not updated. Therefore "btrfs device stats"
may return old device's value after umount/mount
(Example: See "btrfs ins dump-t -t DEV $DEV" after btrfs/100 finish).
Fix this by just incrementing dev_stats_ccnt in
btrfs_dev_replace_finishing() when replace is succeeded and this will
update the values.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is no user of this function anymore.
This was forgotten to be removed in commit a575ceeb13
("Btrfs: get rid of unused orphan infrastructure").
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use ERR_CAST() instead of void * to make meaning clear.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although it is safe to call this on already released paths with no locks
held or extent buffers, removing the redundant btrfs_release_path is
reasonable.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All callers pass the root tree of dir, we can push that down to the
function itself.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Leftover after fix e339a6b097 ("Btrfs: __btrfs_mod_ref should always
use no_quota"), that removed it from the function calls but not the
structure.
Signed-off-by: David Sterba <dsterba@suse.com>
The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other
less common use cases for send can end up seeing inodes with a link count
of zero and in this case the send operation fails with a ENOENT error
because any attempt to generate a path for the inode, with the purpose
of creating it or updating it at the receiver, fails since there are no
inode reference items. One use case it to use a regular subvolume for
a send operation after turning it to RO mode or turning a RW snapshot
into RO mode and then using it for a send operation. In both cases, if a
file gets all its hard links deleted while there is an open file
descriptor before turning the subvolume/snapshot into RO mode, the send
operation will encounter an inode with a link count of zero and then
fail with errno ENOENT.
Example using a full send with a subvolume:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ btrfs subvolume create /mnt/sv1
$ touch /mnt/sv1/foo
$ touch /mnt/sv1/bar
# keep an open file descriptor on file bar
$ exec 73</mnt/sv1/bar
$ unlink /mnt/sv1/bar
# Turn the subvolume to RO mode and use it for a full send, while
# holding the open file descriptor.
$ btrfs property set /mnt/sv1 ro true
$ btrfs send -f /tmp/full.send /mnt/sv1
At subvol /mnt/sv1
ERROR: send ioctl failed with -2: No such file or directory
Example using an incremental send with snapshots:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ btrfs subvolume create /mnt/sv1
$ touch /mnt/sv1/foo
$ touch /mnt/sv1/bar
$ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap1
$ echo "hello world" >> /mnt/sv1/bar
$ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2
# Turn the second snapshot to RW mode and delete file foo while
# holding an open file descriptor on it.
$ btrfs property set /mnt/snap2 ro false
$ exec 73</mnt/snap2/foo
$ unlink /mnt/snap2/foo
# Set the second snapshot back to RO mode and do an incremental send.
$ btrfs property set /mnt/snap2 ro true
$ btrfs send -f /tmp/inc.send -p /mnt/snap1 /mnt/snap2
At subvol /mnt/snap2
ERROR: send ioctl failed with -2: No such file or directory
So fix this by ignoring inodes with a link count of zero if we are either
doing a full send or if they do not exist in the parent snapshot (they
are new in the send snapshot), and unlink all paths found in the parent
snapshot when doing an incremental send (and ignoring all other inode
items, such as xattrs and extents).
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 4.4+
Reported-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we end up with logging an inode reference item which has the same name
but different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref(), when we are
replaying an inode reference item.
Example scenario where this happens:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foo
$ ln /mnt/foo /mnt/bar
$ sync
# Rename the first hard link (foo) to a new name and rename the second
# hard link (bar) to the old name of the first hard link (foo).
$ mv /mnt/foo /mnt/qwerty
$ mv /mnt/bar /mnt/foo
# Create a new file, in the same parent directory, with the old name of
# the second hard link (bar) and fsync this new file.
# We do this instead of calling fsync on foo/qwerty because if we did
# that the fsync resulted in a full transaction commit, not triggering
# the problem.
$ touch /mnt/bar
$ xfs_io -c "fsync" /mnt/bar
<power fail>
$ mount /dev/sdb /mnt
mount: mount /dev/sdb on /mnt failed: File exists
So fix this by checking if a conflicting inode reference exists (same
name, same parent but different index), removing it (and the associated
dir index entries from the parent inode) if it exists, before attempting
to add the new reference.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we're trying to make a data reservation and we have to allocate a
data chunk we could leak ret == 1, as do_chunk_alloc() will return 1 if
it allocated a chunk. Since the end of the function is the success path
just return 0.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The exported helper just calls the static one. There's no obvious reason
to have them separate eg. for performance reasons where the static one
could be better optimized in the same unit. There's a slight decrease in
code size and stack consumption.
Signed-off-by: David Sterba <dsterba@suse.com>
Lock owner and nesting level have been unused since day 1, probably
copy&pasted from the extent_buffer locking scheme without much thinking.
The locking of device replace is simpler and does not need any lock
nesting.
Signed-off-by: David Sterba <dsterba@suse.com>
Added in 58176a9604 ("Btrfs: Add per-root block accounting and sysfs
entries") in 2007, the roots had names exported in sysfs. The code
was commented out in 4df27c4d5c ("Btrfs: change how subvolumes
are organized") and cleaned by 182608c829 ("btrfs: remove old
unused commented out code").
Signed-off-by: David Sterba <dsterba@suse.com>
Requiring a read-write descriptor conflicts both ways with exec,
returning ETXTBSY whenever you try to defrag a program that's currently
being run, or causing intermittent exec failures on a live system being
defragged.
As defrag doesn't change the file's contents in any way, there's no
reason to consider it a rw operation. Thus, let's check only whether
the file could have been opened rw. Such access control is still needed
as currently defrag can use extra disk space, and might trigger bugs.
We return EINVAL when the request is invalid; here it's ok but merely
the user has insufficient privileges. Thus, the EPERM return value
reflects the error better -- as discussed in the identical case for
dedupe.
According to codesearch.debian.net, no userspace program distinguishes
these values beyond strerror().
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
[ fold the EPERM patch from Adam ]
Signed-off-by: David Sterba <dsterba@suse.com>
We recently ran into the following deadlock involving
btrfs_write_inode():
[ +0.005066] __schedule+0x38e/0x8c0
[ +0.007144] schedule+0x36/0x80
[ +0.006447] bit_wait+0x11/0x60
[ +0.006446] __wait_on_bit+0xbe/0x110
[ +0.007487] ? bit_wait_io+0x60/0x60
[ +0.007319] __inode_wait_for_writeback+0x96/0xc0
[ +0.009568] ? autoremove_wake_function+0x40/0x40
[ +0.009565] inode_wait_for_writeback+0x21/0x30
[ +0.009224] evict+0xb0/0x190
[ +0.006099] iput+0x1a8/0x210
[ +0.006103] btrfs_run_delayed_iputs+0x73/0xc0
[ +0.009047] btrfs_commit_transaction+0x799/0x8c0
[ +0.009567] btrfs_write_inode+0x81/0xb0
[ +0.008008] __writeback_single_inode+0x267/0x320
[ +0.009569] writeback_sb_inodes+0x25b/0x4e0
[ +0.008702] wb_writeback+0x102/0x2d0
[ +0.007487] wb_workfn+0xa4/0x310
[ +0.006794] ? wb_workfn+0xa4/0x310
[ +0.007143] process_one_work+0x150/0x410
[ +0.008179] worker_thread+0x6d/0x520
[ +0.007490] kthread+0x12c/0x160
[ +0.006620] ? put_pwq_unlocked+0x80/0x80
[ +0.008185] ? kthread_park+0xa0/0xa0
[ +0.007484] ? do_syscall_64+0x53/0x150
[ +0.007837] ret_from_fork+0x29/0x40
Writeback calls:
btrfs_write_inode
btrfs_commit_transaction
btrfs_run_delayed_iputs
If iput() is called on that same inode, evict() will wait for writeback
forever.
btrfs_write_inode() was originally added way back in 4730a4bc5b
("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode()
hasn't been used for O_SYNC since 148f948ba8 ("vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so
btrfs_write_inode() is actually unnecessary (and leads to a bunch of
unnecessary commits). Get rid of it, which also gets rid of the
deadlock.
CC: stable@vger.kernel.org # 3.2+
Signed-off-by: Josef Bacik <jbacik@fb.com>
[Omar: new commit message]
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is always passed a well-formed tgtdevice so the fs_info
can be referenced from there.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed 'device' argument which is always
a well-formed device.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced from the passed srcdev argument.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be referenced form the passed transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In find_free_extent() under checks: label, we have the following code:
search_start = ALIGN(offset, fs_info->stripesize);
/* move on to the next group */
if (search_start + num_bytes >
block_group->key.objectid + block_group->key.offset) {
btrfs_add_free_space(block_group, offset, num_bytes);
goto loop;
}
if (offset < search_start)
btrfs_add_free_space(block_group, offset,
search_start - offset);
BUG_ON(offset > search_start);
However ALIGN() is rounding up, thus @search_start >= @offset and that
BUG_ON() will never be triggered.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At send.c:full_send_tree() we were setting the 'key' variable in the loop
while never using it later. We were also using two btrfs_key variables
to store the initial key for search and the key found in every iteration
of the loop. So remove this useless key assignment and use the same
btrfs_key variable to store the initial search key and the key found in
each iteration. This was introduced in the initial send commit but was
never used (commit 31db9f7c23 ("Btrfs: introduce BTRFS_IOC_SEND for
btrfs send/receive").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The data and metadata callback implementation both use the same
function. We can remove the call indirection and intermediate helper
completely.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The data and metadata callback implementation both use the same
function. We can remove the call indirection completely.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All implementations of the callback are trivial and do the same and
there's only one user. Merge everything together.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The end_io callbacks passed to btrfs_wq_submit_bio
(btrfs_submit_bio_done and btree_submit_bio_done) are effectively the
same code, there's no point to do the indirection. Export
btrfs_submit_bio_done and call it directly.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After splitting the start and end hooks in a758781d4b ("btrfs:
separate types for submit_bio_start and submit_bio_done"), some of
the function arguments were dropped but not removed from the structure.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Introduced by c6100a4b4e ("Btrfs: replace tree->mapping with
tree->private_data") to be used in run_one_async_done where it got
unused after 736cd52e0c ("Btrfs: remove nr_async_submits and
async_submit_draining").
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839, with an
image that has an invalid chunk type but does not return an error.
Add chunk type check in btrfs_check_chunk_valid, to detect the wrong
type combinations.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
Reported-by: Xu Wen <wen.xu@gatech.edu>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
EXTENT_BUFFER_DUMMY is an awful name for this flag. Buffers which have
this flag set are not in any way dummy. Rather, they are private in the
sense that are not mapped and linked to the global buffer tree. This
flag has subtle implications to the way free_extent_buffer works for
example, as well as controls whether page->mapping->private_lock is held
during extent_buffer release. Pages for an unmapped buffer cannot be
under io, nor can they be written by a 3rd party so taking the lock is
unnecessary.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ EXTENT_BUFFER_UNMAPPED, update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Remove stale comment since there is no longer an eb->eb_lock and
document the locking expectation with a lockdep_assert_held statement.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function used to release one page (and always the first one), but
not anymore since a50924e3a4 ("btrfs: drop constant param
from btrfs_release_extent_buffer_page"). Update the name and comment.
Signed-off-by: David Sterba <dsterba@suse.com>
The purpose of the function is to free all the pages comprising an
extent buffer. This can be achieved with a simple for loop rather than
the slightly more involved 'do {} while' construct. So rewrite the
loop using a 'for' construct. Additionally we can never have an
extent_buffer that has 0 pages so remove the check for index == 0. No
functional changes.
The reversed order used to have a meaning in the past where the first
page served as a blocking point for several callers. See eg
4f2de97ace ("Btrfs: set page->private to the eb").
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit eb14ab8ed2 ("Btrfs: fix page->private races") fixed a genuine
race between extent buffer initialisation and btree_releasepage.
Unfortunately as the code has evolved the comments weren't changed which
made them slightly wrong and they weren't very clear in the fist place.
Fix this by (hopefully) rewording them in a more approachable manner.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Current version of the page unlocking code was added in
727011e07c ("Btrfs: allow metadata blocks larger than the page size")
but even in this commit that particular flag was never used per-se. In
fact, btrfs only uses PageChecked for data pages to identify pages
which have been dirtied but don't have ORDERED bit set. For more
information see 247e743cbe ("Btrfs: Use async helpers to deal with
pages that have been improperly dirtied").
However, this doesn't apply to extent buffer pages. The important bit
here is that the pages are unlocked AFTER the extent buffer has been
properly recorded in the radix tree to avoid races with
btree_releasepage. Let's exploit this fact and simplify the page
unlocking sequence by unlocking the pages in-order and removing the
redundant PageChecked flag setting/clearing.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Remove the remaining code that misused the page cache pages during
device replace and could cause data corruption for compressed nodatasum
extents. Such files do not normally exist but there's a bug that allows
this combination and the corruption was exposed by device replace fixup
code.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are many places that open code the duplicity factor of the block
group profiles, create a common helper. This can be easily extended for
more copies.
Signed-off-by: David Sterba <dsterba@suse.com>
We have assigned the %fs_info->fs_devices in %fs_devices as its not
modified just use it for the mutex_lock().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The fs_info can be fetched from the transaction handle directly.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be fetched from the transaction handle. In addition, remove the
WARN_ON(trans == NULL) because it's not possible to hit this condition.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>