Commit Graph

55278 Commits

Author SHA1 Message Date
Linus Torvalds 10f3e23f07 Convert content from the ext4 wiki to Documentation rst files so it is
more likely to be updated as we add new features to ext4.  Add 64-bit
 timestamp support to ext4's superblock fields.  And the usual bug
 fixes and cleanups, including a Spectre gadget fixup and some
 hardening against maliciously corrupted file systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAltx3CAACgkQ8vlZVpUN
 gaOfkAgApkNU46yPWWFqJpeTR76Go4zW00x9Piaita+KpQcTCUokK0BDXXeydOEo
 zhtNjCRl00vbhnYJ90T75Y3ce9oGQDkWpXHFZBdl+kmMBaIHQJYEECCLR2zrvxVk
 LhsixvLozQd/VFenwzmXLqYshiZjhjBZOKo7/xsoy4waer1vNidA9/SR3hGWTMOk
 9IsEVqw1h6pZt7ZB/leZmZ348H9QZZbYOUWl2ArvdmNeg2t8pWH0MD0bos+mj5lW
 xihyJEIS/OjWZRJhM2WEPMO7U4jD/aX3zmDMGePp/vC5KSqA4FGwk671xIyD+LXR
 Hd8Rn8chRe4/iyhBqHpJNkqHQRUV6g==
 =SA3p
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:

 - Convert content from the ext4 wiki to Documentation rst files so it
   is more likely to be updated as we add new features to ext4.

 - Add 64-bit timestamp support to ext4's superblock fields.

 - ... and the usual bug fixes and cleanups, including a Spectre gadget
   fixup and some hardening against maliciously corrupted file systems.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (34 commits)
  ext4: remove unneeded variable "err" in ext4_mb_release_inode_pa()
  ext4: improve code readability in ext4_iget()
  ext4: fix spectre gadget in ext4_mb_regular_allocator()
  ext4: check for NUL characters in extended attribute's name
  ext4: use ext4_warning() for sb_getblk failure
  ext4: fix race when setting the bitmap corrupted flag
  ext4: reset error code in ext4_find_entry in fallback
  ext4: handle layout changes to pinned DAX mappings
  dax: dax_layout_busy_page() warn on !exceptional
  docs: fix up the obviously obsolete bits in the new ext4 documentation
  docs: add new ext4 superblock time extension fields
  docs: create filesystem internal section
  ext4: use swap macro in mext_page_double_lock
  ext4: check allocation failure when duplicating "data" in ext4_remount()
  ext4: fix warning message in ext4_enable_quotas()
  ext4: super: extend timestamps to 40 bits
  jbd2: replace current_kernel_time64 with ktime equivalent
  ext4: use timespec64 for all inode times
  ext4: use ktime_get_real_seconds for i_dtime
  ext4: use 64-bit timestamps for mmp_time
  ...
2018-08-13 22:34:47 -07:00
Linus Torvalds 3bb37da509 smb3/cifs fixes (including 8 for stable). Improved tracing, stats, snapshots (previous version mounts work now), performance (compounding enabled for statfs)
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAltwwVkACgkQiiy9cAdy
 T1H06gv/d+b52w2X05YlLBB1GboJjv1dl3sVrFisJrMrCCr6Hotk+4+RtC8CJHh6
 /9Joq6iY8B/XLl7HrWr61vJhn8vlWOGhOQaUmHCBGeiIMX93RbaP8KvkPfHAKQiJ
 T6+7v70z/WL7mUQA2TEQ3Iz6kpCuP2y/XT6eQglTFXasqsWHy08TbsbnmP9yX2i4
 E1B6zMwpn6m4PFxEURg14eBJTrQomb/UBSLHNwxwwOQjnKzsmeH3pyiFCJSPJNFF
 Xv/lyBibgvQAsuWtLTi82fqei+c60as2LVrsFLdoWXeD6JMlx1O+SHSz0NNJL/Lk
 oxMNx9NI+LsYmekIBHgoX01PzVgCpcn6ThfoQv3JLHtqiZuk3Wlai7bPvJMIApoM
 8fgHGR9xvhRCfHgYvxx4MmP3e7xewajSe7fh8/y8BqreO+fLeoyqho5G9zDpn6bU
 wGFPtVYW/fgTEM5E3NOjsuC1zE35bjwU7U40jqjGHP1VsTs3hkLFy9fZNNq4EP1M
 ytfJPlTG
 =gzZK
 -----END PGP SIGNATURE-----

Merge tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "smb3/cifs fixes (including 8 for stable).

  Other improvements include:

   - improved tracing, improved stats

   - snapshots (previous version mounts work now over SMB3)

   - performance (compounding enabled for statfs, ~40% faster).

   - security (make it possible to build cifs.ko with insecure vers=1.0
     disabled in Kconfig)"

* tag '4.19-smb3' of git://git.samba.org/sfrench/cifs-2.6: (43 commits)
  smb3: create smb3 equivalent alias for cifs pseudo-xattrs
  smb3: allow previous versions to be mounted with snapshot= mount parm
  cifs: don't show domain= in mount output when domain is empty
  cifs: add missing support for ACLs in SMB 3.11
  smb3: enumerating snapshots was leaving part of the data off end
  cifs: update smb2_queryfs() to use compounding
  cifs: update receive_encrypted_standard to handle compounded responses
  cifs: create SMB2_open_init()/SMB2_open_free() helpers.
  cifs: add SMB2_query_info_[init|free]()
  cifs: add SMB2_close_init()/SMB2_close_free()
  smb3: display stats counters for number of slow commands
  CIFS: fix uninitialized ptr deref in smb2 signing
  smb3: Do not send SMB3 SET_INFO if nothing changed
  smb3: fix minor debug output for CONFIG_CIFS_STATS
  smb3: add tracepoint for slow responses
  cifs: add compound_send_recv()
  cifs: make smb_send_rqst take an array of requests
  cifs: update init_sg, crypt_message to take an array of rqst
  smb3: update readme to correct information about /proc/fs/cifs/Stats
  smb3: fix reset of bytes read and written stats
  ...
2018-08-13 22:32:11 -07:00
Linus Torvalds 161fa27ff2 Merge branch 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull fs iomap refactoring from Darrick Wong:
 "This is the first part of the XFS changes for 4.19.

  Christoph and Andreas coordinated some refactoring work on the iomap
  code in preparation for removing buffer heads from XFS and porting
  gfs2 to iomap. I'm sending this small pull request ahead of the main
  XFS merge to avoid holding up gfs2 unnecessarily"

* 'iomap-4.19-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: add inline data support to iomap_readpage_actor
  iomap: support direct I/O to inline data
  iomap: refactor iomap_dio_actor
  iomap: add initial support for writes without buffer heads
  iomap: add an iomap-based readpage and readpages implementation
  iomap: add private pointer to struct iomap
  iomap: add a page_done callback
  iomap: generic inline data handling
  iomap: complete partial direct I/O writes synchronously
  iomap: mark newly allocated buffer heads as new
  fs: factor out a __generic_write_end helper
2018-08-13 22:29:03 -07:00
Linus Torvalds a1a4f841ec for-4.19-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAltxe7QACgkQxWXV+ddt
 WDswMA//QlRO+Ln5CH+RlT4fyf1RQUQZblWss2zxrmlo1GRI3Ljf2DNsBE3rD7P4
 NSiXfHmgkdjcQP6poPLJwHxwkNd4NFXglYg64wWO10RjHGhKglmH6ztU88wsPfr2
 2RZv271/NvYIEkEi6kdyy8ilKeWMshOfyj3+PaeapQn67uJfimyiUvDgUgbvwH3c
 yj0nVRLP1C7snNj4Atti/rjXMhG+m1UWfjRkZsmqlBp52k2UAcrtiwQK+DS5b9mL
 aWLSaGmIcJtSMkNJPQBST9GTWbJfKTpceoCzkT0o3irvQpN2e2flAJ4ireL8q4mN
 MvqJ7giPBFHNDcHEzN6VERvsaA1Rx9Vq20ieQl8JAMd4p/bi5ehN3ww+9vau5zCw
 Pc8WeKEILKrLYEAgHOnUO1wxHw994Iv5CA26roTQ0HNXQJjyEZ4m40Ch6LzmfKPm
 WKcHX14Uw22GKaFEXHTOpRZ0U0d1cMTcn5zaAajGsB9LwcaiLM+OiFSPtDkwUOB9
 QGJHklZVXAD1IH9HFPuq85uUtXTLXbxsw1g8phEJGbmaVxxCOAUAXwEk3qxuZNbz
 CHL3G5+l3JEXxfoJSbDW60kr8xic7teqQDszqqP2qlqtP15ty2xc9d5Q8MZajSTZ
 H1z9+0gfjYYHrGuAp69MtCbdQhhDSqLyivjJJm0HBaKfVNGW2Xg=
 =jBaz
 -----END PGP SIGNATURE-----

Merge tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "Mostly fixes and cleanups, nothing big, though the notable thing is
  the inserted/deleted lines delta -1124.

  User visible changes:
   - allow defrag on opened read-only files that have rw permissions;
     similar to what dedupe will allow on such files

  Core changes:
   - tree checker improvements, reported by fuzzing:
      * more checks for: block group items, essential trees
      * chunk type validation
      * mount time cross-checks that physical and logical chunks match
      * switch more error codes to EUCLEAN aka EFSCORRUPTED

  Fixes:
   - fsync corner case fixes

   - fix send failure when root has deleted files still open

   - send, fix incorrect file layout after hole punching beyond eof

   - fix races between mount and deice scan ioctl, found by fuzzing

   - fix deadlock when delayed iput is called from writeback on the same
     inode; rare but has been observed in practice, also removes code

   - fix pinned byte accounting, using the right percpu helpers; this
     should avoid some write IO inefficiency during low space conditions

   - don't remove block group that still has pinned bytes

   - reset on-disk device stats value after replace, otherwise this
     would report stale values for the new device

  Cleanups:
   - time64_t/timespec64 cleanups

   - remove remaining dead code in scrub handling NOCOW extents after
     disabling it in previous cycle

   - simplify fsync regarding ordered extents logic and remove all the
     related code

   - remove redundant arguments in order to reduce stack space
     consumption

   - remove support for V0 type of extents, not in use since 2.6.30

   - remove several unused structure members

   - fewer indirect function calls by inlining some callbacks

   - qgroup rescan timing fixes

   - vfs: iget cleanups"

* tag 'for-4.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (182 commits)
  btrfs: revert fs_devices state on error of btrfs_init_new_device
  btrfs: Exit gracefully when chunk map cannot be inserted to the tree
  btrfs: Introduce mount time chunk <-> dev extent mapping check
  btrfs: Verify that every chunk has corresponding block group at mount time
  btrfs: Check that each block group has corresponding chunk at mount time
  Btrfs: send, fix incorrect file layout after hole punching beyond eof
  btrfs: Use wrapper macro for rcu string to remove duplicate code
  btrfs: simplify btrfs_iget
  btrfs: lift make_bad_inode into btrfs_iget
  btrfs: simplify IS_ERR/PTR_ERR checks
  btrfs: btrfs_iget never returns an is_bad_inode inode
  btrfs: replace: Reset on-disk dev stats value after replace
  btrfs: extent-tree: Remove unused __btrfs_free_block_rsv
  btrfs: backref: Use ERR_CAST to return error code
  btrfs: Remove redundant btrfs_release_path from btrfs_unlink_subvol
  btrfs: Remove root parameter from btrfs_unlink_subvol
  btrfs: Remove fs_info from btrfs_add_root_ref
  btrfs: Remove fs_info from btrfs_del_root_ref
  btrfs: Remove fs_info from btrfs_del_root
  btrfs: Remove fs_info from btrfs_delete_delayed_dir_index
  ...
2018-08-13 21:58:53 -07:00
Linus Torvalds 575b94386b File locking fixes and enhancements for v4.19
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbcDltAAoJEAAOaEEZVoIV9YUP/ioYw3+kIXwRa05ec8Twbn6m
 O+1HOX9UAuDWX+97P2kCJSqi9a/TkpfLQxoGxZowDv97ZkVeHdzVaJVN/0sbH1uP
 Oj86b2Y5zGyrLk3D4c5VcyHkvogr16DUvhYeRtcjPyufSaKqefftCeetTntRvr4Z
 rZ8HLKYxa4AZlP/FkgVZ/S6PmvG7tH8rVLMFCfY+rASbSU4gBxmrS0nB2UgU4ycb
 b2SVZo4Zxycw0KiP14T5AgN4puid7VEofFezBpMKjNPjCUk1wJ3mtUyLeS8jRUEx
 M6qnl7DU4BWPRtGxiXNebN85M8iaJnlaeQglowklrubrFtqnVeDQM0rz42NiY2/S
 2jk6q8b9XKvoDEev2VeKfmFztu4gGXypNR6xAolvgBnUZziKKFUJutZwJe95pZV1
 kO6CvdchuCQdLMLdHPji2kk2AA8Go5webONfioFhoWYfeSvgjRELmOFTDjP22eNv
 s1/vfCsjuU6vdyaXXXZpN/7VMenoNegSQUXFoxhmHO7BtNuTqk+ZVFNgaWsOrXcx
 I0ZIjCAmOsU+DKLn+DTT8kDtU3ihZz/OCXXZJi9JPeLFZ6gSghDRnFThwYqcZvA+
 syIn1XGGtn15Q7A5FZNvwGqYir1GtyOslrxxIKxGlhleovhT94HuKXkL1T+k74zl
 FXItKgRs0TPQfTsI8q+C
 =IMcz
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking updates from Jeff Layton:
 "Just a couple of patches from Konstantin to fix /proc/locks when the
  process that set the lock has exited, and a new tracepoint for the
  flock() codepath. Also threw in mailmap entries for my addresses and a
  comment cleanup"

* tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: remove misleading obsolete comment
  mailmap: remap some of my email addresses to kernel.org address
  locks: add tracepoint in flock codepath
  fs/lock: show locks taken by processes from another pidns
  fs/lock: skip lock owner pid translation in case we are in init_pid_ns
2018-08-13 21:56:50 -07:00
Linus Torvalds 4591343e35 Merge branches 'work.misc' and 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Misc cleanups from various folks all over the place

  I expected more fs/dcache.c cleanups this cycle, so that went into a
  separate branch. Said cleanups have missed the window, so in the
  hindsight it could've gone into work.misc instead. Decided not to
  cherry-pick, thus the 'work.dcache' branch"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: dcache: Use true and false for boolean values
  fold generic_readlink() into its only caller
  fs: shave 8 bytes off of struct inode
  fs: Add more kernel-doc to the produced documentation
  fs: Fix attr.c kernel-doc
  removed extra extern file_fdatawait_range

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill dentry_update_name_case()
2018-08-13 21:28:25 -07:00
Linus Torvalds f2be269897 Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs aio updates from Al Viro:
 "Christoph's aio poll, saner this time around.

  This time it's pretty much local to fs/aio.c. Hopefully race-free..."

* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: allow direct aio poll comletions for keyed wakeups
  aio: implement IOCB_CMD_POLL
  aio: add a iocb refcount
  timerfd: add support for keyed wakeups
2018-08-13 20:56:23 -07:00
Linus Torvalds 4d2a073cde Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lookup() updates from Al Viro:
 "More conversions of ->lookup() to d_splice_alias().

  Should be reasonably complete now - the only leftovers are in ceph"

* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs_try_auto_mntpt(): return NULL instead of ERR_PTR(-ENOENT)
  afs_lookup(): switch to d_splice_alias()
  afs: switch dynroot lookups to d_splice_alias()
  hpfs: fix an inode leak in lookup, switch to d_splice_alias()
  hostfs_lookup: switch to d_splice_alias()
2018-08-13 20:54:14 -07:00
Linus Torvalds 0ea97a2d61 Merge branch 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs icache updates from Al Viro:

 - NFS mkdir/open_by_handle race fix

 - analogous solution for FUSE, replacing the one currently in mainline

 - new primitive to be used when discarding halfway set up inodes on
   failed object creation; gives sane warranties re icache lookups not
   returning such doomed by still not freed inodes. A bunch of
   filesystems switched to that animal.

 - Miklos' fix for last cycle regression in iget5_locked(); -stable will
   need a slightly different variant, unfortunately.

 - misc bits and pieces around things icache-related (in adfs and jfs).

* 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  jfs: don't bother with make_bad_inode() in ialloc()
  adfs: don't put inodes into icache
  new helper: inode_fake_hash()
  vfs: don't evict uninitialized inode
  jfs: switch to discard_new_inode()
  ext2: make sure that partially set up inodes won't be returned by ext2_iget()
  udf: switch to discard_new_inode()
  ufs: switch to discard_new_inode()
  btrfs: switch to discard_new_inode()
  new primitive: discard_new_inode()
  kill d_instantiate_no_diralias()
  nfs_instantiate(): prevent multiple aliases for directory inode
2018-08-13 20:25:58 -07:00
Linus Torvalds a66b4cd1e7 Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs open-related updates from Al Viro:

 - "do we need fput() or put_filp()" rules are gone - it's always fput()
   now. We keep track of that state where it belongs - in ->f_mode.

 - int *opened mess killed - in finish_open(), in ->atomic_open()
   instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open().

 - alloc_file() wrappers with saner calling conventions are introduced
   (alloc_file_clone() and alloc_file_pseudo()); callers converted, with
   much simplification.

 - while we are at it, saner calling conventions for path_init() and
   link_path_walk(), simplifying things inside fs/namei.c (both on
   open-related paths and elsewhere).

* 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  few more cleanups of link_path_walk() callers
  allow link_path_walk() to take ERR_PTR()
  make path_init() unconditionally paired with terminate_walk()
  document alloc_file() changes
  make alloc_file() static
  do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone()
  new helper: alloc_file_clone()
  create_pipe_files(): switch the first allocation to alloc_file_pseudo()
  anon_inode_getfile(): switch to alloc_file_pseudo()
  hugetlb_file_setup(): switch to alloc_file_pseudo()
  ocxlflash_getfile(): switch to alloc_file_pseudo()
  cxl_getfile(): switch to alloc_file_pseudo()
  ... and switch shmem_file_setup() to alloc_file_pseudo()
  __shmem_file_setup(): reorder allocations
  new wrapper: alloc_file_pseudo()
  kill FILE_{CREATED,OPENED}
  switch atomic_open() and lookup_open() to returning 0 in all success cases
  document ->atomic_open() changes
  ->atomic_open(): return 0 in all success cases
  get rid of 'opened' in path_openat() and the helpers downstream
  ...
2018-08-13 19:58:36 -07:00
Linus Torvalds e5a32b5b21 Here are the main MIPS changes for 4.19.
An overview of the general architecture changes:
 
   - Massive DMA ops refactoring from Christoph Hellwig (huzzah for
     deleting crufty code!).
 
   - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding
     regsets to expose DSP ASE & floating point mode state respectively,
     both for live debugging & core dumps.
 
   - We better optimize our code by hard-coding cpu_has_* macros at
     compile time where their values are known due to the ISA revision
     that the kernel build is targeting.
 
   - The EJTAG exception handler now better handles SMP systems, where it
     was previously possible for CPUs to clobber a register value saved
     by another CPU.
 
   - Our implementation of memset() gained a couple of fixes for MIPSr6
     systems to return correct values in some cases where stores fault.
 
   - We now implement ioremap_wc() using the uncached-accelerated cache
     coherency attribute where supported, which is detected during boot,
     and fall back to plain uncached access where necessary. The
     MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
     ioremap_cacheable_cow() are removed.
 
   - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
     systems by reworking the way we ensure remote CPUs that may be
     running threads within the affected process switch mode.
 
   - Systems using the MIPS Coherence Manager will now set the
     MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
     maintenance overhead when flushing the icache.
 
   - A few fixes were made for building with clang/LLVM, which
     now sucessfully builds kernels for many of our platforms.
 
   - Miscellaneous cleanups all over.
 
 And some platform-specific changes:
 
   - ar7 gained stubs for a few clock API functions to fix build failures
     for some drivers.
 
   - ath79 gained support for a few new SoCs, a few fixes & better
     gpio-keys support.
 
   - Ci20 now exposes its SPI bus using the spi-gpio driver.
 
   - The generic platform can now auto-detect a suitable value for
     PHYS_OFFSET based upon the memory map described by the device tree,
     allowing us to avoid wasting memory on page book-keeping for systems
     where RAM starts at a non-zero physical address.
 
   - Ingenic systems using the jz4740 platform code now link their
     vmlinuz higher to allow for kernels of a realistic size.
 
   - Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2
     to avoid CPU errata.
 
   - Loongson64 gains a couple of fixes, a workaround for a write
     buffering issue & support for the Loongson 3A R3.1 CPU.
 
   - Malta now uses the piix4-poweroff driver to handle powering down.
 
   - Microsemi Ocelot gained support for its SPI bus & NOR flash, its
     second MDIO bus and can now be supported by a FIT/.itb image.
 
   - Octeon saw a bunch of header cleanups which remove a lot of
     duplicate or unused code.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt
 /2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL
 =cUiF
 -----END PGP SIGNATURE-----

Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Paul Burton:
 "Here are the main MIPS changes for 4.19.

  An overview of the general architecture changes:

   - Massive DMA ops refactoring from Christoph Hellwig (huzzah for
     deleting crufty code!).

   - We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes &
     corresponding regsets to expose DSP ASE & floating point mode state
     respectively, both for live debugging & core dumps.

   - We better optimize our code by hard-coding cpu_has_* macros at
     compile time where their values are known due to the ISA revision
     that the kernel build is targeting.

   - The EJTAG exception handler now better handles SMP systems, where
     it was previously possible for CPUs to clobber a register value
     saved by another CPU.

   - Our implementation of memset() gained a couple of fixes for MIPSr6
     systems to return correct values in some cases where stores fault.

   - We now implement ioremap_wc() using the uncached-accelerated cache
     coherency attribute where supported, which is detected during boot,
     and fall back to plain uncached access where necessary. The
     MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
     ioremap_cacheable_cow() are removed.

   - The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
     systems by reworking the way we ensure remote CPUs that may be
     running threads within the affected process switch mode.

   - Systems using the MIPS Coherence Manager will now set the
     MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
     maintenance overhead when flushing the icache.

   - A few fixes were made for building with clang/LLVM, which now
     sucessfully builds kernels for many of our platforms.

   - Miscellaneous cleanups all over.

  And some platform-specific changes:

   - ar7 gained stubs for a few clock API functions to fix build
     failures for some drivers.

   - ath79 gained support for a few new SoCs, a few fixes & better
     gpio-keys support.

   - Ci20 now exposes its SPI bus using the spi-gpio driver.

   - The generic platform can now auto-detect a suitable value for
     PHYS_OFFSET based upon the memory map described by the device tree,
     allowing us to avoid wasting memory on page book-keeping for
     systems where RAM starts at a non-zero physical address.

   - Ingenic systems using the jz4740 platform code now link their
     vmlinuz higher to allow for kernels of a realistic size.

   - Loongson32 now builds the kernel targeting MIPSr1 rather than
     MIPSr2 to avoid CPU errata.

   - Loongson64 gains a couple of fixes, a workaround for a write
     buffering issue & support for the Loongson 3A R3.1 CPU.

   - Malta now uses the piix4-poweroff driver to handle powering down.

   - Microsemi Ocelot gained support for its SPI bus & NOR flash, its
     second MDIO bus and can now be supported by a FIT/.itb image.

   - Octeon saw a bunch of header cleanups which remove a lot of
     duplicate or unused code"

* tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits)
  MIPS: Remove remnants of UASM_ISA
  MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()
  MIPS: VDSO: Force link endianness
  MIPS: Always specify -EB or -EL when using clang
  MIPS: Use dins to simplify __write_64bit_c0_split()
  MIPS: Use read-write output operand in __write_64bit_c0_split()
  MIPS: Avoid using array as parameter to write_c0_kpgd()
  MIPS: vdso: Allow clang's --target flag in VDSO cflags
  MIPS: genvdso: Remove GOT checks
  MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
  MIPS: generic: Remove input symbols from defconfig
  MIPS: Delete unused code in linux32.c
  MIPS: Remove unused sys_32_mmap2
  MIPS: Remove nabi_no_regargs
  mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
  mips: dts: mscc: Add spi on Ocelot
  MIPS: Loongson: Merge load addresses
  MIPS: Loongson: Set Loongson32 to MIPS32R1
  MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
  MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
  ...
2018-08-13 19:24:32 -07:00
Linus Torvalds 1e45e9a95e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timers departement more or less proudly presents:

   - More Y2038 timekeeping work mostly in the core code. The work is
     slowly, but steadily targeting the actuall syscalls.

   - Enhanced timekeeping suspend/resume support by utilizing
     clocksources which do not stop during suspend, but are otherwise
     not the main timekeeping clocksources.

   - Make NTP adjustmets more accurate and immediate when the frequency
     is set directly and not incrementally.

   - Sanitize the overrung handing of posix timers

   - A new timer driver for Mediatek SoCs

   - The usual pile of fixes and updates all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  clockevents: Warn if cpu_all_mask is used as cpumask
  tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimer
  clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usage
  clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flag
  timers: Clear timer_base::must_forward_clk with timer_base::lock held
  clocksource/drivers/sprd: Register one always-on timer to compensate suspend time
  clocksource/drivers/timer-mediatek: Add support for system timer
  clocksource/drivers/timer-mediatek: Convert the driver to timer-of
  clocksource/drivers/timer-mediatek: Use specific prefix for GPT
  clocksource/drivers/timer-mediatek: Rename mtk_timer to timer-mediatek
  clocksource/drivers/timer-mediatek: Add system timer bindings
  clocksource/drivers: Set clockevent device cpumask to cpu_possible_mask
  time: Introduce one suspend clocksource to compensate the suspend time
  time: Fix extra sleeptime injection when suspend fails
  timekeeping/ntp: Constify some function arguments
  ntp: Use kstrtos64 for s64 variable
  ntp: Remove redundant arguments
  timer: Fix coding style
  ktime: Provide typesafe ktime_to_ns()
  hrtimer: Improve kernel message printing
  ...
2018-08-13 13:02:31 -07:00
Linus Torvalds de5d1b39ea Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking/atomics update from Thomas Gleixner:
 "The locking, atomics and memory model brains delivered:

   - A larger update to the atomics code which reworks the ordering
     barriers, consolidates the atomic primitives, provides the new
     atomic64_fetch_add_unless() primitive and cleans up the include
     hell.

   - Simplify cmpxchg() instrumentation and add instrumentation for
     xchg() and cmpxchg_double().

   - Updates to the memory model and documentation"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  locking/atomics: Rework ordering barriers
  locking/atomics: Instrument cmpxchg_double*()
  locking/atomics: Instrument xchg()
  locking/atomics: Simplify cmpxchg() instrumentation
  locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
  tools/memory-model: Rename litmus tests to comply to norm7
  tools/memory-model/Documentation: Fix typo, smb->smp
  sched/Documentation: Update wake_up() & co. memory-barrier guarantees
  locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
  sched/core: Use smp_mb() in wake_woken_function()
  tools/memory-model: Add informal LKMM documentation to MAINTAINERS
  locking/atomics/Documentation: Describe atomic_set() as a write operation
  tools/memory-model: Make scripts executable
  tools/memory-model: Remove ACCESS_ONCE() from model
  tools/memory-model: Remove ACCESS_ONCE() from recipes
  locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
  MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
  tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
  tools/memory-model: Add litmus test for full multicopy atomicity
  locking/refcount: Always allow checked forms
  ...
2018-08-13 12:23:39 -07:00
Linus Torvalds 400439275d Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Thomas Gleixner:
 "The EFI pile:

   - Make mixed mode UEFI runtime service invocations mutually
     exclusive, as mandated by the UEFI spec

   - Perform UEFI runtime services calls from a work queue so the calls
     into the firmware occur from a kernel thread

   - Honor the UEFI memory map attributes for live memory regions
     configured by UEFI as a framebuffer. This works around a coherency
     problem with KVM guests running on ARM.

   - Cleanups, improvements and fixes all over the place"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efivars: Call guid_parse() against guid_t type of variable
  efi/cper: Use consistent types for UUIDs
  efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
  efi: Deduplicate efi_open_volume()
  efi/x86: Add missing NULL initialization in UGA draw protocol discovery
  efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
  efi/x86: Align efi_uga_draw_protocol typedef names to convention
  efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
  efi/x86: Prevent reentrant firmware calls in mixed mode
  efi/esrt: Only call efi_mem_reserve() for boot services memory
  fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
  efi: Drop type and attribute checks in efi_mem_desc_lookup()
  efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
  efi: Remove the declaration of efi_late_init() as the function is unused
  efi/cper: Avoid using get_seconds()
  efi: Use a work queue to invoke EFI Runtime Services
  efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
  efi/x86: Clean up the eboot code
2018-08-13 10:25:08 -07:00
Yan, Zheng 0fcf6c02b2 ceph: don't drop message if it contains more data than expected
Later version mds may encode more data into messages.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:44 +02:00
Yan, Zheng 342ce1823e ceph: support cephfs' own feature bits
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:44 +02:00
Chengguang Xu e5bc08d09f ceph: refactor error handling code in ceph_reserve_caps()
Call new helper __ceph_unreserve_caps() to reduce duplicated code.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:44 +02:00
Chengguang Xu 7bf8f736c8 ceph: refactor ceph_unreserve_caps()
The code of ceph_unreserve_caps() and error handling in
ceph_reserve_caps() are duplicated, so introduce a helper
__ceph_unreserve_caps() to reduce duplicated code.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:43 +02:00
Chengguang Xu d554849290 ceph: change to void return type for __do_request()
We do not check return code for __do_request() in all callers,
so change to void return type.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:43 +02:00
Chengguang Xu 9da12e3a7d ceph: compare fsc->max_file_size and inode->i_size for max file size limit
In ceph_llseek(), we compare fsc->max_file_size and inode->i_size to
choose max file size limit.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:43 +02:00
Chengguang Xu 36a4c72d1c ceph: add additional size check in ceph_setattr()
ceph_setattr() finally calls vfs function inode_newsize_ok()
to do offset validation and that is based on sb->s_maxbytes.
Because we set sb->s_maxbytes to MAX_LFS_FILESIZE to through
VFS check and do proper offset validation in cephfs level,
we need adding proper offset validation before calling
inode_newsize_ok().

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-13 17:55:43 +02:00
piaojun 3111784bee fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
In my testing, v9fs_fid_xattr_set will return successfully even if the
backend ext4 filesystem has no space to store xattr key-value. That will
cause inconsistent behavior between front end and back end. The reason is
that lsetxattr will be triggered by p9_client_clunk, and unfortunately we
did not catch the error. This patch will catch the error to notify upper
caller.

p9_client_clunk (in 9p)
  p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);
    v9fs_clunk (in qemu)
      put_fid
        free_fid
          v9fs_xattr_fid_clunk
            v9fs_co_lsetxattr
              s->ops->lsetxattr
                ext4_xattr_user_set (in host ext4 filesystem)

Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2018-08-13 09:35:16 +09:00
Colin Ian King 6baaac0961 fs/9p/v9fs.c: fix spelling mistake "Uknown" -> "Unknown"
fix spelling mistake in pr_info message text

Link: http://lkml.kernel.org/r/20180526150650.10562-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2018-08-13 09:21:44 +09:00
Souptick Joarder fe6340e2d1 fs/9p/vfs_file.c: use new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite handler.

See 1c8f422059 ("mm: change return type to vm_fault_t") for reference.

Link: http://lkml.kernel.org/r/20180702154928.GA3964@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2018-08-13 09:21:44 +09:00
Linus Torvalds d6dd643159 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A bunch of race fixes, mostly around lazy pathwalk.

  All of it is -stable fodder, a large part going back to 2013"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make sure that __dentry_kill() always invalidates d_seq, unhashed or not
  fix __legitimize_mnt()/mntput() race
  fix mntput/mntput race
  root dentries need RCU-delayed freeing
2018-08-12 11:21:17 -07:00
Shan Hai 01239d77b9 xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
Fuzzing tool reports a write to null pointer error in the
xfs_bmap_extents_to_btree, fix it by bailing out on encountering
a null pointer.

Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12 08:37:31 -07:00
Eric Sandeen fa6c668d80 xfs: remove b_last_holder & associated macros
The old lock tracking infrastructure in xfs using the b_last_holder
field seems to only be useful if you can get into the system with a
debugger; it seems that the existing tracepoints would be the way to
go these days, and this old infrastructure can be removed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12 08:37:31 -07:00
Andreas Gruenbacher 10259de1d8 iomap: Switch to offset_in_page for clarity
Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use
the offset_in_page macro.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12 08:37:31 -07:00
Dave Jiang e25ff835af xfs: Close race between direct IO and xfs_break_layouts()
This patch is the duplicate of ross's fix for ext4 for xfs.

If the refcount of a page is lowered between the time that it is returned
by dax_busy_page() and when the refcount is again checked in
xfs_break_layouts() => ___wait_var_event(), the waiting function
xfs_wait_dax_page() will never be called.  This means that
xfs_break_layouts() will still have 'retry' set to false, so we'll stop
looping and never check the refcount of other pages in this inode.

Instead, always continue looping as long as dax_layout_busy_page() gives us
a page which it found with an elevated refcount.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-12 08:37:31 -07:00
Boris Brezillon da86748bf6 NAND core changes:
- Add the SPI-NAND framework.
 - Create a helper to find the best ECC configuration.
 - Create NAND controller operations.
 - Allocate dynamically ONFI parameters structure.
 - Add defines for ONFI version bits.
 - Add manufacturer fixup for ONFI parameter page.
 - Add an option to specify NAND chip as a boot device.
 - Add Reed-Solomon error correction algorithm.
 - Better name for the controller structure.
 - Remove unused caller_is_module() definition.
 - Make subop helpers return unsigned values.
 - Expose _notsupp() helpers for raw page accessors.
 - Add default values for dynamic timings.
 - Kill the chip->scan_bbt() hook.
 - Rename nand_default_bbt() into nand_create_bbt().
 - Start to clean the nand_chip structure.
 - Remove stale prototype from rawnand.h.
 
 Raw NAND controllers drivers changes:
 - Qcom: structuring cleanup.
 - Denali: use core helper to find the best ECC configuration.
 - Possible build of almost all drivers by adding a dependency on
   COMPILE_TEST for almost all of them in Kconfig, implies various
   fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
   changes in sparc64 and ia64 architectures.
 - Clean the ->probe() functions error path of a lot of drivers.
 - Migrate all drivers to use nand_scan() instead of
   nand_scan_ident()/nand_scan_tail() pair.
 - Use mtd_device_register() where applicable to simplify the code.
 - Marvell:
   * Handle on-die ECC.
   * Better clocks handling.
   * Remove bogus comment.
   * Add suspend and resume support.
 - Tegra: add NAND controller driver.
 - Atmel:
   * Add module param to avoid using dma.
   * Drop Wenyou Yang from MAINTAINERS.
 - Denali: optimize timings handling.
 - FSMC: Stop using chip->read_buf().
 - FSL:
   * Switch to SPDX license tag identifiers.
   * Fix qualifiers in MXC init functions.
 
 Raw NAND chip drivers changes:
 - Micron:
   * Add fixup for ONFI revision.
   * Update ecc_stats.corrected.
   * Make ECC activation stateful.
   * Avoid enabling/disabling ECC when it can't be disabled.
   * Get the actual number of bitflips.
   * Allow forced on-die ECC.
   * Support 8/512 on-die ECC.
   * Fix on-die ECC detection logic.
 - Hynix:
   * Fix decoding the OOB size on H27UCG8T2BTR.
   * Use ->exec_op() in hynix_nand_reg_write_op().
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJbYYGVAAoJECVq6hhHvVaE0poIAJy+VpZl0jTPQ/oO8TQui9hE
 IZbc8LwohCvegYYhiY1cNESyMYamDfoK6M93i/0zTJF2AJAxPl25ldT8N5Wr16DO
 5Vfsdjv75V8l0JEY2SvWYmC6glOAYs0UEDdcFNJRMPqUnQz+VvBIafJOCQqzo4ZH
 SDnLx3XzOxO4PAPnztWEg50WvaqMPt7ThcqoxThHMcQaLrNjgJUsV0mN+vNEv16Q
 6gH6hl1C019k+Kj2Zu0vAifHw1K7gIYT4HvqKwstQ6HYUX2IzIzuEpRIcIze0S5z
 XKzZ57USItb3l+Y3YwFBLjgP4N+VTT5X59LxdtCOXJ+YvzgxwtKElRvalNcryYI=
 =zkEf
 -----END PGP SIGNATURE-----

Merge tag 'nand/for-4.19' of git://git.infradead.org/linux-mtd into mtd/next

Pull NAND updates from Miquel Raynal:

"
 NAND core changes:
 - Add the SPI-NAND framework.
 - Create a helper to find the best ECC configuration.
 - Create NAND controller operations.
 - Allocate dynamically ONFI parameters structure.
 - Add defines for ONFI version bits.
 - Add manufacturer fixup for ONFI parameter page.
 - Add an option to specify NAND chip as a boot device.
 - Add Reed-Solomon error correction algorithm.
 - Better name for the controller structure.
 - Remove unused caller_is_module() definition.
 - Make subop helpers return unsigned values.
 - Expose _notsupp() helpers for raw page accessors.
 - Add default values for dynamic timings.
 - Kill the chip->scan_bbt() hook.
 - Rename nand_default_bbt() into nand_create_bbt().
 - Start to clean the nand_chip structure.
 - Remove stale prototype from rawnand.h.

 Raw NAND controllers drivers changes:
 - Qcom: structuring cleanup.
 - Denali: use core helper to find the best ECC configuration.
 - Possible build of almost all drivers by adding a dependency on
   COMPILE_TEST for almost all of them in Kconfig, implies various
   fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
   changes in sparc64 and ia64 architectures.
 - Clean the ->probe() functions error path of a lot of drivers.
 - Migrate all drivers to use nand_scan() instead of
   nand_scan_ident()/nand_scan_tail() pair.
 - Use mtd_device_register() where applicable to simplify the code.
 - Marvell:
   * Handle on-die ECC.
   * Better clocks handling.
   * Remove bogus comment.
   * Add suspend and resume support.
 - Tegra: add NAND controller driver.
 - Atmel:
   * Add module param to avoid using dma.
   * Drop Wenyou Yang from MAINTAINERS.
 - Denali: optimize timings handling.
 - FSMC: Stop using chip->read_buf().
 - FSL:
   * Switch to SPDX license tag identifiers.
   * Fix qualifiers in MXC init functions.

 Raw NAND chip drivers changes:
 - Micron:
   * Add fixup for ONFI revision.
   * Update ecc_stats.corrected.
   * Make ECC activation stateful.
   * Avoid enabling/disabling ECC when it can't be disabled.
   * Get the actual number of bitflips.
   * Allow forced on-die ECC.
   * Support 8/512 on-die ECC.
   * Fix on-die ECC detection logic.
 - Hynix:
   * Fix decoding the OOB size on H27UCG8T2BTR.
   * Use ->exec_op() in hynix_nand_reg_write_op().
"
2018-08-11 12:15:19 +02:00
Steve French c4f7173ac3 smb3: create smb3 equivalent alias for cifs pseudo-xattrs
We really, really don't want to be encouraging people to use
cifs (the dialect) since it is insecure, so to avoid confusion
we want to move them to names which include 'smb3' instead of
'cifs' - so this simply creates an alias for the pseudo-xattrs

e.g. can now do:
getfattr -n user.smb3.creationtime /mnt1/file
and
getfattr -n user.smb3.dosattrib /mnt1/file
and
getfattr -n system.smb3_acl /mnt1/file

instead of forcing you to use the string 'cifs' in
these (e.g. getfattr -n system.cifs_acl /mnt1/file)

Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-10 18:46:58 -05:00
Darrick J. Wong 13942aa94a xfs: repair the AGI
Rebuild the AGI header items with some help from the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-08-10 11:44:31 -07:00
Darrick J. Wong 0e93d3f43e xfs: repair the AGFL
Repair the AGFL from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-08-10 11:44:31 -07:00
Darrick J. Wong f9ed6debca xfs: repair the AGF
Regenerate the AGF from the rmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-08-10 11:44:31 -07:00
Steve French cdeaf9d04a smb3: allow previous versions to be mounted with snapshot= mount parm
mounting with the "snapshots=" mount parm allows a read-only
view of a previous version of a file system (see MS-SMB2
and "timewarp" tokens, section 2.2.13.2.6) based on the timestamp
passed in on the snapshots mount parm.

Add processing to optionally send this create context.

Example output:

/mnt1 is mounted with "snapshots=..." and will see an earlier
version of the directory, with three fewer files than /mnt2
the current version of the directory.

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# cat /proc/mounts | grep cifs
//172.22.149.186/public /mnt1 cifs
ro,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=131748608570000000,actimeo=1

//172.22.149.186/public /mnt2 cifs
rw,relatime,vers=default,cache=strict,username=smfrench,uid=0,noforceuid,gid=0,noforcegid,addr=172.22.149.186,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1
EmptyDir  newerdir
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt1/newerdir

root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2
EmptyDir  file  newerdir  newestdir  timestamp-trace.cap
root@Ubuntu-17-Virtual-Machine:~/cifs-2.6# ls /mnt2/newerdir
new-file-not-in-snapshot

Snapshots are extremely useful for comparing previous versions of files or directories,
and recovering from data corruptions or mistakes.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-10 11:54:08 -05:00
Ronnie Sahlberg e55954a5f7 cifs: don't show domain= in mount output when domain is empty
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-10 11:53:51 -05:00
Ronnie Sahlberg c1777df1a5 cifs: add missing support for ACLs in SMB 3.11
We were missing the methods for get_acl and friends for the 3.11
dialect.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-10 11:53:32 -05:00
Steve French e02789a53d smb3: enumerating snapshots was leaving part of the data off end
When enumerating snapshots, the last few bytes of the final
snapshot could be left off since we were miscalculating the
length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY)
See MS-SMB2 section 2.2.32.2. In addition fixup the length used
to allow smaller buffer to be passed in, in order to allow
returning the size of the whole snapshot array more easily.

Sample userspace output with a kernel patched with this
(mounted to a Windows volume with two snapshots).
Before this patch, the second snapshot would be missing a
few bytes at the end.

~/cifs-2.6# ~/enum-snapshots /mnt/file
press enter to issue the ioctl to retrieve snapshot information ...

size of snapshot array = 102
Num snapshots: 2 Num returned: 2 Array Size: 102

Snapshot 0:@GMT-2018.06.30-19.34.17
Snapshot 1:@GMT-2018.06.30-19.33.37

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:20:01 -05:00
Ronnie Sahlberg 730928c8f4 cifs: update smb2_queryfs() to use compounding
Change smb2_queryfs() to use a Create/QueryInfo/Close compound request.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:19:56 -05:00
Ronnie Sahlberg b24df3e30c cifs: update receive_encrypted_standard to handle compounded responses
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-09 21:19:45 -05:00
Al Viro 4c0d7cd5c8 make sure that __dentry_kill() always invalidates d_seq, unhashed or not
RCU pathwalk relies upon the assumption that anything that changes
->d_inode of a dentry will invalidate its ->d_seq.  That's almost
true - the one exception is that the final dput() of already unhashed
dentry does *not* touch ->d_seq at all.  Unhashing does, though,
so for anything we'd found by RCU dcache lookup we are fine.
Unfortunately, we can *start* with an unhashed dentry or jump into
it.

We could try and be careful in the (few) places where that could
happen.  Or we could just make the final dput() invalidate the damn
thing, unhashed or not.  The latter is much simpler and easier to
backport, so let's do it that way.

Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09 18:07:15 -04:00
Al Viro 119e1ef80e fix __legitimize_mnt()/mntput() race
__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntpu() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09 17:51:32 -04:00
Al Viro 9ea0a46ca2 fix mntput/mntput race
mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
	* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
	* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As the
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-09 17:21:17 -04:00
Ronnie Sahlberg 1eb9fb5204 cifs: create SMB2_open_init()/SMB2_open_free() helpers.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-08 18:10:26 -05:00
Ronnie Sahlberg 296ecbae7f cifs: add SMB2_query_info_[init|free]()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
2018-08-08 18:08:47 -05:00
Ronnie Sahlberg 8eb4ecfab0 cifs: add SMB2_close_init()/SMB2_close_free()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
2018-08-08 16:49:08 -05:00
Jeff Layton da33a871ba locks: remove misleading obsolete comment
The spinlock handling in this file has changed significantly since this
comment was written, and the file_lock_lock is no more. In addition,
this overall comment no longer applies. Deleting an entry now requires
both locks.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-08-08 12:59:06 -04:00
Bob Peterson f5580d0f8b gfs2: eliminate update_rgrp_lvb_unlinked
Function update_rgrp_lvb_unlinked used to do the same thing as
be32_add_cpu. This patch removes it in favor of using be32_add_cpu
directly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
2018-08-08 10:34:39 -05:00
Steve French 468d677954 smb3: display stats counters for number of slow commands
When CONFIG_CIFS_STATS2 is enabled keep counters for slow
commands (ie server took longer than 1 second to respond)
by SMB2/SMB3 command code.  This can help in diagnosing
whether performance problems are on server (instead of
client) and which commands are causing the problem.

Sample output (the new lines contain words "slow responses ...")

$ cat /proc/fs/cifs/Stats
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 10 Small 490 Allocations
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 67 maximum at one time: 2
4 slow responses from localhost for command 5
1 slow responses from localhost for command 6
1 slow responses from localhost for command 14
1 slow responses from localhost for command 16

1) \\localhost\test
SMBs: 243
Bytes read: 1024000  Bytes written: 104857600
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 40 total 0 failed
Closes: 39 total 0 failed
...

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:30:59 -05:00
Aurelien Aptel a5c62f4833 CIFS: fix uninitialized ptr deref in smb2 signing
server->secmech.sdeschmacsha256 is not properly initialized before
smb2_shash_allocate(), set shash after that call.

also fix typo in error message

Fixes: 8de8c4608f ("cifs: Fix validation of signed data in smb2")

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2018-08-07 14:30:59 -05:00
Steve French fd09b7d3b3 smb3: Do not send SMB3 SET_INFO if nothing changed
An earlier commit had a typo which prevented the
optimization from working:

commit 18dd8e1a65 ("Do not send SMB3 SET_INFO request if nothing is changing")

Thank you to Metze for noticing this.  Also clear a
reserved field in the FILE_BASIC_INFO struct we send
that should be zero (all the other fields in that
struct were set or cleared explicitly already in
cifs_set_file_info).

Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.9.x+
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:30:59 -05:00
Steve French d258650004 smb3: fix minor debug output for CONFIG_CIFS_STATS
CONFIG_CIFS_STATS is now always enabled (to simplify the
code and since the STATS are important for some common
customer use cases and also debugging), but needed one
minor change so that STATS shows as enabled in the debug
output in /proc/fs/cifs/DebugData, otherwise it could
get confusing with STATS no longer showing up in the
"Features" list in /proc/fs/cifs/DebugData when basic
stats were in fact available.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-07 14:30:59 -05:00
Steve French 020eec5f71 smb3: add tracepoint for slow responses
If responses take longer than one second from the server,
we can optionally log them to dmesg in current cifs.ko code
(CONFIG_CIFS_STATS2 must be configured and a
/proc/fs/cifs/cifsFYI flag must be set), but can be more useful
to log these via ftrace (tracepoint is smb3_slow_rsp) which
is easier and more granular (still requires CONFIG_CIFS_STATS2
to be configured in the build though).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:28:01 -05:00
Ronnie Sahlberg e0bba0b854 cifs: add compound_send_recv()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:23:20 -05:00
Ronnie Sahlberg 1f3a8f5f7a cifs: make smb_send_rqst take an array of requests
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:23:04 -05:00
Ronnie Sahlberg b2c96de7fe cifs: update init_sg, crypt_message to take an array of rqst
These are used for SMB3 encryption and compounded requests.
Update these functions and the other functions related to SMB3 encryption to
take an array of requests.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:21:18 -05:00
Steve French c281bc0c74 smb3: fix reset of bytes read and written stats
echo 0 > /proc/fs/cifs/Stats is supposed to reset the stats
but there were four (see example below) that were not reset
(bytes read and witten, total vfs ops and max ops
at one time).

...
0 session 0 share reconnects
Total vfs operations: 100 maximum at one time: 2

1) \\localhost\test
SMBs: 0
Bytes read: 502092  Bytes written: 31457286
TreeConnects: 0 total 0 failed
TreeDisconnects: 0 total 0 failed
...

This patch fixes cifs_stats_proc_write to properly reset
those four.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:20:22 -05:00
Steve French 52ce1ac429 smb3: display bytes_read and bytes_written in smb3 stats
We were only displaying bytes_read and bytes_written in cifs
stats, fix smb3 stats to also display them.  Sample output
with this patch:

    cat /proc/fs/cifs/Stats:

CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 94 maximum at one time: 2

1) \\localhost\test
SMBs: 214
Bytes read: 502092  Bytes written: 31457286
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 52 total 3 failed
Closes: 48 total 0 failed
Flushes: 0 total 0 failed
Reads: 17 total 0 failed
Writes: 31 total 0 failed
...

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-07 14:20:22 -05:00
Steve French fcabb89299 cifs: simple stats should always be enabled
CONFIG_CIFS_STATS should always be enabled as Pavel recently
noted.  Simple statistics are not a significant performance hit,
and removing the ifdef simplifies the code slightly.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:20:22 -05:00
Ronnie Sahlberg 9da6ec7775 cifs: use a refcount to protect open/closing the cached file handle
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: <stable@vger.kernel.org>
2018-08-07 14:20:22 -05:00
Steve French bf1fdeb789 smb3: add reconnect tracepoints
Add tracepoints for reconnecting an smb3 session

Example output (from trace-cmd) with the patch
(showing the session marked for reconnect, the stat failing, and then
the subsequent SMB3 commands after the server comes back up).
The "smb3_reconnect" event is the new one.

           cifsd-25993 [000] .... 29635.368265: smb3_reconnect: server=localhost current_mid=0x1e
            stat-26200 [001] .... 29638.516403: smb3_enter: 	cifs_revalidate_dentry_attr: xid=22
            stat-26200 [001] .... 29648.723296: smb3_exit_err: 	cifs_revalidate_dentry_attr: xid=22 rc=-112
     kworker/0:1-22830 [000] .... 29653.850947: smb3_cmd_done: 	sid=0x0 tid=0x0 cmd=0 mid=0
     kworker/0:1-22830 [000] .... 29653.851191: smb3_cmd_err: 	sid=0x8ae4683c tid=0x0 cmd=1 mid=1 status=0xc0000016 rc=-5
     kworker/0:1-22830 [000] .... 29653.855254: smb3_cmd_done: 	sid=0x8ae4683c tid=0x0 cmd=1 mid=2
     kworker/0:1-22830 [000] .... 29653.855482: smb3_cmd_done: 	sid=0x8ae4683c tid=0x8084f30d cmd=3 mid=3

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:20:22 -05:00
Steve French e68a932b0b smb3: add tracepoint for session expired or deleted
In debugging reconnection problems, want to be able to more easily
trace cases in which the server has marked the SMB3 session
expired or deleted (to distinguish from timeout cases).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-07 14:15:57 -05:00
Steve French 06188fcf9c cifs: remove unused stats
These timers were a good idea but weren't used in current code,
and the idea was cifs specific.  Future patch will add similar timers
for SMB2/SMB3, but no sense using memory for cifs timers that
aren't used in current code.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-07 14:15:57 -05:00
Steve French 22783155f4 smb3: don't request leases in symlink creation and query
Fixes problem pointed out by Pavel in discussions about commit
729c0c9dd5

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org> # 3.18.x+
2018-08-07 14:15:57 -05:00
Steve French 1995d28f84 smb3: remove per-session operations from per-tree connection stats
Remove counters from the per-tree connection /proc/fs/cifs/Stats
output that will always be zero (since they are not per-tcon ops)
ie SMB3 Negotiate, SessionSetup, Logoff, Echo, Cancel.

Also clarify "sent" to be "total" per-Pavel's suggestion
(since this "total" includes total for all operations that we try to
send whether or not succesffully sent). Sample output below:

Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0

1 session 2 share reconnects
Total vfs operations: 23 maximum at one time: 2

1) \\localhost\test
SMBs: 45
TreeConnects: 2 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 13 total 2 failed
Closes: 9 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 1 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 3 total 1 failed
QueryDirectories: 4 total 2 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 10 total 0 failed
SetInfos: 3 total 0 failed
OplockBreaks: 0 sent 0 failed

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:15:56 -05:00
Steve French 289131e1f1 SMB3: Number of requests sent should be displayed for SMB3 not just CIFS
For SMB2/SMB3 the number of requests sent was not displayed
in /proc/fs/cifs/Stats unless CONFIG_CIFS_STATS2 was
enabled (only number of failed requests displayed). As
with earlier dialects, we should be displaying these
counters if CONFIG_CIFS_STATS is enabled. They
are important for debugging.

e.g. when you cat /proc/fs/cifs/Stats (before the patch)
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 690 maximum at one time: 2

1) \\localhost\test
SMBs: 975
Negotiates: 0 sent 0 failed
SessionSetups: 0 sent 0 failed
Logoffs: 0 sent 0 failed
TreeConnects: 0 sent 0 failed
TreeDisconnects: 0 sent 0 failed
Creates: 0 sent 2 failed
Closes: 0 sent 0 failed
Flushes: 0 sent 0 failed
Reads: 0 sent 0 failed
Writes: 0 sent 0 failed
Locks: 0 sent 0 failed
IOCTLs: 0 sent 1 failed
Cancels: 0 sent 0 failed
Echos: 0 sent 0 failed
QueryDirectories: 0 sent 63 failed

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-08-07 14:15:56 -05:00
Steve French 8a69e96e61 smb3: snapshot mounts are read-only and make sure info is displayable about the mount
snapshot mounts were not marked as read-only and did not display the snapshot
time (in /proc/mounts) specified on mount

With this patch - note that can not write to the snapshot mount (see "ro" in
/proc/mounts line) and also the missing snapshot timewarp token time is
dumped.  Sample line from /proc/mounts with the patch:

//127.0.0.1/scratch /mnt2 smb3 ro,relatime,vers=default,cache=strict,username=testuser,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=127.0.0.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,noperm,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=1234567,actimeo=1 0 0

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2018-08-07 14:15:56 -05:00
Steve French c3ed44026c smb3: remove noisy warning message on mount
Some servers, like Samba, don't support the fsctl for
query_network_interface_info so don't log a noisy warning
message on mount for this by default unless the error is more serious.
Lower the error to an FYI level so it does not get logged by
default.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:56 -05:00
Steve French 0fdfef9aa7 smb3: simplify code by removing CONFIG_CIFS_SMB311
We really, really want to be encouraging use of secure dialects,
and SMB3.1.1 offers useful security features, and will soon
be the recommended dialect for many use cases. Simplify the code
by removing the CONFIG_CIFS_SMB311 ifdef so users don't disable
it in the build, and create compatibility and/or security issues
with modern servers - many of which have been supporting this
dialect for multiple years.

Also clarify some of the Kconfig text for cifs.ko about
SMB3.1.1 and current supported features in the module.

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-08-07 14:15:56 -05:00
Steve French 950132afd5 cifs: add missing debug entries for kconfig options
/proc/fs/cifs/DebugData displays the features (Kconfig options)
used to build cifs.ko but it was missing some, and needed comma
separator.  These can be useful in debugging certain problems
so we know which optional features were enabled in the user's build.
Also clarify them, by making them more closely match the
corresponding CONFIG_CIFS_* parm.

Old format:
Features: dfs fscache posix spnego xattr acl

New format:
Features: DFS,FSCACHE,SMB_DIRECT,STATS,DEBUG2,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
CC: Stable <stable@vger.kernel.org>
2018-08-07 14:15:56 -05:00
Steve French 2d30421783 smb3: add support for statfs for smb3.1.1 posix extensions
Output now matches expected stat -f output for all fields
except for Namelen and ID which were addressed in a companion
patch (which retrieves them from existing SMB3 mechanisms
and works whether POSIX enabled or not)

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:15:41 -05:00
Steve French 21ba3845b5 smb3: fill in statfs fsid and correct namelen
Fil in the correct namelen (typically 255 not 4096) in the
statfs response and also fill in a reasonably unique fsid
(in this case taken from the volume id, and the creation time
of the volume).

In the case of the POSIX statfs all fields are now filled in,
and in the case of non-POSIX mounts, all fields are filled
in which can be.

Signed-off-by: Steve French <stfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:15:41 -05:00
Paulo Alcantara a12d0c590c cifs: Make sure all data pages are signed correctly
Check if every data page is signed correctly in sigining helper.

Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:41 -05:00
Aurelien Aptel 256b4c3f03 CIFS: fix memory leak and remove dead code
also fixes error code in smb311_posix_mkdir() (where
the error assignment needs to go before the goto)
a typo that Dan Carpenter and Paulo and Gustavo
pointed out.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:41 -05:00
Steve French 7420451f6a cifs: allow disabling insecure dialects in the config
allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
config for the build of cifs.ko if want to always prevent mounting
with these less secure dialects.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
2018-08-07 14:15:41 -05:00
Steve French 8505c8bfd8 smb3: if server does not support posix do not allow posix mount option
If user specifies "posix" on an SMB3.11 mount, then fail the mount
if server does not return the POSIX negotiate context indicating
support for posix.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-08-07 14:15:41 -05:00
Arnd Bergmann cbedeadf9c cifs: use 64-bit timestamps for fscache
In the fscache, we just need the timestamps as cookies to check for
changes, so we don't really care about the overflow, but it's better
to stop using the deprecated timespec so we don't have to go through
explicit conversion functions.

To avoid comparing uninitialized padding values that are copied
while assigning the timespec values, this rearranges the members of
cifs_fscache_inode_auxdata to avoid padding, and assigns them
individually.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:41 -05:00
Arnd Bergmann 95390201e7 cifs: use timespec64 internally
In cifs, the timestamps are stored in memory in the cifs_fattr structure,
which uses the deprecated 'timespec' structure. Now that the VFS code
has moved on to 'timespec64', the next step is to change over the fattr
as well.

This also makes 32-bit and 64-bit systems behave the same way, and
no longer overflow the 32-bit time_t in year 2038.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:41 -05:00
Dan Carpenter ff361fda55 cifs: Silence uninitialized variable warning
This is not really a runtime issue but Smatch complains that:

    fs/cifs/smb2ops.c:1740 smb2_query_symlink()
    error: uninitialized symbol 'resp_buftype'.

The warning is right that it can be uninitialized...  Also "err_buf"
would be NULL at this point and we're not supposed to pass NULLs to
free_rsp_buf() or it might trigger some extra output if we turn on
debugging.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-08-07 14:15:41 -05:00
Brian Foster 73971b172a xfs: remove dead error handling code in xfs_dquot_disk_alloc()
Colin Ian King reports that commit 82ff27bc52 ("xfs: automatic dfops
buffer relogging") leaves around some dead error handling code in
xfs_dquot_disk_alloc(). This was discovered via Coverity scan.

Since the associated commit eliminates the act of joining a buffer
to a dfops, this intermediate error state is no longer possible and
the error handling code can be removed. Since the caller cancels the
transaction on error, which cancels the dfops, eliminate the
unnecessary xfs_defer_cancel() call and error handling labels.

Fixes: 82ff27bc52 ("xfs: automatic dfops buffer relogging")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-07 10:57:13 -07:00
Christoph Hellwig 2ba090d521 xfs: use WRITE_ONCE to update if_seq
This adds ordering of the updates and makes sure we always see the if_seq
update before the extent tree is modified.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-07 10:57:12 -07:00
Bob Peterson dffe12a828 gfs2: Fix gfs2_testbit to use clone bitmaps
Function gfs2_testbit is called in three places. Two of those places,
gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
by the block reservations scheme to determine the length of an extent of
free blocks. Before this patch, it wasn't using the clone bitmap, which
means recently-freed blocks were treated as free blocks for the purposes
of an allocation.

This patch adds a new parameter to gfs2_testbit to indicate whether or
not the clone bitmaps should be used (if available).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-08-07 10:07:00 -05:00
Jeff Layton c883da313e locks: add tracepoint in flock codepath
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-08-06 13:15:16 -04:00
Al Viro 90bad5e05b root dentries need RCU-delayed freeing
Since mountpoint crossing can happen without leaving lazy mode,
root dentries do need the same protection against having their
memory freed without RCU delay as everything else in the tree.

It's partially hidden by RCU delay between detaching from the
mount tree and dropping the vfsmount reference, but the starting
point of pathwalk can be on an already detached mount, in which
case umount-caused RCU delay has already passed by the time the
lazy pathwalk grabs rcu_read_lock().  If the starting point
happens to be at the root of that vfsmount *and* that vfsmount
covers the entire filesystem, we get trouble.

Fixes: 48a066e72d ("RCU'd vsfmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-08-06 09:13:32 -04:00
Naohiro Aota 39379faaad btrfs: revert fs_devices state on error of btrfs_init_new_device
When btrfs hits error after modifying fs_devices in
btrfs_init_new_device() (such as btrfs_add_dev_item() returns error), it
leaves everything as is, but frees allocated btrfs_device. As a result,
fs_devices->devices and fs_devices->alloc_list contain already freed
btrfs_device, leading to later use-after-free bug.

Error path also messes the things like ->num_devices. While they go back
to the original value by unscanning btrfs devices, it is safe to revert
them here.

Fixes: 79787eaab4 ("btrfs: replace many BUG_ONs with proper error handling")
Signed-off-by: Naohiro Aota <naota@elisp.net>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:04 +02:00
Qu Wenruo 64f64f43c8 btrfs: Exit gracefully when chunk map cannot be inserted to the tree
It's entirely possible that a crafted btrfs image contains overlapping
chunks.

Although we can't detect such problem by tree-checker, it's not a
catastrophic problem, current extent map can already detect such problem
and return -EEXIST.

We just only need to exit gracefully and fail the mount.

Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:03 +02:00
Qu Wenruo cf90d884b3 btrfs: Introduce mount time chunk <-> dev extent mapping check
This patch will introduce chunk <-> dev extent mapping check, to protect
us against invalid dev extents or chunks.

Since chunk mapping is the fundamental infrastructure of btrfs, extra
check at mount time could prevent a lot of unexpected behavior (BUG_ON).

Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:03 +02:00
Qu Wenruo 7ef49515fa btrfs: Verify that every chunk has corresponding block group at mount time
If a crafted image has missing block group items, it could cause
unexpected behavior and breaks the assumption of 1:1 chunk<->block group
mapping.

Although we have the block group -> chunk mapping check, we still need
chunk -> block group mapping check.

This patch will do extra check to ensure each chunk has its
corresponding block group.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:03 +02:00
Qu Wenruo 514c7dca85 btrfs: Check that each block group has corresponding chunk at mount time
A crafted btrfs image with incorrect chunk<->block group mapping will
trigger a lot of unexpected things as the mapping is essential.

Although the problem can be caught by block group item checker
added in "btrfs: tree-checker: Verify block_group_item", it's still not
sufficient.  A sufficiently valid block group item can pass the check
added by the mentioned patch but could fail to match the existing chunk.

This patch will add extra block group -> chunk mapping check, to ensure
we have a completely matching (start, len, flags) chunk for each block
group at mount time.

Here we reuse the original helper find_first_block_group(), which is
already doing the basic bg -> chunk checks, adding further checks of the
start/len and type flags.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:03 +02:00
Filipe Manana 22d3151c2c Btrfs: send, fix incorrect file layout after hole punching beyond eof
When doing an incremental send, if we have a file in the parent snapshot
that has prealloc extents beyond EOF and in the send snapshot it got a
hole punch that partially covers the prealloc extents, the send stream,
when replayed by a receiver, can result in a file that has a size bigger
than it should and filled with zeroes past the correct EOF.

For example:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "falloc -k 0 4M" /mnt/foobar
  $ xfs_io -c "pwrite -S 0xea 0 1M" /mnt/foobar

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1
  $ btrfs send -f /tmp/1.send /mnt/snap1

  $ xfs_io -c "fpunch 1M 2M" /mnt/foobar

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2
  $ btrfs send -f /tmp/2.send -p /mnt/snap1 /mnt/snap2

  $ stat --format %s /mnt/snap2/foobar
  1048576
  $ md5sum /mnt/snap2/foobar
  d31659e82e87798acd4669a1e0a19d4f  /mnt/snap2/foobar

  $ umount /mnt
  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ btrfs receive -f /mnt/1.snap /mnt
  $ btrfs receive -f /mnt/2.snap /mnt

  $ stat --format %s /mnt/snap2/foobar
  3145728
  # --> should be 1Mb and not 3Mb (which was the end offset of hole
  #     punch operation)
  $ md5sum /mnt/snap2/foobar
  117baf295297c2a995f92da725b0b651  /mnt/snap2/foobar
  # --> should be d31659e82e87798acd4669a1e0a19d4f as in the original fs

This issue actually happens only since commit ffa7c4296e ("Btrfs: send,
do not issue unnecessary truncate operations"), but before that commit we
were issuing a write operation full of zeroes (to "punch" a hole) which
was extending the file size beyond the correct value and then immediately
issue a truncate operation to the correct size and undoing the previous
write operation. Since the send protocol does not support fallocate, for
extent preallocation and hole punching, fix this by not even attempting
to send a "hole" (regular write full of zeroes) if it starts at an offset
greater then or equals to the file's size. This approach, besides being
much more simple then making send issue the truncate operation, adds the
benefit of avoiding the useless pair of write of zeroes and truncate
operations, saving time and IO at the receiver and reducing the size of
the send stream.

A test case for fstests follows soon.

Fixes: ffa7c4296e ("Btrfs: send, do not issue unnecessary truncate operations")
CC: stable@vger.kernel.org # 4.17+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:03 +02:00
Misono Tomohiro 672d599041 btrfs: Use wrapper macro for rcu string to remove duplicate code
Cleanup patch and no functional changes.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Al Viro f5b3a4173f btrfs: simplify btrfs_iget
Don't open-code iget_failed(), don't bother with btrfs_free_path(NULL),
move handling of positive return values of btrfs_lookup_inode() from
btrfs_read_locked_inode() to btrfs_iget() and kill now obviously
pointless ASSERT() in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Al Viro 9bc2ceff66 btrfs: lift make_bad_inode into btrfs_iget
We don't need to check is_bad_inode() after the call of
btrfs_read_locked_inode() - it's exactly the same as checking return
value for being non-zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Al Viro 8d9e220ca0 btrfs: simplify IS_ERR/PTR_ERR checks
IS_ERR(p) && PTR_ERR(p) == n is a weird way to spell p == ERR_PTR(n).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Al Viro 2e19f1f9d3 btrfs: btrfs_iget never returns an is_bad_inode inode
Just get rid of pointless checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:02 +02:00
Misono Tomohiro 1e7e1f9e3a btrfs: replace: Reset on-disk dev stats value after replace
on-disk devs stats value is updated in btrfs_run_dev_stats(),
which is called during commit transaction, if device->dev_stats_ccnt
is not zero.

Since current replace operation does not touch dev_stats_ccnt,
on-disk dev stats value is not updated. Therefore "btrfs device stats"
may return old device's value after umount/mount
(Example: See "btrfs ins dump-t -t DEV $DEV" after btrfs/100 finish).

Fix this by just incrementing dev_stats_ccnt in
btrfs_dev_replace_finishing() when replace is succeeded and this will
update the values.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:01 +02:00
Misono Tomohiro 85c3954819 btrfs: extent-tree: Remove unused __btrfs_free_block_rsv
There is no user of this function anymore.

This was forgotten to be removed in commit a575ceeb13
("Btrfs: get rid of unused orphan infrastructure").

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:01 +02:00
Misono Tomohiro afc6961ffd btrfs: backref: Use ERR_CAST to return error code
Use ERR_CAST() instead of void * to make meaning clear.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:01 +02:00
Lu Fengqi 5b7d687ad5 btrfs: Remove redundant btrfs_release_path from btrfs_unlink_subvol
Although it is safe to call this on already released paths with no locks
held or extent buffers, removing the redundant btrfs_release_path is
reasonable.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:01 +02:00
Lu Fengqi 401b3b19d5 btrfs: Remove root parameter from btrfs_unlink_subvol
All callers pass the root tree of dir, we can push that down to the
function itself.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:01 +02:00
Lu Fengqi 6025c19fb2 btrfs: Remove fs_info from btrfs_add_root_ref
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
Lu Fengqi 3ee1c5530e btrfs: Remove fs_info from btrfs_del_root_ref
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
Lu Fengqi ab9ce7d42b btrfs: Remove fs_info from btrfs_del_root
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
Lu Fengqi 9add29457a btrfs: Remove fs_info from btrfs_delete_delayed_dir_index
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
Lu Fengqi 4465c8b422 btrfs: Remove fs_info from btrfs_insert_delayed_dir_index
It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:13:00 +02:00
David Sterba b5851021f1 btrfs: extent-tree: remove unused member walk_control::for_reloc
Leftover after fix e339a6b097 ("Btrfs: __btrfs_mod_ref should always
use no_quota"), that removed it from the function calls but not the
structure.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
Filipe Manana 46b2f4590a Btrfs: fix send failure when root has deleted files still open
The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other
less common use cases for send can end up seeing inodes with a link count
of zero and in this case the send operation fails with a ENOENT error
because any attempt to generate a path for the inode, with the purpose
of creating it or updating it at the receiver, fails since there are no
inode reference items. One use case it to use a regular subvolume for
a send operation after turning it to RO mode or turning a RW snapshot
into RO mode and then using it for a send operation. In both cases, if a
file gets all its hard links deleted while there is an open file
descriptor before turning the subvolume/snapshot into RO mode, the send
operation will encounter an inode with a link count of zero and then
fail with errno ENOENT.

Example using a full send with a subvolume:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  # keep an open file descriptor on file bar
  $ exec 73</mnt/sv1/bar
  $ unlink /mnt/sv1/bar

  # Turn the subvolume to RO mode and use it for a full send, while
  # holding the open file descriptor.
  $ btrfs property set /mnt/sv1 ro true

  $ btrfs send -f /tmp/full.send /mnt/sv1
  At subvol /mnt/sv1
  ERROR: send ioctl failed with -2: No such file or directory

Example using an incremental send with snapshots:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap1

  $ echo "hello world" >> /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2

  # Turn the second snapshot to RW mode and delete file foo while
  # holding an open file descriptor on it.
  $ btrfs property set /mnt/snap2 ro false
  $ exec 73</mnt/snap2/foo
  $ unlink /mnt/snap2/foo

  # Set the second snapshot back to RO mode and do an incremental send.
  $ btrfs property set /mnt/snap2 ro true

  $ btrfs send -f /tmp/inc.send -p /mnt/snap1 /mnt/snap2
  At subvol /mnt/snap2
  ERROR: send ioctl failed with -2: No such file or directory

So fix this by ignoring inodes with a link count of zero if we are either
doing a full send or if they do not exist in the parent snapshot (they
are new in the send snapshot), and unlink all paths found in the parent
snapshot when doing an incremental send (and ignoring all other inode
items, such as xattrs and extents).

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Reported-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
Filipe Manana 0d836392ca Btrfs: fix mount failure after fsync due to hard link recreation
If we end up with logging an inode reference item which has the same name
but different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref(), when we are
replaying an inode reference item.

Example scenario where this happens:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/foo
  $ ln /mnt/foo /mnt/bar

  $ sync

  # Rename the first hard link (foo) to a new name and rename the second
  # hard link (bar) to the old name of the first hard link (foo).
  $ mv /mnt/foo /mnt/qwerty
  $ mv /mnt/bar /mnt/foo

  # Create a new file, in the same parent directory, with the old name of
  # the second hard link (bar) and fsync this new file.
  # We do this instead of calling fsync on foo/qwerty because if we did
  # that the fsync resulted in a full transaction commit, not triggering
  # the problem.
  $ touch /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  mount: mount /dev/sdb on /mnt failed: File exists

So fix this by checking if a conflicting inode reference exists (same
name, same parent but different index), removing it (and the associated
dir index entries from the parent inode) if it exists, before attempting
to add the new reference.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
Josef Bacik 4559b0a717 btrfs: don't leak ret from do_chunk_alloc
If we're trying to make a data reservation and we have to allocate a
data chunk we could leak ret == 1, as do_chunk_alloc() will return 1 if
it allocated a chunk.  Since the end of the function is the success path
just return 0.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
David Sterba 84db5ccf42 btrfs: merge free_fs_root helpers
The exported helper just calls the static one. There's no obvious reason
to have them separate eg. for performance reasons where the static one
could be better optimized in the same unit. There's a slight decrease in
code size and stack consumption.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
David Sterba 2ffad70ed3 btrfs: constify strings passed to assertion helper
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:59 +02:00
David Sterba e9539cff04 btrfs: dev-replace: remove unused members of btrfs_dev_replace
Lock owner and nesting level have been unused since day 1, probably
copy&pasted from the extent_buffer locking scheme without much thinking.
The locking of device replace is simpler and does not need any lock
nesting.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:58 +02:00
David Sterba e17385ca29 btrfs: remove unused member btrfs_root::name
Added in 58176a9604 ("Btrfs: Add per-root block accounting and sysfs
entries") in 2007, the roots had names exported in sysfs. The code
was commented out in 4df27c4d5c ("Btrfs: change how subvolumes
are organized") and cleaned by 182608c829 ("btrfs: remove old
unused commented out code").

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:58 +02:00
Adam Borowski 616d374efa btrfs: allow defrag on a file opened read-only that has rw permissions
Requiring a read-write descriptor conflicts both ways with exec,
returning ETXTBSY whenever you try to defrag a program that's currently
being run, or causing intermittent exec failures on a live system being
defragged.

As defrag doesn't change the file's contents in any way, there's no
reason to consider it a rw operation.  Thus, let's check only whether
the file could have been opened rw.  Such access control is still needed
as currently defrag can use extra disk space, and might trigger bugs.

We return EINVAL when the request is invalid; here it's ok but merely
the user has insufficient privileges.  Thus, the EPERM return value
reflects the error better -- as discussed in the identical case for
dedupe.

According to codesearch.debian.net, no userspace program distinguishes
these values beyond strerror().

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
[ fold the EPERM patch from Adam ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:58 +02:00
Josef Bacik 3c4276936f Btrfs: fix btrfs_write_inode vs delayed iput deadlock
We recently ran into the following deadlock involving
btrfs_write_inode():

[  +0.005066]  __schedule+0x38e/0x8c0
[  +0.007144]  schedule+0x36/0x80
[  +0.006447]  bit_wait+0x11/0x60
[  +0.006446]  __wait_on_bit+0xbe/0x110
[  +0.007487]  ? bit_wait_io+0x60/0x60
[  +0.007319]  __inode_wait_for_writeback+0x96/0xc0
[  +0.009568]  ? autoremove_wake_function+0x40/0x40
[  +0.009565]  inode_wait_for_writeback+0x21/0x30
[  +0.009224]  evict+0xb0/0x190
[  +0.006099]  iput+0x1a8/0x210
[  +0.006103]  btrfs_run_delayed_iputs+0x73/0xc0
[  +0.009047]  btrfs_commit_transaction+0x799/0x8c0
[  +0.009567]  btrfs_write_inode+0x81/0xb0
[  +0.008008]  __writeback_single_inode+0x267/0x320
[  +0.009569]  writeback_sb_inodes+0x25b/0x4e0
[  +0.008702]  wb_writeback+0x102/0x2d0
[  +0.007487]  wb_workfn+0xa4/0x310
[  +0.006794]  ? wb_workfn+0xa4/0x310
[  +0.007143]  process_one_work+0x150/0x410
[  +0.008179]  worker_thread+0x6d/0x520
[  +0.007490]  kthread+0x12c/0x160
[  +0.006620]  ? put_pwq_unlocked+0x80/0x80
[  +0.008185]  ? kthread_park+0xa0/0xa0
[  +0.007484]  ? do_syscall_64+0x53/0x150
[  +0.007837]  ret_from_fork+0x29/0x40

Writeback calls:

btrfs_write_inode
  btrfs_commit_transaction
    btrfs_run_delayed_iputs

If iput() is called on that same inode, evict() will wait for writeback
forever.

btrfs_write_inode() was originally added way back in 4730a4bc5b
("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode()
hasn't been used for O_SYNC since 148f948ba8 ("vfs: Introduce new
helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so
btrfs_write_inode() is actually unnecessary (and leads to a bunch of
unnecessary commits). Get rid of it, which also gets rid of the
deadlock.

CC: stable@vger.kernel.org # 3.2+
Signed-off-by: Josef Bacik <jbacik@fb.com>
[Omar: new commit message]
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:58 +02:00
Nikolay Borisov 97aff912a2 btrfs: Remove fs_info from btrfs_finish_chunk_alloc
It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:58 +02:00
Nikolay Borisov f4208794d0 btrfs: Remove fs_info form btrfs_free_chunk
It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:57 +02:00
Nikolay Borisov 4f5ad7bd63 btrfs: Remove fs_info from btrfs_destroy_dev_replace_tgtdev
This function is always passed a well-formed tgtdevice so the fs_info
can be referenced from there.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:57 +02:00
Nikolay Borisov d6507cf1e2 btrfs: Remove fs_info from btrfs_assign_next_active_device
It can be referenced from the passed 'device' argument which is always
a well-formed device.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:57 +02:00
Nikolay Borisov 5495f195fc btrfs: remove fs_info argument from update_dev_stat_item
It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:57 +02:00
Nikolay Borisov 68a9db5f23 btrfs: Remove fs_info from btrfs_rm_dev_replace_remove_srcdev
It can be referenced from the passed srcdev argument.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:57 +02:00
Nikolay Borisov 8e87e85627 btrfs: Remove fs_info argument from btrfs_add_dev_item
It can be referenced form the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:56 +02:00
Qu Wenruo 5e23a6fea6 btrfs: extent-tree: Remove dead alignment check
In find_free_extent() under checks: label, we have the following code:

		search_start = ALIGN(offset, fs_info->stripesize);
		/* move on to the next group */
		if (search_start + num_bytes >
		    block_group->key.objectid + block_group->key.offset) {
			btrfs_add_free_space(block_group, offset, num_bytes);
			goto loop;
		}
		if (offset < search_start)
			btrfs_add_free_space(block_group, offset,
					     search_start - offset);
		BUG_ON(offset > search_start);

However ALIGN() is rounding up, thus @search_start >= @offset and that
BUG_ON() will never be triggered.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:56 +02:00
Filipe Manana ca5d2ba1ae Btrfs: remove unused key assignment when doing a full send
At send.c:full_send_tree() we were setting the 'key' variable in the loop
while never using it later. We were also using two btrfs_key variables
to store the initial key for search and the key found in every iteration
of the loop. So remove this useless key assignment and use the same
btrfs_key variable to store the initial search key and the key found in
each iteration. This was introduced in the initial send commit but was
never used (commit 31db9f7c23 ("Btrfs: introduce BTRFS_IOC_SEND for
btrfs send/receive").

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:56 +02:00
David Sterba 5cdc84bfde btrfs: drop extent_io_ops::set_range_writeback callback
The data and metadata callback implementation both use the same
function. We can remove the call indirection and intermediate helper
completely.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:56 +02:00
David Sterba 00032d38ea btrfs: drop extent_io_ops::merge_bio_hook callback
The data and metadata callback implementation both use the same
function. We can remove the call indirection completely.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:56 +02:00
David Sterba 05912a3c04 btrfs: drop extent_io_ops::tree_fs_info callback
All implementations of the callback are trivial and do the same and
there's only one user. Merge everything together.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
David Sterba e288c080dd btrfs: unify end_io callbacks of async_submit_bio
The end_io callbacks passed to btrfs_wq_submit_bio
(btrfs_submit_bio_done and btree_submit_bio_done) are effectively the
same code, there's no point to do the indirection. Export
btrfs_submit_bio_done and call it directly.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
David Sterba d7cbfafc4b btrfs: remove unused member async_submit_bio::bio_flags
After splitting the start and end hooks in a758781d4b ("btrfs:
separate types for submit_bio_start and submit_bio_done"), some of
the function arguments were dropped but not removed from the structure.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
David Sterba d7e8555b1d btrfs: remove unused member async_submit_bio::fs_info
Introduced by c6100a4b4e ("Btrfs: replace tree->mapping with
tree->private_data") to be used in run_one_async_done where it got
unused after 736cd52e0c ("Btrfs: remove nr_async_submits and
async_submit_draining").

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
Gu Jinxiang 315409b009 btrfs: validate type when reading a chunk
Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839, with an
image that has an invalid chunk type but does not return an error.

Add chunk type check in btrfs_check_chunk_valid, to detect the wrong
type combinations.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
Reported-by: Xu Wen <wen.xu@gatech.edu>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
Nikolay Borisov b0132a3be5 btrfs: Rename EXTENT_BUFFER_DUMMY to EXTENT_BUFFER_UNMAPPED
EXTENT_BUFFER_DUMMY is an awful name for this flag. Buffers which have
this flag set are not in any way dummy. Rather, they are private in the
sense that are not mapped and linked to the global buffer tree. This
flag has subtle implications to the way free_extent_buffer works for
example, as well as controls whether page->mapping->private_lock is held
during extent_buffer release. Pages for an unmapped buffer cannot be
under io, nor can they be written by a 3rd party so taking the lock is
unnecessary.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ EXTENT_BUFFER_UNMAPPED, update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:55 +02:00
Nikolay Borisov 07e21c4dad btrfs: Document locking requirement via lockdep_assert_held
Remove stale comment since there is no longer an eb->eb_lock and
document the locking expectation with a lockdep_assert_held statement.
No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:54 +02:00
David Sterba 55ac01396a btrfs: rename btrfs_release_extent_buffer_page
The function used to release one page (and always the first one), but
not anymore since a50924e3a4 ("btrfs: drop constant param
from btrfs_release_extent_buffer_page").  Update the name and comment.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:54 +02:00
Nikolay Borisov d64766fdf9 btrfs: Refactor loop in btrfs_release_extent_buffer_page
The purpose of the function is to free all the pages comprising an
extent buffer. This can be achieved with a simple for loop rather than
the slightly more involved 'do {} while' construct. So rewrite the
loop using a 'for' construct. Additionally we can never have an
extent_buffer that has 0 pages so remove the check for index == 0. No
functional changes.

The reversed order used to have a meaning in the past where the first
page served as a blocking point for several callers. See eg
4f2de97ace ("Btrfs: set page->private to the eb").

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:54 +02:00
Nikolay Borisov b16d011e79 btrfs: Reword dodgy comments in alloc_extent_buffer
Commit eb14ab8ed2 ("Btrfs: fix page->private races") fixed a genuine
race between extent buffer initialisation and btree_releasepage.
Unfortunately as the code has evolved the comments weren't changed which
made them slightly wrong and they weren't very clear in the fist place.
Fix this by (hopefully) rewording them in a more approachable manner.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:54 +02:00
Nikolay Borisov 28187ae569 btrfs: Simplify page unlocking in alloc_extent_buffer
Current version of the page unlocking code was added in
727011e07c ("Btrfs: allow metadata blocks larger than the page size")
but even in this commit that particular flag was never used per-se. In
fact, btrfs only uses PageChecked for data pages to identify pages
which have been dirtied but don't have ORDERED bit set. For more
information see 247e743cbe ("Btrfs: Use async helpers to deal with
pages that have been improperly dirtied").

However, this doesn't apply to extent buffer pages. The important bit
here is that the pages are unlocked AFTER the extent buffer has been
properly recorded in the radix tree to avoid races with
btree_releasepage. Let's exploit this fact and simplify the page
unlocking sequence by unlocking the pages in-order and removing the
redundant PageChecked flag setting/clearing.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:54 +02:00
Qu Wenruo 8b9b6f2554 btrfs: scrub: cleanup the remaining nodatasum fixup code
Remove the remaining code that misused the page cache pages during
device replace and could cause data corruption for compressed nodatasum
extents. Such files do not normally exist but there's a bug that allows
this combination and the corruption was exposed by device replace fixup
code.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
David Sterba 46df06b85e btrfs: refactor block group replication factor calculation to a helper
There are many places that open code the duplicity factor of the block
group profiles, create a common helper. This can be easily extended for
more copies.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
Anand Jain 321a4bf72b btrfs: use the assigned fs_devices instead of the dereference
We have assigned the %fs_info->fs_devices in %fs_devices as its not
modified just use it for the mutex_lock().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
Lu Fengqi 62088ca742 btrfs: qgroup: Drop fs_info parameter from qgroup_rescan_leaf
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
Lu Fengqi a937742250 btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_inherit
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:53 +02:00
Lu Fengqi 280f8bd2cb btrfs: qgroup: Drop fs_info parameter from btrfs_run_qgroups
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Lu Fengqi 8696d76045 btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_account_extent
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Lu Fengqi deb4062743 btrfs: qgroup: Drop root parameter from btrfs_qgroup_trace_subtree
The fs_info can be fetched from the transaction handle directly.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Lu Fengqi 8d38d7eb7b btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_trace_leaf_items
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Lu Fengqi a95f3aafd6 btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_trace_extent
It can be fetched from the transaction handle. In addition, remove the
WARN_ON(trans == NULL) because it's not possible to hit this condition.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:52 +02:00
Lu Fengqi f0042d5e92 btrfs: qgroup: Drop fs_info parameter from btrfs_limit_qgroup
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:51 +02:00
Lu Fengqi 3efbee1d00 btrfs: qgroup: Drop fs_info parameter from btrfs_remove_qgroup
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:51 +02:00
Lu Fengqi 49a05ecde3 btrfs: qgroup: Drop fs_info parameter from btrfs_create_qgroup
It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:51 +02:00