Commit Graph

871685 Commits

Author SHA1 Message Date
Christoph Hellwig 390aab0a16 xfs: move the locking from xlog_state_finish_copy to the callers
This will allow optimizing various locking cycles in the following
patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21 09:04:58 -07:00
Christoph Hellwig 2c68a1dfbd xfs: remove the unused ic_io_size field from xlog_in_core
ic_io_size is only used inside xlog_write_iclog, where we can just use
the count parameter intead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21 09:04:58 -07:00
Christoph Hellwig cd95cb962b xfs: pass the correct flag to xlog_write_iclog
xlog_write_iclog expects a bool for the second argument.  While any
non-0 value happens to work fine this makes all calls consistent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-10-21 09:04:58 -07:00
Brian Foster dc8e69bd72 xfs: optimize near mode bnobt scans with concurrent cntbt lookups
The near mode fallback algorithm consists of a left/right scan of
the bnobt. This algorithm has very poor breakdown characteristics
under worst case free space fragmentation conditions. If a suitable
extent is far enough from the locality hint, each allocation may
scan most or all of the bnobt before it completes. This causes
pathological behavior and extremely high allocation latencies.

While locality is important to near mode allocations, it is not so
important as to incur pathological allocation latency to provide the
asolute best available locality for every allocation. If the
allocation is large enough or far enough away, there is a point of
diminishing returns. As such, we can bound the overall operation by
including an iterative cntbt lookup in the broader search. The cntbt
lookup is optimized to immediately find the extent with best
locality for the given size on each iteration. Since the cntbt is
indexed by extent size, the lookup repeats with a variably
aggressive increasing search key size until it runs off the edge of
the tree.

This approach provides a natural balance between the two algorithms
for various situations. For example, the bnobt scan is able to
satisfy smaller allocations such as for inode chunks or btree blocks
more quickly where the cntbt search may have to search through a
large set of extent sizes when the search key starts off small
relative to the largest extent in the tree. On the other hand, the
cntbt search more deterministically covers the set of suitable
extents for larger data extent allocation requests that the bnobt
scan may have to search the entire tree to locate.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster d29688257f xfs: factor out tree fixup logic into helper
Lift the btree fixup path into a helper function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster 0e26d5ca4a xfs: refactor near mode alloc bnobt scan into separate function
In preparation to enhance the near mode allocation bnobt scan algorithm, lift
it into a separate function. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster 78d7aabdee xfs: refactor and reuse best extent scanning logic
The bnobt "find best" helper implements a simple btree walker
function. This general pattern, or a subset thereof, is reused in
various parts of a near mode allocation operation. For example, the
bnobt left/right scans are each iterative btree walks along with the
cntbt lastblock scan.

Rework this function into a generic btree walker, add a couple
parameters to control termination behavior from various contexts and
reuse it where applicable.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster 4a65b7c2c7 xfs: refactor allocation tree fixup code
Both algorithms duplicate the same btree allocation code. Eliminate
the duplication and reuse the fallback algorithm codepath.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster fec0afdaf4 xfs: reuse best extent tracking logic for bnobt scan
The near mode bnobt scan searches left and right in the bnobt
looking for the closest free extent to the allocation hint that
satisfies minlen. Once such an extent is found, the left/right
search terminates, we search one more time in the opposite direction
and finish the allocation with the best overall extent.

The left/right and find best searches are currently controlled via a
combination of cursor state and local variables. Clean up this code
and prepare for further improvements to the near mode fallback
algorithm by reusing the allocation cursor best extent tracking
mechanism. Update the tracking logic to deactivate bnobt cursors
when out of allocation range and replace open-coded extent checks to
calls to the common helper. In doing so, rename some misnamed local
variables in the top-level near mode allocation function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster 396bbf3c65 xfs: refactor cntbt lastblock scan best extent logic into helper
The cntbt lastblock scan checks the size, alignment, locality, etc.
of each free extent in the block and compares it with the current
best candidate. This logic will be reused by the upcoming optimized
cntbt algorithm, so refactor it into a separate helper. Note that
acur->diff is now initialized to -1 (unsigned) instead of 0 to
support the more granular comparison logic in the new helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster c62321a2a0 xfs: track best extent from cntbt lastblock scan in alloc cursor
If the size lookup lands in the last block of the by-size btree, the
near mode algorithm scans the entire block for the extent with best
available locality. In preparation for similar best available
extent tracking across both btrees, extend the allocation cursor
with best extent data and lift the associated state from the cntbt
last block scan code. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster d6d3aff203 xfs: track allocation busy state in allocation cursor
Extend the allocation cursor to track extent busy state for an
allocation attempt. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster f5e7dbea1e xfs: introduce allocation cursor data structure
Introduce a new allocation cursor data structure to encapsulate the
various states and structures used to perform an extent allocation.
This structure will eventually be used to track overall allocation
state across different search algorithms on both free space btrees.

To start, include the three btree cursors (one for the cntbt and two
for the bnobt left/right search) used by the near mode allocation
algorithm and refactor the cursor setup and teardown code into
helpers. This slightly changes cursor memory allocation patterns,
but otherwise makes no functional changes to the allocation
algorithm.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix sparse complaints]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Brian Foster f6b428a46d xfs: track active state of allocation btree cursors
The upcoming allocation algorithm update searches multiple
allocation btree cursors concurrently. As such, it requires an
active state to track when a particular cursor should continue
searching. While active state will be modified based on higher level
logic, we can define base functionality based on the result of
allocation btree lookups.

Define an active flag in the private area of the btree cursor.
Update it based on the result of lookups in the existing allocation
btree helpers. Finally, provide a new helper to query the current
state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
Christoph Hellwig bdb2ed2dbd xfs: ignore extent size hints for always COW inodes
There is no point in applying extent size hints for always COW inodes,
as we would just have to COW any extra allocation beyond the data
actually written.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:58 -07:00
yu kuai e5e634041b xfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONS
In commit d03a2f1b9f ("xfs: include WARN, REPAIR build options in
XFS_BUILD_OPTIONS"), Eric pointed out that the XFS_BUILD_OPTIONS string,
shown at module init time and in modinfo output, does not currently
include all available build options. So, he added in CONFIG_XFS_WARN and
CONFIG_XFS_REPAIR. However, this is not enough, add in CONFIG_XFS_QUOTA
and CONFIG_XFS_ASSERT_FATAL.

Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 09:04:57 -07:00
Goldwyn Rodrigues c039b99792 iomap: use a srcmap for a read-modify-write I/O
The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target.  The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
      srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig eb81cf9d0e iomap: renumber IOMAP_HOLE to 0
Instead of keeping a separate unnamed state for uninitialized iomaps,
renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
as a hole.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 32a38a4991 iomap: use write_begin to read pages to unshare
Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare.  That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig d3b4043969 iomap: move the zeroing case out of iomap_read_page_sync
That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 3590c4d897 iomap: ignore non-shared or non-data blocks in xfs_file_dirty
xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig dcd6158d15 iomap: always use AOP_FLAG_NOFS in iomap_write_begin
All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon.  Also remove the local index variable that is only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig c12d6fa88d iomap: remove the unused iomap argument to __iomap_write_end
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 65a60e8687 iomap: better document the IOMAP_F_* flags
The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Darrick J. Wong 9cd0ed63ca iomap: enhance writeback error message
If we encounter an IO error during writeback, log the inode, offset, and
sector number of the failure, instead of forcing the user to do some
sort of reverse mapping to figure out which file is affected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 48d64cd18b iomap: pass a struct page to iomap_finish_page_writeback
No need to pass the full bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig b3d423ec89 iomap: cleanup iomap_ioend_compare
Move the initialization of ia and ib to the declaration line and remove
a superflous else.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig ab08b01ec0 iomap: move struct iomap_page out of iomap.h
Now that all the writepage code is in the iomap code there is no
need to keep this structure public.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 3e19e6f3ee iomap: warn on inline maps in iomap_writepage_map
And inline mapping should never mark the page dirty and thus never end up
in writepages.  Add a check for that condition and warn if it happens.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 598ecfbaa7 iomap: lift the xfs writeback code to iomap
Take the xfs writeback code and move it to fs/iomap.  A new structure
with three methods is added as the abstraction from the generic writeback
code to the file system.  These methods are used to map blocks, submit an
ioend, and cancel a page that encountered an error before it was added to
an ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it
does]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 9e91c5728c iomap: lift common tracing code from xfs to iomap
Lift the xfs code for tracing address space operations to the iomap
layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 009d8d849d iomap: zero newly allocated mapped blocks
File systems like gfs2 don't support delayed allocations or unwritten
extents and thus allocate normal mapped blocks to fill holes.  To
cover the case of such file systems allocating new blocks to fill holes
also zero out mapped blocks with the new flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 760fea8bfb xfs: remove the fork fields in the writepage_ctx and ioend
In preparation for moving the writeback code to iomap.c, replace the
XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 5653017bc4 xfs: turn io_append_trans into an io_private void pointer
In preparation for moving the ioend structure to common code we need
to get rid of the xfs-specific xfs_trans type.  Just make it a file
system private void pointer instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 433dad94ec xfs: refactor the ioend merging code
Introduce two nicely abstracted helper, which can be moved to the iomap
code later.  Also use list_first_entry_or_null to simplify the code a
bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 4e087a3b31 xfs: use a struct iomap in xfs_writepage_ctx
In preparation for moving the XFS writeback code to fs/iomap.c, switch
it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 05b30949f1 xfs: set IOMAP_F_NEW more carefully
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as
these aren't strictly new allocations.  This is required to be able to
use IOMAP_F_NEW to zero newly allocated blocks, which is required for
the iomap code to fully support file systems that don't do delayed
allocations or use unwritten extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Christoph Hellwig 2492a606b3 xfs: initialize iomap->flags in xfs_bmbt_to_iomap
Currently we don't overwrite the flags field in the iomap in
xfs_bmbt_to_iomap.  This works fine with 0-initialized iomaps on stack,
but is harmful once we want to be able to reuse an iomap in the
writeback code.  Replace the shared parameter with a set of initial
flags an thus ensures the flags field is always reinitialized.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21 08:51:59 -07:00
Dave Chinner 7684e2c438 iomap: iomap that extends beyond EOF should be marked dirty
When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.

However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion
when O_DSYNC writes are being used and the hardware supports FUA.

Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync()
to flush the inode size update to stable storage correctly.

Fixes: 3460cac1ca ("iomap: Use FUA for pure data O_DSYNC DIO writes")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: removed the ext4 part; they'll handle it separately]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-17 13:12:01 -07:00
Jan Kara 906753befc xfs: Use iomap_dio_rw to wait for unaligned direct IO
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding
the wait.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15 08:43:43 -07:00
Jan Kara 13ef954445 iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()
Filesystems do not support doing IO as asynchronous in some cases. For
example in case of unaligned writes or in case file size needs to be
extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO
in such cases, add argument to iomap_dio_rw() which makes the function
wait for IO completion. This also results in executing
iomap_dio_complete() inline in iomap_dio_rw() providing its return value
to the caller as for ordinary sync IO.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15 08:43:42 -07:00
Linus Torvalds 4f5cafb5cb Linux 5.4-rc3 2019-10-13 16:37:36 -07:00
Linus Torvalds d4615e5a46 A few tracing fixes:
- Removed locked down from tracefs itself and moved it to the trace
    directory. Having the open functions there do the lockdown checks.
 
  - Fixed a few races with opening an instance file and the instance being
    deleted (Discovered during the locked down updates). Kept separate
    from the clean up code such that they can be backported to stable
    easier.
 
  - Cleaned up and consolidated the checks done when opening a trace
    file, as there were multiple checks that need to be done, and it
    did not make sense having them done in each open instance.
 
  - Fixed a regression in the record mcount code.
 
  - Small hw_lat detector tracer fixes.
 
  - A trace_pipe read fix due to not initializing trace_seq.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXaNhphQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quDIAP4v08ARNdIh+r+c4AOBm3xsOuE/d9GB
 I56ydnskm+x2JQD6Ap9ivXe9yDBIErFeHNtCoq7pM8YDI4YoYIB30N0GfwM=
 =7oAu
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few tracing fixes:

   - Remove lockdown from tracefs itself and moved it to the trace
     directory. Have the open functions there do the lockdown checks.

   - Fix a few races with opening an instance file and the instance
     being deleted (Discovered during the lockdown updates). Kept
     separate from the clean up code such that they can be backported to
     stable easier.

   - Clean up and consolidated the checks done when opening a trace
     file, as there were multiple checks that need to be done, and it
     did not make sense having them done in each open instance.

   - Fix a regression in the record mcount code.

   - Small hw_lat detector tracer fixes.

   - A trace_pipe read fix due to not initializing trace_seq"

* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
  tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
  tracing/hwlat: Report total time spent in all NMIs during the sample
  recordmcount: Fix nop_mcount() function
  tracing: Do not create tracefs files if tracefs lockdown is in effect
  tracing: Add locked_down checks to the open calls of files created for tracefs
  tracing: Add tracing_check_open_get_tr()
  tracing: Have trace events system open call tracing_open_generic_tr()
  tracing: Get trace_array reference for available_tracers files
  ftrace: Get a reference counter for the trace_array on filter files
  tracefs: Revert ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
2019-10-13 14:47:10 -07:00
Linus Torvalds 2581efa9a4 hwmon fixes for v5.4-rc3
Update/fix inspur-ipsps1 and k10temp Documentation
 Fix nct7904 driver
 Fix HWMON_P_MIN_ALARM mask in hwmon core
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAl2iD0IACgkQyx8mb86f
 mYEnFQ//XJMPIukj659J3hwfzxEx3LXucQsQivUhmVXYzPAzdxa9eC97tj1I3pgz
 nKR2BIFvXtI+o3459TWuvj+Y7utmQXRQ5XiP6JBb+Wkn84IK6O0GNKSOPYDNI/3t
 /ii2XY3dERr8BuBPmRuYP7GorjFJH1P7B0FXrln5HLYAyS1yJVioLVhbrbi63NkL
 ngoxZhgcVP5QIbaSFC6ZJs1oqXXVIbvBEkt9bjkmDhJssmPtFweVxvIUnWUoPdAg
 A/GvmOddhRa/9vXzZbNThOFOLq7nHguIDvbIwizULMRHfDHB5aFEqyz9W/OFqSj5
 8s+GRiO1oYBxVrz4Bl32pR0aCJbZ5WlxhgKrLdlunRnATAzXpLzRAbKhOhoBchJZ
 xyC+96q0x8GIJ5dBqm2XqZhJssZgfgb/6SSlzSu+r6aQogHQc1/PDL26PHR517E5
 T/pUNY97LvOOgiosDuaLiGpiszoz0WdeHVqdUujnRiN5TtE5u0M6ACdumXGjFb8K
 0sPkep919yaaKNaPYA6u9PCHl4WG5EJdBkY44z4BkCt9scVvXEvATGUxNMwH8Bvm
 95w/swRlN2yq+fa4T4I3UtAAW9EgCJ3LIYYUo1me0SDld2cPwbVkMvOIirD3TBLC
 qoP178X34HaqeozOaW4Hi08QYLzktmCo5KFm6KKNFbFCQZ4oBo4=
 =d+np
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Update/fix inspur-ipsps1 and k10temp Documentation

 - Fix nct7904 driver

 - Fix HWMON_P_MIN_ALARM mask in hwmon core

* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: docs: Extend inspur-ipsps1 title underline
  hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
  docs: hwmon: Include 'inspur-ipsps1.rst' into docs
  hwmon: Fix HWMON_P_MIN_ALARM mask
  hwmon: (k10temp) Update documentation and add temp2_input info
  hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
2019-10-13 08:40:31 -07:00
Linus Torvalds 71b1b5532b This pull request contains two fixes for MTD:
- spi-nor: Fix for a regression in write_sr()
 - rawnand: Regression fix for the au1550nd driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl2iNXoWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXBVD/sH9DCWSFAUXwqnXG1GP/05R++q
 0OHc8t8DmTxcsNXMmPUhj/icQHRf611QvH9T3oOKCn6umOeuDskB89H4ELDnk9kg
 ncxW48ijCRT4RNvL0FPbsbL3eU22kRtLoHj9VSa4tSjeNzgdbBG5sdugmASCNByi
 WLyNY6nXZG2v8lpRwP/uv6zDtlp3r7ucNVdKlKWTsdBzpL94LIkwYgsWg/TTG6eO
 9/29DBEjQonu3kCMkxKoHP1OVfS+XYKnbg4W6MNyDFRyT6xJxgt191K4q735ZiCJ
 Qf+rjUMo9/7WTUdSlAept3tb/0iKAV6V8Ql8xBGfLsGMsa7O6eh083d+9QvjXMw9
 I3dwsb9Of7t2n6Cj9pT74k6UA3KSfkM4v7r87Gr4UkyWxNn5uIn9uy/WJ+yCHFUd
 1aDC6ZR8dzzsJNoWx6E4zFtBtu+Whd9/V18qTGegjg/eFK51z4Dh6bDkyKveYV5C
 3343fuYQ0KO4wsFyiUcraP8BQcxk2FkjMcWS3bX3tbJuux8GV9UwfKSdlICwEgw+
 Rql4N4SKZWoLg7d1tXOQ9fGikBel0mwQOvgABJoVOq13doUTesdiVO2JyUIblkvP
 cA4w1NDgjXZ+kMh0n6aR+kJZcHd751/EoO0mk8tt6JNT80NWottr4CpzSNx+MbAW
 YgYzi/q35JnIV5Ec4Q==
 =mJlc
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD fixes from Richard Weinberger:
 "Two fixes for MTD:

   - spi-nor: Fix for a regression in write_sr()

   - rawnand: Regression fix for the au1550nd driver"

* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
  mtd: spi-nor: Fix direction of the write_sr() transfer
2019-10-13 08:26:54 -07:00
Linus Torvalds b27528b027 for-linus-20191012
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2iBvAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvokEACaUozc0Y2L99CqJcRtoH8cwIAimjKTFSmi
 UKijO/w11uc9xCIOospkM8iseRbpP/oCIwfYyVROi4LniPB1TSyQJoLF/MELeqIi
 lOSg8RrXzC22h5+QG3BR5VO55me3MwIXmJLMMWXbh7FAuUg+wepVwHwzZBurd1Pz
 C1wG8NU758AmqJdSGclUbdLLwTBkR+fXoi2EHfdbsBMqGmdZ5VRzCLx70grDcxq2
 S9gt+2+diME6o6p1wko+m9pdhUdEy7epPE50K4SA+DU06I77OZpTdSAPilPyQHei
 +3HDslzjCyos5/Ay+7hWl42jvyZ+x6GI5VcDLMTcMZsmxLy/WObDB4zcAixJrDV0
 c7XWnwCCs2wMXYMsu0ASbkU32mcQ/Ik8CrX0W3/tP1aZBW2LZnOHvXYbybzpbgbo
 QY47t3jz21uzF5kplBGxqaY1sPmLxp5V0vQmj9eN+Y7T+hiR/UgWKO/5B4CGNJe0
 BQkXNl5cZAtqxmSE8AxlpOla7z6beZUljdhFc7K8cMDdhoatRgm2MBh2G9BB1ayo
 VwHc1qLcvInpQrGiC4l5H/F8jrEs/PglknFSs1wlsav1gXA69J0cHA7LY6Czweaw
 mkiKr7jA6duBBn6MRUThF5sBMFWYuxp3B/BA4wfDJsSfCmyhWN+OZlsDaEVrzk7o
 /XK2Ew9R2Q==
 =PUeL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single small fix for a regression in the sequence logic for linked
  commands"

* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
  io_uring: fix sequence logic for timeout requests
2019-10-13 08:15:35 -07:00
Petr Mladek d303de1fcf tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.

Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

	/* reset all but tr, trace, and overruns */
	memset(&iter->seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:

     struct trace_seq {
	unsigned char		buffer[PAGE_SIZE];
	unsigned int		len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12 20:49:34 -04:00
Srivatsa S. Bhat (VMware) fc64e4ad80 tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.

Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu

Fixes: e7c15cd8a1 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12 20:49:33 -04:00
Srivatsa S. Bhat (VMware) 98dc19c114 tracing/hwlat: Report total time spent in all NMIs during the sample
nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.

Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu

Fixes: 7b2c862501 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12 20:49:33 -04:00
Steven Rostedt (VMware) 7f8557b88d recordmcount: Fix nop_mcount() function
The removal of the longjmp code in recordmcount.c mistakenly made the return
of make_nop() being negative an exit of nop_mcount(). It should not exit the
routine, but instead just not process that part of the code. By exiting with
an error code, it would cause the update of recordmcount to fail some files
which would fail the build if ftrace function tracing was enabled.

Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 3f1df12019 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12 20:49:33 -04:00