The siginificant new ext4 feature this time around is Harshad's new

fast_commit mode.  In addition, thanks to Mauricio for fixing a race
 where mmap'ed pages that are being changed in parallel with a
 data=journal transaction commit could result in bad checksums in the
 failure that could cause journal replays to fail.  Also notable is
 Ritesh's buffered write optimization which can result in significant
 improvements on parallel write workloads.  (The kernel test robot
 reported a 330.6% improvement on fio.write_iops on a 96 core system
 using DAX[1].)
 
 Besides that, we have the usual miscellaneous cleanups and bug fixes.
 
 [1] https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl+RuCQACgkQ8vlZVpUN
 gaNebgf/dUnQp5SG2/2zczSDqr+f8DOiuAdn9I54BAr2HwdkMbbiktKfenfpu41k
 SMGNV6rYSs248dWFtkzM7C2T1dpGrdAe2OCYrU6HPR/xoZlx/RcDz39u7nXBDeup
 NV7RnPgIzCAGZXCOY/Zu1k88T1eosLRTIWvIcNOspt75MC0vJ8GSmkx1bVEUsv8w
 Uq6T0OREfDiLJpEZxtfbl3o+8Rfs82t3Soj4pwN8ESL/RWBTT8PlwAGhIcdjnHy/
 lsgT35IrY4OL6Eas9msUmFYrWhO6cW21kWOugYALQXZ3ny4A+r5nZZcY/wCq01NX
 J2Z02ZiMTZUmFFREbtc0eJukXWEVvA==
 =14K9
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The siginificant new ext4 feature this time around is Harshad's new
  fast_commit mode.

  In addition, thanks to Mauricio for fixing a race where mmap'ed pages
  that are being changed in parallel with a data=journal transaction
  commit could result in bad checksums in the failure that could cause
  journal replays to fail.

  Also notable is Ritesh's buffered write optimization which can result
  in significant improvements on parallel write workloads. (The kernel
  test robot reported a 330.6% improvement on fio.write_iops on a 96
  core system using DAX)

  Besides that, we have the usual miscellaneous cleanups and bug fixes"

Link: https://lore.kernel.org/r/20200925071217.GO28663@shao2-debian

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
  ext4: fix invalid inode checksum
  ext4: add fast commit stats in procfs
  ext4: add a mount opt to forcefully turn fast commits on
  ext4: fast commit recovery path
  jbd2: fast commit recovery path
  ext4: main fast-commit commit path
  jbd2: add fast commit machinery
  ext4 / jbd2: add fast commit initialization
  ext4: add fast_commit feature and handling for extended mount options
  doc: update ext4 and journalling docs to include fast commit feature
  ext4: Detect already used quota file early
  jbd2: avoid transaction reuse after reformatting
  ext4: use the normal helper to get the actual inode
  ext4: fix bs < ps issue reported with dioread_nolock mount opt
  ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
  ext4: data=journal: fixes for ext4_page_mkwrite()
  jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()
  jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
  ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()
  ext4: use ext4_sb_bread() instead of sb_bread()
  ...
This commit is contained in:
Linus Torvalds 2020-10-22 10:31:08 -07:00
commit 96485e4462
35 changed files with 4725 additions and 396 deletions

View File

@ -28,6 +28,17 @@ metadata are written to disk through the journal. This is slower but
safest. If ``data=writeback``, dirty data blocks are not flushed to the
disk before the metadata are written to disk through the journal.
In case of ``data=ordered`` mode, Ext4 also supports fast commits which
help reduce commit latency significantly. The default ``data=ordered``
mode works by logging metadata blocks to the journal. In fast commit
mode, Ext4 only stores the minimal delta needed to recreate the
affected metadata in fast commit space that is shared with JBD2.
Once the fast commit area fills in or if fast commit is not possible
or if JBD2 commit timer goes off, Ext4 performs a traditional full commit.
A full commit invalidates all the fast commits that happened before
it and thus it makes the fast commit area empty for further fast
commits. This feature needs to be enabled at mkfs time.
The journal inode is typically inode 8. The first 68 bytes of the
journal inode are replicated in the ext4 superblock. The journal itself
is normal (but hidden) file within the filesystem. The file usually
@ -609,3 +620,58 @@ bytes long (but uses a full block):
- h\_commit\_nsec
- Nanoseconds component of the above timestamp.
Fast commits
~~~~~~~~~~~~
Fast commit area is organized as a log of tag length values. Each TLV has
a ``struct ext4_fc_tl`` in the beginning which stores the tag and the length
of the entire field. It is followed by variable length tag specific value.
Here is the list of supported tags and their meanings:
.. list-table::
:widths: 8 20 20 32
:header-rows: 1
* - Tag
- Meaning
- Value struct
- Description
* - EXT4_FC_TAG_HEAD
- Fast commit area header
- ``struct ext4_fc_head``
- Stores the TID of the transaction after which these fast commits should
be applied.
* - EXT4_FC_TAG_ADD_RANGE
- Add extent to inode
- ``struct ext4_fc_add_range``
- Stores the inode number and extent to be added in this inode
* - EXT4_FC_TAG_DEL_RANGE
- Remove logical offsets to inode
- ``struct ext4_fc_del_range``
- Stores the inode number and the logical offset range that needs to be
removed
* - EXT4_FC_TAG_CREAT
- Create directory entry for a newly created file
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry of the
newly created file
* - EXT4_FC_TAG_LINK
- Link a directory entry to an inode
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry
* - EXT4_FC_TAG_UNLINK
- Unlink a directory entry of an inode
- ``struct ext4_fc_dentry_info``
- Stores the parent inode number, inode number and directory entry
* - EXT4_FC_TAG_PAD
- Padding (unused area)
- None
- Unused bytes in the fast commit area.
* - EXT4_FC_TAG_TAIL
- Mark the end of a fast commit
- ``struct ext4_fc_tail``
- Stores the TID of the commit, CRC of the fast commit of which this tag
represents the end of

View File

@ -132,6 +132,39 @@ The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing
these calls.
Fast commits
~~~~~~~~~~~~
JBD2 to also allows you to perform file-system specific delta commits known as
fast commits. In order to use fast commits, you first need to call
:c:func:`jbd2_fc_init` and tell how many blocks at the end of journal
area should be reserved for fast commits. Along with that, you will also need
to set following callbacks that perform correspodning work:
`journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and
fast commit.
`journal->j_fc_replay_cb`: Replay function called for replay of fast commit
blocks.
File system is free to perform fast commits as and when it wants as long as it
gets permission from JBD2 to do so by calling the function
:c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client
file system should tell JBD2 about it by calling
:c:func:`jbd2_fc_end_commit()`. If file system wants JBD2 to perform a full
commit immediately after stopping the fast commit it can do so by calling
:c:func:`jbd2_fc_end_commit_fallback()`. This is useful if fast commit operation
fails for some reason and the only way to guarantee consistency is for JBD2 to
perform the full traditional commit.
JBD2 helper functions to manage fast commit buffers. File system can use
:c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()` to allocate
and wait on IO completion of fast commit buffers.
Currently, only Ext4 implements fast commits. For details of its implementation
of fast commits, please refer to the top level comments in
fs/ext4/fast_commit.c.
Summary
~~~~~~~

View File

@ -10,7 +10,7 @@ ext4-y := balloc.o bitmap.o block_validity.o dir.o ext4_jbd2.o extents.o \
indirect.o inline.o inode.o ioctl.o mballoc.o migrate.o \
mmp.o move_extent.o namei.o page-io.o readpage.o resize.o \
super.o symlink.o sysfs.o xattr.o xattr_hurd.o xattr_trusted.o \
xattr_user.o
xattr_user.o fast_commit.o
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o

View File

@ -242,6 +242,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits);
if (IS_ERR(handle))
return PTR_ERR(handle);
ext4_fc_start_update(inode);
if ((type == ACL_TYPE_ACCESS) && acl) {
error = posix_acl_update_mode(inode, &mode, &acl);
@ -259,6 +260,7 @@ ext4_set_acl(struct inode *inode, struct posix_acl *acl, int type)
}
out_stop:
ext4_journal_stop(handle);
ext4_fc_stop_update(inode);
if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
return error;

View File

@ -368,7 +368,12 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
struct buffer_head *bh)
{
ext4_fsblk_t blk;
struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
struct ext4_group_info *grp;
if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
grp = ext4_get_group_info(sb, block_group);
if (buffer_verified(bh))
return 0;
@ -495,10 +500,9 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
*/
set_buffer_new(bh);
trace_ext4_read_block_bitmap_load(sb, block_group, ignore_locked);
bh->b_end_io = ext4_end_bitmap_read;
get_bh(bh);
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO |
(ignore_locked ? REQ_RAHEAD : 0), bh);
ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO |
(ignore_locked ? REQ_RAHEAD : 0),
ext4_end_bitmap_read);
return bh;
verify:
err = ext4_validate_block_bitmap(sb, desc, block_group, bh);

View File

@ -131,7 +131,7 @@ static void debug_print_tree(struct ext4_sb_info *sbi)
printk(KERN_INFO "System zones: ");
rcu_read_lock();
system_blks = rcu_dereference(sbi->system_blks);
system_blks = rcu_dereference(sbi->s_system_blks);
node = rb_first(&system_blks->root);
while (node) {
entry = rb_entry(node, struct ext4_system_zone, node);
@ -261,7 +261,7 @@ int ext4_setup_system_zone(struct super_block *sb)
* with ext4_data_block_valid() accessing the rbtree at the same
* time.
*/
rcu_assign_pointer(sbi->system_blks, system_blks);
rcu_assign_pointer(sbi->s_system_blks, system_blks);
if (test_opt(sb, DEBUG))
debug_print_tree(sbi);
@ -286,9 +286,9 @@ void ext4_release_system_zone(struct super_block *sb)
{
struct ext4_system_blocks *system_blks;
system_blks = rcu_dereference_protected(EXT4_SB(sb)->system_blks,
system_blks = rcu_dereference_protected(EXT4_SB(sb)->s_system_blks,
lockdep_is_held(&sb->s_umount));
rcu_assign_pointer(EXT4_SB(sb)->system_blks, NULL);
rcu_assign_pointer(EXT4_SB(sb)->s_system_blks, NULL);
if (system_blks)
call_rcu(&system_blks->rcu, ext4_destroy_system_zone);
@ -319,7 +319,7 @@ int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk,
* mount option.
*/
rcu_read_lock();
system_blks = rcu_dereference(sbi->system_blks);
system_blks = rcu_dereference(sbi->s_system_blks);
if (system_blks == NULL)
goto out_rcu;

View File

@ -674,7 +674,7 @@ static int ext4_d_compare(const struct dentry *dentry, unsigned int len,
{
struct qstr qstr = {.name = str, .len = len };
const struct dentry *parent = READ_ONCE(dentry->d_parent);
const struct inode *inode = READ_ONCE(parent->d_inode);
const struct inode *inode = d_inode_rcu(parent);
char strbuf[DNAME_INLINE_LEN];
if (!inode || !IS_CASEFOLDED(inode) ||
@ -706,7 +706,7 @@ static int ext4_d_hash(const struct dentry *dentry, struct qstr *str)
{
const struct ext4_sb_info *sbi = EXT4_SB(dentry->d_sb);
const struct unicode_map *um = sbi->s_encoding;
const struct inode *inode = READ_ONCE(dentry->d_inode);
const struct inode *inode = d_inode_rcu(dentry);
unsigned char *norm;
int len, ret = 0;

View File

@ -27,7 +27,6 @@
#include <linux/seqlock.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/version.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
#include <linux/blockgroup_lock.h>
@ -492,7 +491,7 @@ struct flex_groups {
/* Flags which are mutually exclusive to DAX */
#define EXT4_DAX_MUT_EXCL (EXT4_VERITY_FL | EXT4_ENCRYPT_FL |\
EXT4_JOURNAL_DATA_FL)
EXT4_JOURNAL_DATA_FL | EXT4_INLINE_DATA_FL)
/* Mask out flags that are inappropriate for the given type of inode. */
static inline __u32 ext4_mask_flags(umode_t mode, __u32 flags)
@ -964,6 +963,7 @@ do { \
#endif /* defined(__KERNEL__) || defined(__linux__) */
#include "extents_status.h"
#include "fast_commit.h"
/*
* Lock subclasses for i_data_sem in the ext4_inode_info structure.
@ -1021,6 +1021,31 @@ struct ext4_inode_info {
struct list_head i_orphan; /* unlinked but open inodes */
/* Fast commit related info */
struct list_head i_fc_list; /*
* inodes that need fast commit
* protected by sbi->s_fc_lock.
*/
/* Fast commit subtid when this inode was committed */
unsigned int i_fc_committed_subtid;
/* Start of lblk range that needs to be committed in this fast commit */
ext4_lblk_t i_fc_lblk_start;
/* End of lblk range that needs to be committed in this fast commit */
ext4_lblk_t i_fc_lblk_len;
/* Number of ongoing updates on this inode */
atomic_t i_fc_updates;
/* Fast commit wait queue for this inode */
wait_queue_head_t i_fc_wait;
/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
struct mutex i_fc_lock;
/*
* i_disksize keeps track of what the inode size is ON DISK, not
* in memory. During truncate, i_size is set to the new size by
@ -1141,6 +1166,11 @@ struct ext4_inode_info {
#define EXT4_VALID_FS 0x0001 /* Unmounted cleanly */
#define EXT4_ERROR_FS 0x0002 /* Errors detected */
#define EXT4_ORPHAN_FS 0x0004 /* Orphans being recovered */
#define EXT4_FC_INELIGIBLE 0x0008 /* Fast commit ineligible */
#define EXT4_FC_COMMITTING 0x0010 /* File system underoing a fast
* commit.
*/
#define EXT4_FC_REPLAY 0x0020 /* Fast commit replay ongoing */
/*
* Misc. filesystem flags
@ -1214,6 +1244,8 @@ struct ext4_inode_info {
#define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly
specified journal checksum */
#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT 0x00000010 /* Journal fast commit */
#define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &= \
~EXT4_MOUNT_##opt
#define set_opt(sb, opt) EXT4_SB(sb)->s_mount_opt |= \
@ -1481,14 +1513,14 @@ struct ext4_sb_info {
unsigned long s_commit_interval;
u32 s_max_batch_time;
u32 s_min_batch_time;
struct block_device *journal_bdev;
struct block_device *s_journal_bdev;
#ifdef CONFIG_QUOTA
/* Names of quota files with journalled quota */
char __rcu *s_qf_names[EXT4_MAXQUOTAS];
int s_jquota_fmt; /* Format of quota to use */
#endif
unsigned int s_want_extra_isize; /* New inodes should reserve # bytes */
struct ext4_system_blocks __rcu *system_blks;
struct ext4_system_blocks __rcu *s_system_blks;
#ifdef EXTENTS_STATS
/* ext4 extents stats */
@ -1611,6 +1643,34 @@ struct ext4_sb_info {
/* Record the errseq of the backing block device */
errseq_t s_bdev_wb_err;
spinlock_t s_bdev_wb_lock;
/* Ext4 fast commit stuff */
atomic_t s_fc_subtid;
atomic_t s_fc_ineligible_updates;
/*
* After commit starts, the main queue gets locked, and the further
* updates get added in the staging queue.
*/
#define FC_Q_MAIN 0
#define FC_Q_STAGING 1
struct list_head s_fc_q[2]; /* Inodes staged for fast commit
* that have data changes in them.
*/
struct list_head s_fc_dentry_q[2]; /* directory entry updates */
unsigned int s_fc_bytes;
/*
* Main fast commit lock. This lock protects accesses to the
* following fields:
* ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
*/
spinlock_t s_fc_lock;
struct buffer_head *s_fc_bh;
struct ext4_fc_stats s_fc_stats;
u64 s_fc_avg_commit_time;
#ifdef CONFIG_EXT4_DEBUG
int s_fc_debug_max_replay;
#endif
struct ext4_fc_replay_state s_fc_replay_state;
};
static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@ -1721,6 +1781,7 @@ enum {
EXT4_STATE_EXT_PRECACHED, /* extents have been precached */
EXT4_STATE_LUSTRE_EA_INODE, /* Lustre-style ea_inode */
EXT4_STATE_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */
EXT4_STATE_FC_COMMITTING, /* Fast commit ongoing */
};
#define EXT4_INODE_BIT_FNS(name, field, offset) \
@ -1814,6 +1875,7 @@ static inline bool ext4_verity_in_progress(struct inode *inode)
#define EXT4_FEATURE_COMPAT_RESIZE_INODE 0x0010
#define EXT4_FEATURE_COMPAT_DIR_INDEX 0x0020
#define EXT4_FEATURE_COMPAT_SPARSE_SUPER2 0x0200
#define EXT4_FEATURE_COMPAT_FAST_COMMIT 0x0400
#define EXT4_FEATURE_COMPAT_STABLE_INODES 0x0800
#define EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
@ -1916,6 +1978,7 @@ EXT4_FEATURE_COMPAT_FUNCS(xattr, EXT_ATTR)
EXT4_FEATURE_COMPAT_FUNCS(resize_inode, RESIZE_INODE)
EXT4_FEATURE_COMPAT_FUNCS(dir_index, DIR_INDEX)
EXT4_FEATURE_COMPAT_FUNCS(sparse_super2, SPARSE_SUPER2)
EXT4_FEATURE_COMPAT_FUNCS(fast_commit, FAST_COMMIT)
EXT4_FEATURE_COMPAT_FUNCS(stable_inodes, STABLE_INODES)
EXT4_FEATURE_RO_COMPAT_FUNCS(sparse_super, SPARSE_SUPER)
@ -2650,6 +2713,7 @@ extern int ext4fs_dirhash(const struct inode *dir, const char *name, int len,
struct dx_hash_info *hinfo);
/* ialloc.c */
extern int ext4_mark_inode_used(struct super_block *sb, int ino);
extern struct inode *__ext4_new_inode(handle_t *, struct inode *, umode_t,
const struct qstr *qstr, __u32 goal,
uid_t *owner, __u32 i_flags,
@ -2675,6 +2739,27 @@ extern int ext4_init_inode_table(struct super_block *sb,
ext4_group_t group, int barrier);
extern void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate);
/* fast_commit.c */
int ext4_fc_info_show(struct seq_file *seq, void *v);
void ext4_fc_init(struct super_block *sb, journal_t *journal);
void ext4_fc_init_inode(struct inode *inode);
void ext4_fc_track_range(struct inode *inode, ext4_lblk_t start,
ext4_lblk_t end);
void ext4_fc_track_unlink(struct inode *inode, struct dentry *dentry);
void ext4_fc_track_link(struct inode *inode, struct dentry *dentry);
void ext4_fc_track_create(struct inode *inode, struct dentry *dentry);
void ext4_fc_track_inode(struct inode *inode);
void ext4_fc_mark_ineligible(struct super_block *sb, int reason);
void ext4_fc_start_ineligible(struct super_block *sb, int reason);
void ext4_fc_stop_ineligible(struct super_block *sb);
void ext4_fc_start_update(struct inode *inode);
void ext4_fc_stop_update(struct inode *inode);
void ext4_fc_del(struct inode *inode);
bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
void ext4_fc_replay_cleanup(struct super_block *sb);
int ext4_fc_commit(journal_t *journal, tid_t commit_tid);
int __init ext4_fc_init_dentry_cache(void);
/* mballoc.c */
extern const struct seq_operations ext4_mb_seq_groups_ops;
extern long ext4_mb_stats;
@ -2704,8 +2789,12 @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
ext4_fsblk_t block, unsigned long count);
extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid);
extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
int len, int state);
/* inode.c */
void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
struct ext4_inode_info *ei);
int ext4_inode_is_fast_symlink(struct inode *inode);
struct buffer_head *ext4_getblk(handle_t *, struct inode *, ext4_lblk_t, int);
struct buffer_head *ext4_bread(handle_t *, struct inode *, ext4_lblk_t, int);
@ -2752,6 +2841,8 @@ extern int ext4_sync_inode(handle_t *, struct inode *);
extern void ext4_dirty_inode(struct inode *, int);
extern int ext4_change_inode_journal_flag(struct inode *, int);
extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
extern int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
struct ext4_iloc *iloc);
extern int ext4_inode_attach_jinode(struct inode *inode);
extern int ext4_can_truncate(struct inode *inode);
extern int ext4_truncate(struct inode *);
@ -2785,12 +2876,15 @@ extern int ext4_ind_remove_space(handle_t *handle, struct inode *inode,
/* ioctl.c */
extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
extern void ext4_reset_inode_seed(struct inode *inode);
/* migrate.c */
extern int ext4_ext_migrate(struct inode *);
extern int ext4_ind_migrate(struct inode *inode);
/* namei.c */
extern int ext4_init_new_dir(handle_t *handle, struct inode *dir,
struct inode *inode);
extern int ext4_dirblock_csum_verify(struct inode *inode,
struct buffer_head *bh);
extern int ext4_orphan_add(handle_t *, struct inode *);
@ -2824,6 +2918,14 @@ extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);
/* super.c */
extern struct buffer_head *ext4_sb_bread(struct super_block *sb,
sector_t block, int op_flags);
extern struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
sector_t block);
extern void ext4_read_bh_nowait(struct buffer_head *bh, int op_flags,
bh_end_io_t *end_io);
extern int ext4_read_bh(struct buffer_head *bh, int op_flags,
bh_end_io_t *end_io);
extern int ext4_read_bh_lock(struct buffer_head *bh, int op_flags, bool wait);
extern void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block);
extern int ext4_seq_options_show(struct seq_file *seq, void *offset);
extern int ext4_calculate_overhead(struct super_block *sb);
extern void ext4_superblock_csum_set(struct super_block *sb);
@ -3012,22 +3114,24 @@ static inline int ext4_has_group_desc_csum(struct super_block *sb)
return ext4_has_feature_gdt_csum(sb) || ext4_has_metadata_csum(sb);
}
#define ext4_read_incompat_64bit_val(es, name) \
(((es)->s_feature_incompat & cpu_to_le32(EXT4_FEATURE_INCOMPAT_64BIT) \
? (ext4_fsblk_t)le32_to_cpu(es->name##_hi) << 32 : 0) | \
le32_to_cpu(es->name##_lo))
static inline ext4_fsblk_t ext4_blocks_count(struct ext4_super_block *es)
{
return ((ext4_fsblk_t)le32_to_cpu(es->s_blocks_count_hi) << 32) |
le32_to_cpu(es->s_blocks_count_lo);
return ext4_read_incompat_64bit_val(es, s_blocks_count);
}
static inline ext4_fsblk_t ext4_r_blocks_count(struct ext4_super_block *es)
{
return ((ext4_fsblk_t)le32_to_cpu(es->s_r_blocks_count_hi) << 32) |
le32_to_cpu(es->s_r_blocks_count_lo);
return ext4_read_incompat_64bit_val(es, s_r_blocks_count);
}
static inline ext4_fsblk_t ext4_free_blocks_count(struct ext4_super_block *es)
{
return ((ext4_fsblk_t)le32_to_cpu(es->s_free_blocks_count_hi) << 32) |
le32_to_cpu(es->s_free_blocks_count_lo);
return ext4_read_incompat_64bit_val(es, s_free_blocks_count);
}
static inline void ext4_blocks_count_set(struct ext4_super_block *es,
@ -3153,6 +3257,9 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
struct ext4_group_info {
unsigned long bb_state;
#ifdef AGGRESSIVE_CHECK
unsigned long bb_check_counter;
#endif
struct rb_root bb_free_root;
ext4_grpblk_t bb_first_free; /* first free block */
ext4_grpblk_t bb_free; /* total free blocks */
@ -3357,6 +3464,10 @@ extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode,
extern int ext4_ci_compare(const struct inode *parent,
const struct qstr *fname,
const struct qstr *entry, bool quick);
extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
struct inode *inode);
extern int __ext4_link(struct inode *dir, struct inode *inode,
struct dentry *dentry);
#define S_SHIFT 12
static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
@ -3457,6 +3568,11 @@ extern int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu);
extern int ext4_datasem_ensure_credits(handle_t *handle, struct inode *inode,
int check_cred, int restart_cred,
int revoke_cred);
extern void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end);
extern int ext4_ext_replay_set_iblocks(struct inode *inode);
extern int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
int len, int unwritten, ext4_fsblk_t pblk);
extern int ext4_ext_clear_bb(struct inode *inode);
/* move_extent.c */

View File

@ -100,7 +100,7 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
return ERR_PTR(err);
journal = EXT4_SB(sb)->s_journal;
if (!journal)
if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
return ext4_get_nojournal();
return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
GFP_NOFS, type, line);

View File

@ -501,7 +501,7 @@ __read_extent_tree_block(const char *function, unsigned int line,
if (!bh_uptodate_or_lock(bh)) {
trace_ext4_ext_load_extent(inode, pblk, _RET_IP_);
err = bh_submit_read(bh);
err = ext4_read_bh(bh, 0, NULL);
if (err < 0)
goto errout;
}
@ -3723,6 +3723,7 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
err = ext4_ext_dirty(handle, inode, path + path->p_depth);
out:
ext4_ext_show_leaf(inode, path);
ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
return err;
}
@ -3794,6 +3795,7 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,
if (*allocated > map->m_len)
*allocated = map->m_len;
map->m_len = *allocated;
ext4_fc_track_range(inode, ee_block, ee_block + ee_len - 1);
return 0;
}
@ -4023,7 +4025,7 @@ static int get_implied_cluster_alloc(struct super_block *sb,
* down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
* (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
*
* return > 0, number of of blocks already mapped/allocated
* return > 0, number of blocks already mapped/allocated
* if create == 0 and these are pre-allocated blocks
* buffer head is unmapped
* otherwise blocks are mapped
@ -4327,7 +4329,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
map->m_len = ar.len;
allocated = map->m_len;
ext4_ext_show_leaf(inode, path);
ext4_fc_track_range(inode, map->m_lblk, map->m_lblk + map->m_len - 1);
out:
ext4_ext_drop_refs(path);
kfree(path);
@ -4600,7 +4602,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
ret = ext4_mark_inode_dirty(handle, inode);
if (unlikely(ret))
goto out_handle;
ext4_fc_track_range(inode, offset >> inode->i_sb->s_blocksize_bits,
(offset + len - 1) >> inode->i_sb->s_blocksize_bits);
/* Zero out partial block at the edges of the range */
ret = ext4_zero_partial_blocks(handle, inode, offset, len);
if (ret >= 0)
@ -4648,23 +4651,34 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
ext4_fc_track_range(inode, offset >> blkbits,
(offset + len - 1) >> blkbits);
if (mode & FALLOC_FL_PUNCH_HOLE)
return ext4_punch_hole(inode, offset, len);
ext4_fc_start_update(inode);
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(inode, offset, len);
goto exit;
}
ret = ext4_convert_inline_data(inode);
if (ret)
return ret;
goto exit;
if (mode & FALLOC_FL_COLLAPSE_RANGE)
return ext4_collapse_range(inode, offset, len);
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(inode, offset, len);
goto exit;
}
if (mode & FALLOC_FL_INSERT_RANGE)
return ext4_insert_range(inode, offset, len);
if (mode & FALLOC_FL_ZERO_RANGE)
return ext4_zero_range(file, offset, len, mode);
if (mode & FALLOC_FL_INSERT_RANGE) {
ret = ext4_insert_range(inode, offset, len);
goto exit;
}
if (mode & FALLOC_FL_ZERO_RANGE) {
ret = ext4_zero_range(file, offset, len, mode);
goto exit;
}
trace_ext4_fallocate_enter(inode, offset, len, mode);
lblk = offset >> blkbits;
@ -4698,12 +4712,14 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
goto out;
if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
ret = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
EXT4_I(inode)->i_sync_tid);
ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
EXT4_I(inode)->i_sync_tid);
}
out:
inode_unlock(inode);
trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
exit:
ext4_fc_stop_update(inode);
return ret;
}
@ -4769,7 +4785,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
int ext4_convert_unwritten_io_end_vec(handle_t *handle, ext4_io_end_t *io_end)
{
int ret, err = 0;
int ret = 0, err = 0;
struct ext4_io_end_vec *io_end_vec;
/*
@ -5291,6 +5307,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
ret = PTR_ERR(handle);
goto out_mmap;
}
ext4_fc_start_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE);
down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_preallocations(inode, 0);
@ -5329,6 +5346,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
out_stop:
ext4_journal_stop(handle);
ext4_fc_stop_ineligible(sb);
out_mmap:
up_write(&EXT4_I(inode)->i_mmap_sem);
out_mutex:
@ -5429,6 +5447,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
ret = PTR_ERR(handle);
goto out_mmap;
}
ext4_fc_start_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE);
/* Expand file to avoid data loss if there is error while shifting */
inode->i_size += len;
@ -5503,6 +5522,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
out_stop:
ext4_journal_stop(handle);
ext4_fc_stop_ineligible(sb);
out_mmap:
up_write(&EXT4_I(inode)->i_mmap_sem);
out_mutex:
@ -5784,3 +5804,264 @@ int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)
return err ? err : mapped;
}
/*
* Updates physical block address and unwritten status of extent
* starting at lblk start and of len. If such an extent doesn't exist,
* this function splits the extent tree appropriately to create an
* extent like this. This function is called in the fast commit
* replay path. Returns 0 on success and error on failure.
*/
int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
int len, int unwritten, ext4_fsblk_t pblk)
{
struct ext4_ext_path *path = NULL, *ppath;
struct ext4_extent *ex;
int ret;
path = ext4_find_extent(inode, start, NULL, 0);
if (!path)
return -EINVAL;
ex = path[path->p_depth].p_ext;
if (!ex) {
ret = -EFSCORRUPTED;
goto out;
}
if (le32_to_cpu(ex->ee_block) != start ||
ext4_ext_get_actual_len(ex) != len) {
/* We need to split this extent to match our extent first */
ppath = path;
down_write(&EXT4_I(inode)->i_data_sem);
ret = ext4_force_split_extent_at(NULL, inode, &ppath, start, 1);
up_write(&EXT4_I(inode)->i_data_sem);
if (ret)
goto out;
kfree(path);
path = ext4_find_extent(inode, start, NULL, 0);
if (IS_ERR(path))
return -1;
ppath = path;
ex = path[path->p_depth].p_ext;
WARN_ON(le32_to_cpu(ex->ee_block) != start);
if (ext4_ext_get_actual_len(ex) != len) {
down_write(&EXT4_I(inode)->i_data_sem);
ret = ext4_force_split_extent_at(NULL, inode, &ppath,
start + len, 1);
up_write(&EXT4_I(inode)->i_data_sem);
if (ret)
goto out;
kfree(path);
path = ext4_find_extent(inode, start, NULL, 0);
if (IS_ERR(path))
return -EINVAL;
ex = path[path->p_depth].p_ext;
}
}
if (unwritten)
ext4_ext_mark_unwritten(ex);
else
ext4_ext_mark_initialized(ex);
ext4_ext_store_pblock(ex, pblk);
down_write(&EXT4_I(inode)->i_data_sem);
ret = ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
up_write(&EXT4_I(inode)->i_data_sem);
out:
ext4_ext_drop_refs(path);
kfree(path);
ext4_mark_inode_dirty(NULL, inode);
return ret;
}
/* Try to shrink the extent tree */
void ext4_ext_replay_shrink_inode(struct inode *inode, ext4_lblk_t end)
{
struct ext4_ext_path *path = NULL;
struct ext4_extent *ex;
ext4_lblk_t old_cur, cur = 0;
while (cur < end) {
path = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path))
return;
ex = path[path->p_depth].p_ext;
if (!ex) {
ext4_ext_drop_refs(path);
kfree(path);
ext4_mark_inode_dirty(NULL, inode);
return;
}
old_cur = cur;
cur = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
if (cur <= old_cur)
cur = old_cur + 1;
ext4_ext_try_to_merge(NULL, inode, path, ex);
down_write(&EXT4_I(inode)->i_data_sem);
ext4_ext_dirty(NULL, inode, &path[path->p_depth]);
up_write(&EXT4_I(inode)->i_data_sem);
ext4_mark_inode_dirty(NULL, inode);
ext4_ext_drop_refs(path);
kfree(path);
}
}
/* Check if *cur is a hole and if it is, skip it */
static void skip_hole(struct inode *inode, ext4_lblk_t *cur)
{
int ret;
struct ext4_map_blocks map;
map.m_lblk = *cur;
map.m_len = ((inode->i_size) >> inode->i_sb->s_blocksize_bits) - *cur;
ret = ext4_map_blocks(NULL, inode, &map, 0);
if (ret != 0)
return;
*cur = *cur + map.m_len;
}
/* Count number of blocks used by this inode and update i_blocks */
int ext4_ext_replay_set_iblocks(struct inode *inode)
{
struct ext4_ext_path *path = NULL, *path2 = NULL;
struct ext4_extent *ex;
ext4_lblk_t cur = 0, end;
int numblks = 0, i, ret = 0;
ext4_fsblk_t cmp1, cmp2;
struct ext4_map_blocks map;
/* Determin the size of the file first */
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
EXT4_EX_NOCACHE);
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
if (!ex) {
ext4_ext_drop_refs(path);
kfree(path);
goto out;
}
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
ext4_ext_drop_refs(path);
kfree(path);
/* Count the number of data blocks */
cur = 0;
while (cur < end) {
map.m_lblk = cur;
map.m_len = end - cur;
ret = ext4_map_blocks(NULL, inode, &map, 0);
if (ret < 0)
break;
if (ret > 0)
numblks += ret;
cur = cur + map.m_len;
}
/*
* Count the number of extent tree blocks. We do it by looking up
* two successive extents and determining the difference between
* their paths. When path is different for 2 successive extents
* we compare the blocks in the path at each level and increment
* iblocks by total number of differences found.
*/
cur = 0;
skip_hole(inode, &cur);
path = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path))
goto out;
numblks += path->p_depth;
ext4_ext_drop_refs(path);
kfree(path);
while (cur < end) {
path = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path))
break;
ex = path[path->p_depth].p_ext;
if (!ex) {
ext4_ext_drop_refs(path);
kfree(path);
return 0;
}
cur = max(cur + 1, le32_to_cpu(ex->ee_block) +
ext4_ext_get_actual_len(ex));
skip_hole(inode, &cur);
path2 = ext4_find_extent(inode, cur, NULL, 0);
if (IS_ERR(path2)) {
ext4_ext_drop_refs(path);
kfree(path);
break;
}
ex = path2[path2->p_depth].p_ext;
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
cmp1 = cmp2 = 0;
if (i <= path->p_depth)
cmp1 = path[i].p_bh ?
path[i].p_bh->b_blocknr : 0;
if (i <= path2->p_depth)
cmp2 = path2[i].p_bh ?
path2[i].p_bh->b_blocknr : 0;
if (cmp1 != cmp2 && cmp2 != 0)
numblks++;
}
ext4_ext_drop_refs(path);
ext4_ext_drop_refs(path2);
kfree(path);
kfree(path2);
}
out:
inode->i_blocks = numblks << (inode->i_sb->s_blocksize_bits - 9);
ext4_mark_inode_dirty(NULL, inode);
return 0;
}
int ext4_ext_clear_bb(struct inode *inode)
{
struct ext4_ext_path *path = NULL;
struct ext4_extent *ex;
ext4_lblk_t cur = 0, end;
int j, ret = 0;
struct ext4_map_blocks map;
/* Determin the size of the file first */
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL,
EXT4_EX_NOCACHE);
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
if (!ex) {
ext4_ext_drop_refs(path);
kfree(path);
return 0;
}
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
ext4_ext_drop_refs(path);
kfree(path);
cur = 0;
while (cur < end) {
map.m_lblk = cur;
map.m_len = end - cur;
ret = ext4_map_blocks(NULL, inode, &map, 0);
if (ret < 0)
break;
if (ret > 0) {
path = ext4_find_extent(inode, map.m_lblk, NULL, 0);
if (!IS_ERR_OR_NULL(path)) {
for (j = 0; j < path->p_depth; j++) {
ext4_mb_mark_bb(inode->i_sb,
path[j].p_block, 1, 0);
}
ext4_ext_drop_refs(path);
kfree(path);
}
ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
}
cur = cur + map.m_len;
}
return 0;
}

View File

@ -311,6 +311,9 @@ void ext4_es_find_extent_range(struct inode *inode,
ext4_lblk_t lblk, ext4_lblk_t end,
struct extent_status *es)
{
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
trace_ext4_es_find_extent_range_enter(inode, lblk);
read_lock(&EXT4_I(inode)->i_es_lock);
@ -361,6 +364,9 @@ bool ext4_es_scan_range(struct inode *inode,
{
bool ret;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return false;
read_lock(&EXT4_I(inode)->i_es_lock);
ret = __es_scan_range(inode, matching_fn, lblk, end);
read_unlock(&EXT4_I(inode)->i_es_lock);
@ -404,6 +410,9 @@ bool ext4_es_scan_clu(struct inode *inode,
{
bool ret;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return false;
read_lock(&EXT4_I(inode)->i_es_lock);
ret = __es_scan_clu(inode, matching_fn, lblk);
read_unlock(&EXT4_I(inode)->i_es_lock);
@ -812,6 +821,9 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
int err = 0;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
es_debug("add [%u/%u) %llu %x to extent status tree of inode %lu\n",
lblk, len, pblk, status, inode->i_ino);
@ -873,6 +885,9 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
struct extent_status newes;
ext4_lblk_t end = lblk + len - 1;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;
newes.es_lblk = lblk;
newes.es_len = len;
ext4_es_store_pblock_status(&newes, pblk, status);
@ -908,6 +923,9 @@ int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
struct rb_node *node;
int found = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
trace_ext4_es_lookup_extent_enter(inode, lblk);
es_debug("lookup extent in block %u\n", lblk);
@ -1419,6 +1437,9 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
int err = 0;
int reserved = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
trace_ext4_es_remove_extent(inode, lblk, len);
es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
lblk, len, inode->i_ino);
@ -1969,6 +1990,9 @@ int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
struct extent_status newes;
int err = 0;
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
lblk, inode->i_ino);

2139
fs/ext4/fast_commit.c Normal file

File diff suppressed because it is too large Load Diff

159
fs/ext4/fast_commit.h Normal file
View File

@ -0,0 +1,159 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __FAST_COMMIT_H__
#define __FAST_COMMIT_H__
/* Number of blocks in journal area to allocate for fast commits */
#define EXT4_NUM_FC_BLKS 256
/* Fast commit tags */
#define EXT4_FC_TAG_ADD_RANGE 0x0001
#define EXT4_FC_TAG_DEL_RANGE 0x0002
#define EXT4_FC_TAG_CREAT 0x0003
#define EXT4_FC_TAG_LINK 0x0004
#define EXT4_FC_TAG_UNLINK 0x0005
#define EXT4_FC_TAG_INODE 0x0006
#define EXT4_FC_TAG_PAD 0x0007
#define EXT4_FC_TAG_TAIL 0x0008
#define EXT4_FC_TAG_HEAD 0x0009
#define EXT4_FC_SUPPORTED_FEATURES 0x0
/* On disk fast commit tlv value structures */
/* Fast commit on disk tag length structure */
struct ext4_fc_tl {
__le16 fc_tag;
__le16 fc_len;
};
/* Value structure for tag EXT4_FC_TAG_HEAD. */
struct ext4_fc_head {
__le32 fc_features;
__le32 fc_tid;
};
/* Value structure for EXT4_FC_TAG_ADD_RANGE. */
struct ext4_fc_add_range {
__le32 fc_ino;
__u8 fc_ex[12];
};
/* Value structure for tag EXT4_FC_TAG_DEL_RANGE. */
struct ext4_fc_del_range {
__le32 fc_ino;
__le32 fc_lblk;
__le32 fc_len;
};
/*
* This is the value structure for tags EXT4_FC_TAG_CREAT, EXT4_FC_TAG_LINK
* and EXT4_FC_TAG_UNLINK.
*/
struct ext4_fc_dentry_info {
__le32 fc_parent_ino;
__le32 fc_ino;
u8 fc_dname[0];
};
/* Value structure for EXT4_FC_TAG_INODE and EXT4_FC_TAG_INODE_PARTIAL. */
struct ext4_fc_inode {
__le32 fc_ino;
__u8 fc_raw_inode[0];
};
/* Value structure for tag EXT4_FC_TAG_TAIL. */
struct ext4_fc_tail {
__le32 fc_tid;
__le32 fc_crc;
};
/*
* In memory list of dentry updates that are performed on the file
* system used by fast commit code.
*/
struct ext4_fc_dentry_update {
int fcd_op; /* Type of update create / unlink / link */
int fcd_parent; /* Parent inode number */
int fcd_ino; /* Inode number */
struct qstr fcd_name; /* Dirent name */
unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */
struct list_head fcd_list;
};
/*
* Fast commit reason codes
*/
enum {
/*
* Commit status codes:
*/
EXT4_FC_REASON_OK = 0,
EXT4_FC_REASON_INELIGIBLE,
EXT4_FC_REASON_ALREADY_COMMITTED,
EXT4_FC_REASON_FC_START_FAILED,
EXT4_FC_REASON_FC_FAILED,
/*
* Fast commit ineligiblity reasons:
*/
EXT4_FC_REASON_XATTR = 0,
EXT4_FC_REASON_CROSS_RENAME,
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE,
EXT4_FC_REASON_MEM,
EXT4_FC_REASON_SWAP_BOOT,
EXT4_FC_REASON_RESIZE,
EXT4_FC_REASON_RENAME_DIR,
EXT4_FC_REASON_FALLOC_RANGE,
EXT4_FC_COMMIT_FAILED,
EXT4_FC_REASON_MAX
};
struct ext4_fc_stats {
unsigned int fc_ineligible_reason_count[EXT4_FC_REASON_MAX];
unsigned long fc_num_commits;
unsigned long fc_ineligible_commits;
unsigned long fc_numblks;
};
#define EXT4_FC_REPLAY_REALLOC_INCREMENT 4
/*
* Physical block regions added to different inodes due to fast commit
* recovery. These are set during the SCAN phase. During the replay phase,
* our allocator excludes these from its allocation. This ensures that
* we don't accidentally allocating a block that is going to be used by
* another inode.
*/
struct ext4_fc_alloc_region {
ext4_lblk_t lblk;
ext4_fsblk_t pblk;
int ino, len;
};
/*
* Fast commit replay state.
*/
struct ext4_fc_replay_state {
int fc_replay_num_tags;
int fc_replay_expected_off;
int fc_current_pass;
int fc_cur_tag;
int fc_crc;
struct ext4_fc_alloc_region *fc_regions;
int fc_regions_size, fc_regions_used, fc_regions_valid;
int *fc_modified_inodes;
int fc_modified_inodes_used, fc_modified_inodes_size;
};
#define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
#define fc_for_each_tl(__start, __end, __tl) \
for (tl = (struct ext4_fc_tl *)start; \
(u8 *)tl < (u8 *)end; \
tl = (struct ext4_fc_tl *)((u8 *)tl + \
sizeof(struct ext4_fc_tl) + \
+ le16_to_cpu(tl->fc_len)))
#endif /* __FAST_COMMIT_H__ */

View File

@ -260,6 +260,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
ext4_fc_start_update(inode);
inode_lock(inode);
ret = ext4_write_checks(iocb, from);
if (ret <= 0)
@ -271,6 +272,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
out:
inode_unlock(inode);
ext4_fc_stop_update(inode);
if (likely(ret > 0)) {
iocb->ki_pos += ret;
ret = generic_write_sync(iocb, ret);
@ -534,7 +536,9 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
}
ext4_fc_start_update(inode);
ret = ext4_orphan_add(handle, inode);
ext4_fc_stop_update(inode);
if (ret) {
ext4_journal_stop(handle);
goto out;
@ -656,8 +660,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
#endif
if (iocb->ki_flags & IOCB_DIRECT)
return ext4_dio_write_iter(iocb, from);
return ext4_buffered_write_iter(iocb, from);
else
return ext4_buffered_write_iter(iocb, from);
}
#ifdef CONFIG_FS_DAX
@ -757,6 +761,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
ext4_fc_start_update(inode);
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
@ -764,6 +769,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
} else {
vma->vm_ops = &ext4_file_vm_ops;
}
ext4_fc_stop_update(inode);
return 0;
}
@ -844,7 +850,7 @@ static int ext4_file_open(struct inode *inode, struct file *filp)
return ret;
}
filp->f_mode |= FMODE_NOWAIT;
filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
return dquot_file_open(inode, filp);
}

View File

@ -108,6 +108,9 @@ static int ext4_getfsmap_helper(struct super_block *sb,
/* Are we just counting mappings? */
if (info->gfi_head->fmh_count == 0) {
if (info->gfi_head->fmh_entries == UINT_MAX)
return EXT4_QUERY_RANGE_ABORT;
if (rec_fsblk > info->gfi_next_fsblk)
info->gfi_head->fmh_entries++;
@ -571,8 +574,8 @@ static bool ext4_getfsmap_is_valid_device(struct super_block *sb,
if (fm->fmr_device == 0 || fm->fmr_device == UINT_MAX ||
fm->fmr_device == new_encode_dev(sb->s_bdev->bd_dev))
return true;
if (EXT4_SB(sb)->journal_bdev &&
fm->fmr_device == new_encode_dev(EXT4_SB(sb)->journal_bdev->bd_dev))
if (EXT4_SB(sb)->s_journal_bdev &&
fm->fmr_device == new_encode_dev(EXT4_SB(sb)->s_journal_bdev->bd_dev))
return true;
return false;
}
@ -642,9 +645,9 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
memset(handlers, 0, sizeof(handlers));
handlers[0].gfd_dev = new_encode_dev(sb->s_bdev->bd_dev);
handlers[0].gfd_fn = ext4_getfsmap_datadev;
if (EXT4_SB(sb)->journal_bdev) {
if (EXT4_SB(sb)->s_journal_bdev) {
handlers[1].gfd_dev = new_encode_dev(
EXT4_SB(sb)->journal_bdev->bd_dev);
EXT4_SB(sb)->s_journal_bdev->bd_dev);
handlers[1].gfd_fn = ext4_getfsmap_logdev;
}

View File

@ -112,7 +112,7 @@ static int ext4_fsync_journal(struct inode *inode, bool datasync,
!jbd2_trans_will_send_data_barrier(journal, commit_tid))
*needs_barrier = true;
return jbd2_complete_transaction(journal, commit_tid);
return ext4_fc_commit(journal, commit_tid);
}
/*
@ -150,7 +150,7 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
goto out;
/*
* data=writeback,ordered:

View File

@ -82,7 +82,12 @@ static int ext4_validate_inode_bitmap(struct super_block *sb,
struct buffer_head *bh)
{
ext4_fsblk_t blk;
struct ext4_group_info *grp = ext4_get_group_info(sb, block_group);
struct ext4_group_info *grp;
if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return 0;
grp = ext4_get_group_info(sb, block_group);
if (buffer_verified(bh))
return 0;
@ -189,10 +194,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
* submit the buffer_head for reading
*/
trace_ext4_load_inode_bitmap(sb, block_group);
bh->b_end_io = ext4_end_bitmap_read;
get_bh(bh);
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
wait_on_buffer(bh);
ext4_read_bh(bh, REQ_META | REQ_PRIO, ext4_end_bitmap_read);
ext4_simulate_fail_bh(sb, bh, EXT4_SIM_IBITMAP_EIO);
if (!buffer_uptodate(bh)) {
put_bh(bh);
@ -284,15 +286,17 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
/* Don't bother if the inode bitmap is corrupt. */
grp = ext4_get_group_info(sb, block_group);
if (IS_ERR(bitmap_bh)) {
fatal = PTR_ERR(bitmap_bh);
bitmap_bh = NULL;
goto error_return;
}
if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
fatal = -EFSCORRUPTED;
goto error_return;
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, block_group);
if (unlikely(EXT4_MB_GRP_IBITMAP_CORRUPT(grp))) {
fatal = -EFSCORRUPTED;
goto error_return;
}
}
BUFFER_TRACE(bitmap_bh, "get_write_access");
@ -742,6 +746,122 @@ static int find_inode_bit(struct super_block *sb, ext4_group_t group,
return 1;
}
int ext4_mark_inode_used(struct super_block *sb, int ino)
{
unsigned long max_ino = le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count);
struct buffer_head *inode_bitmap_bh = NULL, *group_desc_bh = NULL;
struct ext4_group_desc *gdp;
ext4_group_t group;
int bit;
int err = -EFSCORRUPTED;
if (ino < EXT4_FIRST_INO(sb) || ino > max_ino)
goto out;
group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
if (IS_ERR(inode_bitmap_bh))
return PTR_ERR(inode_bitmap_bh);
if (ext4_test_bit(bit, inode_bitmap_bh->b_data)) {
err = 0;
goto out;
}
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
if (!gdp || !group_desc_bh) {
err = -EINVAL;
goto out;
}
ext4_set_bit(bit, inode_bitmap_bh->b_data);
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(NULL, NULL, inode_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
err = sync_dirty_buffer(inode_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
/* We may have to initialize the block bitmap if it isn't already */
if (ext4_has_group_desc_csum(sb) &&
gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
struct buffer_head *block_bitmap_bh;
block_bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(block_bitmap_bh)) {
err = PTR_ERR(block_bitmap_bh);
goto out;
}
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
err = ext4_handle_dirty_metadata(NULL, NULL, block_bitmap_bh);
sync_dirty_buffer(block_bitmap_bh);
/* recheck and clear flag under lock if we still need to */
ext4_lock_group(sb, group);
if (ext4_has_group_desc_csum(sb) &&
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
ext4_free_group_clusters_set(sb, gdp,
ext4_free_clusters_after_init(sb, group, gdp));
ext4_block_bitmap_csum_set(sb, group, gdp,
block_bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);
}
ext4_unlock_group(sb, group);
brelse(block_bitmap_bh);
if (err) {
ext4_std_error(sb, err);
goto out;
}
}
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
int free;
ext4_lock_group(sb, group); /* while we modify the bg desc */
free = EXT4_INODES_PER_GROUP(sb) -
ext4_itable_unused_count(sb, gdp);
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
free = 0;
}
/*
* Check the relative inode number against the last used
* relative inode number in this group. if it is greater
* we need to update the bg_itable_unused count
*/
if (bit >= free)
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - bit - 1));
} else {
ext4_lock_group(sb, group);
}
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
if (ext4_has_group_desc_csum(sb)) {
ext4_inode_bitmap_csum_set(sb, group, gdp, inode_bitmap_bh,
EXT4_INODES_PER_GROUP(sb) / 8);
ext4_group_desc_csum_set(sb, group, gdp);
}
ext4_unlock_group(sb, group);
err = ext4_handle_dirty_metadata(NULL, NULL, group_desc_bh);
sync_dirty_buffer(group_desc_bh);
out:
return err;
}
static int ext4_xattr_credits_for_new_inode(struct inode *dir, mode_t mode,
bool encrypt)
{
@ -818,7 +938,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
struct inode *ret;
ext4_group_t i;
ext4_group_t flex_group;
struct ext4_group_info *grp;
struct ext4_group_info *grp = NULL;
bool encrypt = false;
/* Cannot create files in a deleted directory */
@ -918,15 +1038,21 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
if (ext4_free_inodes_count(sb, gdp) == 0)
goto next_group;
grp = ext4_get_group_info(sb, group);
/* Skip groups with already-known suspicious inode tables */
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
goto next_group;
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, group);
/*
* Skip groups with already-known suspicious inode
* tables
*/
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp))
goto next_group;
}
brelse(inode_bitmap_bh);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
/* Skip groups with suspicious inode tables */
if (EXT4_MB_GRP_IBITMAP_CORRUPT(grp) ||
if (((!(sbi->s_mount_state & EXT4_FC_REPLAY))
&& EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) ||
IS_ERR(inode_bitmap_bh)) {
inode_bitmap_bh = NULL;
goto next_group;
@ -945,7 +1071,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
goto next_group;
}
if (!handle) {
if ((!(sbi->s_mount_state & EXT4_FC_REPLAY)) && !handle) {
BUG_ON(nblocks <= 0);
handle = __ext4_journal_start_sb(dir->i_sb, line_no,
handle_type, nblocks, 0,
@ -1049,9 +1175,15 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
int free;
struct ext4_group_info *grp = ext4_get_group_info(sb, group);
struct ext4_group_info *grp = NULL;
down_read(&grp->alloc_sem); /* protect vs itable lazyinit */
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
grp = ext4_get_group_info(sb, group);
down_read(&grp->alloc_sem); /*
* protect vs itable
* lazyinit
*/
}
ext4_lock_group(sb, group); /* while we modify the bg desc */
free = EXT4_INODES_PER_GROUP(sb) -
ext4_itable_unused_count(sb, gdp);
@ -1067,7 +1199,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
if (ino > free)
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - ino));
up_read(&grp->alloc_sem);
if (!(sbi->s_mount_state & EXT4_FC_REPLAY))
up_read(&grp->alloc_sem);
} else {
ext4_lock_group(sb, group);
}

View File

@ -163,7 +163,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
}
if (!bh_uptodate_or_lock(bh)) {
if (bh_submit_read(bh) < 0) {
if (ext4_read_bh(bh, 0, NULL) < 0) {
put_bh(bh);
goto failure;
}
@ -593,7 +593,8 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
if (ext4_has_feature_bigalloc(inode->i_sb)) {
EXT4_ERROR_INODE(inode, "Can't allocate blocks for "
"non-extent mapped inodes with bigalloc");
return -EFSCORRUPTED;
err = -EFSCORRUPTED;
goto out;
}
/* Set up for the direct block allocation */
@ -1012,14 +1013,14 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
}
/* Go read the buffer for the next level down */
bh = sb_bread(inode->i_sb, nr);
bh = ext4_sb_bread(inode->i_sb, nr, 0);
/*
* A read failure? Report error and clear slot
* (should be rare).
*/
if (!bh) {
ext4_error_inode_block(inode, nr, EIO,
if (IS_ERR(bh)) {
ext4_error_inode_block(inode, nr, -PTR_ERR(bh),
"Read failure");
continue;
}
@ -1033,7 +1034,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
brelse(bh);
/*
* Everything below this this pointer has been
* Everything below this pointer has been
* released. Now let this top-of-subtree go.
*
* We want the freeing of this indirect block to be

View File

@ -354,7 +354,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode,
if (error)
goto out;
/* Update the xttr entry. */
/* Update the xattr entry. */
i.value = value;
i.value_len = len;

View File

@ -101,8 +101,8 @@ static int ext4_inode_csum_verify(struct inode *inode, struct ext4_inode *raw,
return provided == calculated;
}
static void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
struct ext4_inode_info *ei)
void ext4_inode_csum_set(struct inode *inode, struct ext4_inode *raw,
struct ext4_inode_info *ei)
{
__u32 csum;
@ -514,7 +514,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
return -EFSCORRUPTED;
/* Lookup extent status tree firstly */
if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) &&
ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
map->m_pblk = ext4_es_pblock(&es) +
map->m_lblk - es.es_lblk;
@ -729,6 +730,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
if (ret)
return ret;
}
ext4_fc_track_range(inode, map->m_lblk,
map->m_lblk + map->m_len - 1);
}
if (retval < 0)
@ -825,7 +828,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
int create = map_flags & EXT4_GET_BLOCKS_CREATE;
int err;
J_ASSERT(handle != NULL || create == 0);
J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
|| handle != NULL || create == 0);
map.m_lblk = block;
map.m_len = 1;
@ -841,7 +845,8 @@ struct buffer_head *ext4_getblk(handle_t *handle, struct inode *inode,
return ERR_PTR(-ENOMEM);
if (map.m_flags & EXT4_MAP_NEW) {
J_ASSERT(create != 0);
J_ASSERT(handle != NULL);
J_ASSERT((EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
|| (handle != NULL));
/*
* Now that we do not always journal data, we should
@ -878,18 +883,20 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
ext4_lblk_t block, int map_flags)
{
struct buffer_head *bh;
int ret;
bh = ext4_getblk(handle, inode, block, map_flags);
if (IS_ERR(bh))
return bh;
if (!bh || ext4_buffer_uptodate(bh))
return bh;
ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
put_bh(bh);
return ERR_PTR(-EIO);
ret = ext4_read_bh_lock(bh, REQ_META | REQ_PRIO, true);
if (ret) {
put_bh(bh);
return ERR_PTR(ret);
}
return bh;
}
/* Read a contiguous batch of blocks. */
@ -910,8 +917,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
for (i = 0; i < bh_count; i++)
/* Note that NULL bhs[i] is valid because of holes. */
if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1,
&bhs[i]);
ext4_read_bh_lock(bhs[i], REQ_META | REQ_PRIO, false);
if (!wait)
return 0;
@ -1081,7 +1087,7 @@ static int ext4_block_write_begin(struct page *page, loff_t pos, unsigned len,
if (!buffer_uptodate(bh) && !buffer_delay(bh) &&
!buffer_unwritten(bh) &&
(block_start < from || block_end > to)) {
ll_rw_block(REQ_OP_READ, 0, 1, &bh);
ext4_read_bh_lock(bh, 0, false);
wait[nr_wait++] = bh;
}
}
@ -1910,6 +1916,9 @@ static int __ext4_journalled_writepage(struct page *page,
err = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL,
write_end_fn);
}
if (ret == 0)
ret = err;
err = ext4_jbd2_inode_add_write(handle, inode, 0, len);
if (ret == 0)
ret = err;
EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
@ -2254,7 +2263,7 @@ static int mpage_process_page(struct mpage_da_data *mpd, struct page *page,
err = PTR_ERR(io_end_vec);
goto out;
}
io_end_vec->offset = mpd->map.m_lblk << blkbits;
io_end_vec->offset = (loff_t)mpd->map.m_lblk << blkbits;
}
*map_bh = true;
goto out;
@ -2785,7 +2794,7 @@ static int ext4_writepages(struct address_space *mapping,
* ext4_journal_stop() can wait for transaction commit
* to finish which may depend on writeback of pages to
* complete or on page lock to be released. In that
* case, we have to wait until after after we have
* case, we have to wait until after we have
* submitted all the IO, released page locks we hold,
* and dropped io_end reference (for extent conversion
* to be able to complete) before stopping the handle.
@ -3296,9 +3305,14 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
{
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
if (journal)
return !jbd2_transaction_committed(journal,
EXT4_I(inode)->i_datasync_tid);
if (journal) {
if (jbd2_transaction_committed(journal,
EXT4_I(inode)->i_datasync_tid))
return true;
return atomic_read(&EXT4_SB(inode->i_sb)->s_fc_subtid) >=
EXT4_I(inode)->i_fc_committed_subtid;
}
/* Any metadata buffers to write? */
if (!list_empty(&inode->i_mapping->private_list))
return true;
@ -3436,14 +3450,26 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
if (flags & IOMAP_WRITE)
if (flags & IOMAP_WRITE) {
/*
* We check here if the blocks are already allocated, then we
* don't need to start a journal txn and we can directly return
* the mapping information. This could boost performance
* especially in multi-threaded overwrite requests.
*/
if (offset + length <= i_size_read(inode)) {
ret = ext4_map_blocks(NULL, inode, &map, 0);
if (ret > 0 && (map.m_flags & EXT4_MAP_MAPPED))
goto out;
}
ret = ext4_iomap_alloc(inode, &map, flags);
else
} else {
ret = ext4_map_blocks(NULL, inode, &map, 0);
}
if (ret < 0)
return ret;
out:
ext4_set_iomap(inode, iomap, &map, offset, length);
return 0;
@ -3601,6 +3627,13 @@ static int ext4_set_page_dirty(struct page *page)
return __set_page_dirty_buffers(page);
}
static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
struct file *file, sector_t *span)
{
return iomap_swapfile_activate(sis, file, span,
&ext4_iomap_report_ops);
}
static const struct address_space_operations ext4_aops = {
.readpage = ext4_readpage,
.readahead = ext4_readahead,
@ -3616,6 +3649,7 @@ static const struct address_space_operations ext4_aops = {
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
.swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_journalled_aops = {
@ -3632,6 +3666,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.direct_IO = noop_direct_IO,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
.swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_da_aops = {
@ -3649,6 +3684,7 @@ static const struct address_space_operations ext4_da_aops = {
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
.swap_activate = ext4_iomap_swap_activate,
};
static const struct address_space_operations ext4_dax_aops = {
@ -3657,6 +3693,7 @@ static const struct address_space_operations ext4_dax_aops = {
.set_page_dirty = noop_set_page_dirty,
.bmap = ext4_bmap,
.invalidatepage = noop_invalidatepage,
.swap_activate = ext4_iomap_swap_activate,
};
void ext4_set_aops(struct inode *inode)
@ -3730,11 +3767,8 @@ static int __ext4_block_zero_page_range(handle_t *handle,
set_buffer_uptodate(bh);
if (!buffer_uptodate(bh)) {
err = -EIO;
ll_rw_block(REQ_OP_READ, 0, 1, &bh);
wait_on_buffer(bh);
/* Uhhuh. Read error. Complain and punt. */
if (!buffer_uptodate(bh))
err = ext4_read_bh_lock(bh, 0, true);
if (err)
goto unlock;
if (fscrypt_inode_uses_fs_layer_crypto(inode)) {
/* We expect the key to be set. */
@ -4073,6 +4107,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
up_write(&EXT4_I(inode)->i_data_sem);
}
ext4_fc_track_range(inode, first_block, stop_block);
if (IS_SYNC(inode))
ext4_handle_sync(handle);
@ -4252,22 +4287,22 @@ int ext4_truncate(struct inode *inode)
* data in memory that is needed to recreate the on-disk version of this
* inode.
*/
static int __ext4_get_inode_loc(struct inode *inode,
struct ext4_iloc *iloc, int in_mem)
static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
struct ext4_iloc *iloc, int in_mem,
ext4_fsblk_t *ret_block)
{
struct ext4_group_desc *gdp;
struct buffer_head *bh;
struct super_block *sb = inode->i_sb;
ext4_fsblk_t block;
struct blk_plug plug;
int inodes_per_block, inode_offset;
iloc->bh = NULL;
if (inode->i_ino < EXT4_ROOT_INO ||
inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
if (ino < EXT4_ROOT_INO ||
ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count))
return -EFSCORRUPTED;
iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb);
iloc->block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
gdp = ext4_get_group_desc(sb, iloc->block_group, NULL);
if (!gdp)
return -EIO;
@ -4276,7 +4311,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
* Figure out the offset within the block group inode table
*/
inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
inode_offset = ((inode->i_ino - 1) %
inode_offset = ((ino - 1) %
EXT4_INODES_PER_GROUP(sb));
block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block);
iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);
@ -4289,16 +4324,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
if (!buffer_uptodate(bh)) {
lock_buffer(bh);
/*
* If the buffer has the write error flag, we have failed
* to write out another inode in the same block. In this
* case, we don't have to read the block because we may
* read the old inode data successfully.
*/
if (buffer_write_io_error(bh) && !buffer_uptodate(bh))
set_buffer_uptodate(bh);
if (buffer_uptodate(bh)) {
if (ext4_buffer_uptodate(bh)) {
/* someone brought it uptodate while we waited */
unlock_buffer(bh);
goto has_buffer;
@ -4369,7 +4395,7 @@ static int __ext4_get_inode_loc(struct inode *inode,
if (end > table)
end = table;
while (b <= end)
sb_breadahead_unmovable(sb, b++);
ext4_sb_breadahead_unmovable(sb, b++);
}
/*
@ -4377,16 +4403,14 @@ static int __ext4_get_inode_loc(struct inode *inode,
* has in-inode xattrs, or we don't have this inode in memory.
* Read the block from disk.
*/
trace_ext4_load_inode(inode);
get_bh(bh);
bh->b_end_io = end_buffer_read_sync;
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh);
trace_ext4_load_inode(sb, ino);
ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL);
blk_finish_plug(&plug);
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
simulate_eio:
ext4_error_inode_block(inode, block, EIO,
"unable to read itable block");
if (ret_block)
*ret_block = block;
brelse(bh);
return -EIO;
}
@ -4396,11 +4420,43 @@ static int __ext4_get_inode_loc(struct inode *inode,
return 0;
}
static int __ext4_get_inode_loc_noinmem(struct inode *inode,
struct ext4_iloc *iloc)
{
ext4_fsblk_t err_blk;
int ret;
ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc, 0,
&err_blk);
if (ret == -EIO)
ext4_error_inode_block(inode, err_blk, EIO,
"unable to read itable block");
return ret;
}
int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
{
ext4_fsblk_t err_blk;
int ret;
/* We have all inode data except xattrs in memory here. */
return __ext4_get_inode_loc(inode, iloc,
!ext4_test_inode_state(inode, EXT4_STATE_XATTR));
ret = __ext4_get_inode_loc(inode->i_sb, inode->i_ino, iloc,
!ext4_test_inode_state(inode, EXT4_STATE_XATTR), &err_blk);
if (ret == -EIO)
ext4_error_inode_block(inode, err_blk, EIO,
"unable to read itable block");
return ret;
}
int ext4_get_fc_inode_loc(struct super_block *sb, unsigned long ino,
struct ext4_iloc *iloc)
{
return __ext4_get_inode_loc(sb, ino, iloc, 0, NULL);
}
static bool ext4_should_enable_dax(struct inode *inode)
@ -4566,7 +4622,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
ei = EXT4_I(inode);
iloc.bh = NULL;
ret = __ext4_get_inode_loc(inode, &iloc, 0);
ret = __ext4_get_inode_loc_noinmem(inode, &iloc);
if (ret < 0)
goto bad_inode;
raw_inode = ext4_raw_inode(&iloc);
@ -4612,10 +4668,11 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
sizeof(gen));
}
if (!ext4_inode_csum_verify(inode, raw_inode, ei) ||
ext4_simulate_fail(sb, EXT4_SIM_INODE_CRC)) {
ext4_error_inode_err(inode, function, line, 0, EFSBADCRC,
"iget: checksum invalid");
if ((!ext4_inode_csum_verify(inode, raw_inode, ei) ||
ext4_simulate_fail(sb, EXT4_SIM_INODE_CRC)) &&
(!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))) {
ext4_error_inode_err(inode, function, line, 0,
EFSBADCRC, "iget: checksum invalid");
ret = -EFSBADCRC;
goto bad_inode;
}
@ -4703,6 +4760,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
for (block = 0; block < EXT4_N_BLOCKS; block++)
ei->i_data[block] = raw_inode->i_block[block];
INIT_LIST_HEAD(&ei->i_orphan);
ext4_fc_init_inode(&ei->vfs_inode);
/*
* Set transaction id's of transactions that have to be committed
@ -4768,9 +4826,10 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
goto bad_inode;
} else if (!ext4_has_inline_data(inode)) {
/* validate the block references in the inode */
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
(S_ISLNK(inode->i_mode) &&
!ext4_inode_is_fast_symlink(inode))) {
if (!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&
(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
(S_ISLNK(inode->i_mode) &&
!ext4_inode_is_fast_symlink(inode)))) {
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
ret = ext4_ext_check_inode(inode);
else
@ -4971,6 +5030,12 @@ static int ext4_do_update_inode(handle_t *handle,
if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
err = ext4_inode_blocks_set(handle, raw_inode, ei);
if (err) {
spin_unlock(&ei->i_raw_lock);
goto out_brelse;
}
raw_inode->i_mode = cpu_to_le16(inode->i_mode);
i_uid = i_uid_read(inode);
i_gid = i_gid_read(inode);
@ -5004,11 +5069,6 @@ static int ext4_do_update_inode(handle_t *handle,
EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
err = ext4_inode_blocks_set(handle, raw_inode, ei);
if (err) {
spin_unlock(&ei->i_raw_lock);
goto out_brelse;
}
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
raw_inode->i_flags = cpu_to_le32(ei->i_flags & 0xFFFFFFFF);
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT)))
@ -5149,12 +5209,12 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync)
return 0;
err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
err = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
EXT4_I(inode)->i_sync_tid);
} else {
struct ext4_iloc iloc;
err = __ext4_get_inode_loc(inode, &iloc, 0);
err = __ext4_get_inode_loc_noinmem(inode, &iloc);
if (err)
return err;
/*
@ -5278,6 +5338,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (error)
return error;
}
ext4_fc_start_update(inode);
if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
(ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
handle_t *handle;
@ -5301,6 +5362,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (error) {
ext4_journal_stop(handle);
ext4_fc_stop_update(inode);
return error;
}
/* Update corresponding info in inode so that everything is in
@ -5323,11 +5385,15 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
if (attr->ia_size > sbi->s_bitmap_maxbytes)
if (attr->ia_size > sbi->s_bitmap_maxbytes) {
ext4_fc_stop_update(inode);
return -EFBIG;
}
}
if (!S_ISREG(inode->i_mode))
if (!S_ISREG(inode->i_mode)) {
ext4_fc_stop_update(inode);
return -EINVAL;
}
if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
inode_inc_iversion(inode);
@ -5351,7 +5417,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
rc = ext4_break_layouts(inode);
if (rc) {
up_write(&EXT4_I(inode)->i_mmap_sem);
return rc;
goto err_out;
}
if (attr->ia_size != inode->i_size) {
@ -5372,6 +5438,21 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
inode->i_mtime = current_time(inode);
inode->i_ctime = inode->i_mtime;
}
if (shrink)
ext4_fc_track_range(inode,
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
inode->i_sb->s_blocksize_bits,
(oldsize > 0 ? oldsize - 1 : 0) >>
inode->i_sb->s_blocksize_bits);
else
ext4_fc_track_range(
inode,
(oldsize > 0 ? oldsize - 1 : oldsize) >>
inode->i_sb->s_blocksize_bits,
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
inode->i_sb->s_blocksize_bits);
down_write(&EXT4_I(inode)->i_data_sem);
EXT4_I(inode)->i_disksize = attr->ia_size;
rc = ext4_mark_inode_dirty(handle, inode);
@ -5430,9 +5511,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
rc = posix_acl_chmod(inode, inode->i_mode);
err_out:
ext4_std_error(inode->i_sb, error);
if (error)
ext4_std_error(inode->i_sb, error);
if (!error)
error = rc;
ext4_fc_stop_update(inode);
return error;
}
@ -5614,6 +5697,8 @@ int ext4_mark_iloc_dirty(handle_t *handle,
put_bh(iloc->bh);
return -EIO;
}
ext4_fc_track_inode(inode);
if (IS_I_VERSION(inode))
inode_inc_iversion(inode);
@ -5937,6 +6022,8 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
if (IS_ERR(handle))
return PTR_ERR(handle);
ext4_fc_mark_ineligible(inode->i_sb,
EXT4_FC_REASON_JOURNAL_FLAG_CHANGE);
err = ext4_mark_inode_dirty(handle, inode);
ext4_handle_sync(handle);
ext4_journal_stop(handle);
@ -5977,9 +6064,17 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
if (err)
goto out_ret;
/*
* On data journalling we skip straight to the transaction handle:
* there's no delalloc; page truncated will be checked later; the
* early return w/ all buffers mapped (calculates size/len) can't
* be used; and there's no dioread_nolock, so only ext4_get_block.
*/
if (ext4_should_journal_data(inode))
goto retry_alloc;
/* Delalloc case is easy... */
if (test_opt(inode->i_sb, DELALLOC) &&
!ext4_should_journal_data(inode) &&
!ext4_nonda_switch(inode->i_sb)) {
do {
err = block_page_mkwrite(vma, vmf,
@ -6005,6 +6100,9 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
/*
* Return if we have all the buffers mapped. This avoids the need to do
* journal_start/journal_stop which can block and take a long time
*
* This cannot be done for data journalling, as we have to add the
* inode to the transaction's list to writeprotect pages on commit.
*/
if (page_has_buffers(page)) {
if (!ext4_walk_page_buffers(NULL, page_buffers(page),
@ -6029,16 +6127,42 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out;
}
err = block_page_mkwrite(vma, vmf, get_block);
if (!err && ext4_should_journal_data(inode)) {
if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
PAGE_SIZE, NULL, do_journal_get_write_access)) {
unlock_page(page);
ret = VM_FAULT_SIGBUS;
ext4_journal_stop(handle);
goto out;
/*
* Data journalling can't use block_page_mkwrite() because it
* will set_buffer_dirty() before do_journal_get_write_access()
* thus might hit warning messages for dirty metadata buffers.
*/
if (!ext4_should_journal_data(inode)) {
err = block_page_mkwrite(vma, vmf, get_block);
} else {
lock_page(page);
size = i_size_read(inode);
/* Page got truncated from under us? */
if (page->mapping != mapping || page_offset(page) > size) {
ret = VM_FAULT_NOPAGE;
goto out_error;
}
if (page->index == size >> PAGE_SHIFT)
len = size & ~PAGE_MASK;
else
len = PAGE_SIZE;
err = __block_write_begin(page, 0, len, ext4_get_block);
if (!err) {
ret = VM_FAULT_SIGBUS;
if (ext4_walk_page_buffers(handle, page_buffers(page),
0, len, NULL, do_journal_get_write_access))
goto out_error;
if (ext4_walk_page_buffers(handle, page_buffers(page),
0, len, NULL, write_end_fn))
goto out_error;
if (ext4_jbd2_inode_add_write(handle, inode, 0, len))
goto out_error;
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
} else {
unlock_page(page);
}
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
}
ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
@ -6049,6 +6173,10 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
up_read(&EXT4_I(inode)->i_mmap_sem);
sb_end_pagefault(inode->i_sb);
return ret;
out_error:
unlock_page(page);
ext4_journal_stop(handle);
goto out;
}
vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)

View File

@ -86,7 +86,7 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2)
i_size_write(inode2, isize);
}
static void reset_inode_seed(struct inode *inode)
void ext4_reset_inode_seed(struct inode *inode)
{
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@ -165,6 +165,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
err = -EINVAL;
goto err_out;
}
ext4_fc_start_ineligible(sb, EXT4_FC_REASON_SWAP_BOOT);
/* Protect extent tree against block allocations via delalloc */
ext4_double_down_write_data_sem(inode, inode_bl);
@ -199,8 +200,8 @@ static long swap_inode_boot_loader(struct super_block *sb,
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
reset_inode_seed(inode);
reset_inode_seed(inode_bl);
ext4_reset_inode_seed(inode);
ext4_reset_inode_seed(inode_bl);
ext4_discard_preallocations(inode, 0);
@ -247,6 +248,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
err_out1:
ext4_journal_stop(handle);
ext4_fc_stop_ineligible(sb);
ext4_double_up_write_data_sem(inode, inode_bl);
err_out:
@ -807,7 +809,7 @@ static int ext4_ioctl_get_es_cache(struct file *filp, unsigned long arg)
return error;
}
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct super_block *sb = inode->i_sb;
@ -1074,6 +1076,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_resize_fs(sb, n_blocks_count);
if (EXT4_SB(sb)->s_journal) {
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_RESIZE);
jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
@ -1308,6 +1311,17 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
}
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
long ret;
ext4_fc_start_update(file_inode(filp));
ret = __ext4_ioctl(filp, cmd, arg);
ext4_fc_stop_update(file_inode(filp));
return ret;
}
#ifdef CONFIG_COMPAT
long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{

View File

@ -124,7 +124,7 @@
* /sys/fs/ext4/<partition>/mb_group_prealloc. The value is represented in
* terms of number of blocks. If we have mounted the file system with -O
* stripe=<value> option the group prealloc request is normalized to the
* the smallest multiple of the stripe value (sbi->s_stripe) which is
* smallest multiple of the stripe value (sbi->s_stripe) which is
* greater than the default mb_group_prealloc.
*
* The regular allocator (using the buddy cache) supports a few tunables.
@ -619,11 +619,8 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
void *buddy;
void *buddy2;
{
static int mb_check_counter;
if (mb_check_counter++ % 100 != 0)
return 0;
}
if (e4b->bd_info->bb_check_counter++ % 10)
return 0;
while (order > 1) {
buddy = mb_find_buddy(e4b, order, &max);
@ -1394,9 +1391,6 @@ void ext4_set_bits(void *bm, int cur, int len)
}
}
/*
* _________________________________________________________________ */
static inline int mb_buddy_adjust_border(int* bit, void* bitmap, int side)
{
if (mb_test_bit(*bit + side, bitmap)) {
@ -1508,14 +1502,16 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
blocknr += EXT4_C2B(sbi, block);
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing already freed block "
"(bit %u); block bitmap corrupt.",
block);
ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
ext4_grp_locked_error(sb, e4b->bd_group,
inode ? inode->i_ino : 0,
blocknr,
"freeing already freed block (bit %u); block bitmap corrupt.",
block);
ext4_mark_group_bitmap_corrupted(
sb, e4b->bd_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT);
}
mb_regenerate_buddy(e4b);
goto done;
}
@ -2019,7 +2015,7 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
/*
* IF we have corrupt bitmap, we won't find any
* free blocks even though group info says we
* we have free blocks
* have free blocks
*/
ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
"%d free clusters as per "
@ -3302,6 +3298,84 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
return err;
}
/*
* Idempotent helper for Ext4 fast commit replay path to set the state of
* blocks in bitmaps and update counters.
*/
void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
int len, int state)
{
struct buffer_head *bitmap_bh = NULL;
struct ext4_group_desc *gdp;
struct buffer_head *gdp_bh;
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t group;
ext4_grpblk_t blkoff;
int i, clen, err;
int already;
clen = EXT4_B2C(sbi, len);
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh)) {
err = PTR_ERR(bitmap_bh);
bitmap_bh = NULL;
goto out_err;
}
err = -EIO;
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
if (!gdp)
goto out_err;
ext4_lock_group(sb, group);
already = 0;
for (i = 0; i < clen; i++)
if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state)
already++;
if (state)
ext4_set_bits(bitmap_bh->b_data, blkoff, clen);
else
mb_test_and_clear_bits(bitmap_bh->b_data, blkoff, clen);
if (ext4_has_group_desc_csum(sb) &&
(gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
ext4_free_group_clusters_set(sb, gdp,
ext4_free_clusters_after_init(sb,
group, gdp));
}
if (state)
clen = ext4_free_group_clusters(sb, gdp) - clen + already;
else
clen = ext4_free_group_clusters(sb, gdp) + clen - already;
ext4_free_group_clusters_set(sb, gdp, clen);
ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);
ext4_unlock_group(sb, group);
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, group);
atomic64_sub(len,
&sbi_array_rcu_deref(sbi, s_flex_groups,
flex_group)->free_clusters);
}
err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
if (err)
goto out_err;
sync_dirty_buffer(bitmap_bh);
err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
sync_dirty_buffer(gdp_bh);
out_err:
brelse(bitmap_bh);
}
/*
* here we normalize request for locality group
* Group request are normalized to s_mb_group_prealloc, which goes to
@ -4160,7 +4234,7 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
struct ext4_buddy e4b;
int err;
int busy = 0;
int free = 0;
int free, free_total = 0;
mb_debug(sb, "discard preallocation for group %u\n", group);
if (list_empty(&grp->bb_prealloc_list))
@ -4188,8 +4262,8 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
INIT_LIST_HEAD(&list);
repeat:
free = 0;
ext4_lock_group(sb, group);
this_cpu_inc(discard_pa_seq);
list_for_each_entry_safe(pa, tmp,
&grp->bb_prealloc_list, pa_group_list) {
spin_lock(&pa->pa_lock);
@ -4206,6 +4280,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
/* seems this one can be freed ... */
ext4_mb_mark_pa_deleted(sb, pa);
if (!free)
this_cpu_inc(discard_pa_seq);
/* we can trust pa_free ... */
free += pa->pa_free;
@ -4215,22 +4292,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
list_add(&pa->u.pa_tmp_list, &list);
}
/* if we still need more blocks and some PAs were used, try again */
if (free < needed && busy) {
busy = 0;
ext4_unlock_group(sb, group);
cond_resched();
goto repeat;
}
/* found anything to free? */
if (list_empty(&list)) {
BUG_ON(free != 0);
mb_debug(sb, "Someone else may have freed PA for this group %u\n",
group);
goto out;
}
/* now free all selected PAs */
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
@ -4248,14 +4309,22 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
call_rcu(&(pa)->u.pa_rcu, ext4_mb_pa_callback);
}
out:
free_total += free;
/* if we still need more blocks and some PAs were used, try again */
if (free_total < needed && busy) {
ext4_unlock_group(sb, group);
cond_resched();
busy = 0;
goto repeat;
}
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);
put_bh(bitmap_bh);
out_dbg:
mb_debug(sb, "discarded (%d) blocks preallocated for group %u bb_free (%d)\n",
free, group, grp->bb_free);
return free;
free_total, group, grp->bb_free);
return free_total;
}
/*
@ -4283,6 +4352,9 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
return;
}
if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
return;
mb_debug(sb, "discard preallocation for inode %lu\n",
inode->i_ino);
trace_ext4_discard_preallocations(inode,
@ -4830,6 +4902,9 @@ static bool ext4_mb_discard_preallocations_should_retry(struct super_block *sb,
return ret;
}
static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
struct ext4_allocation_request *ar, int *errp);
/*
* Main entry point into mballoc to allocate blocks
* it tries to use preallocation first, then falls back
@ -4851,6 +4926,8 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
sbi = EXT4_SB(sb);
trace_ext4_request_blocks(ar);
if (sbi->s_mount_state & EXT4_FC_REPLAY)
return ext4_mb_new_blocks_simple(handle, ar, errp);
/* Allow to use superuser reservation for quota file */
if (ext4_is_quota_file(ar->inode))
@ -5078,6 +5155,102 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
return 0;
}
/*
* Simple allocator for Ext4 fast commit replay path. It searches for blocks
* linearly starting at the goal block and also excludes the blocks which
* are going to be in use after fast commit replay.
*/
static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
struct ext4_allocation_request *ar, int *errp)
{
struct buffer_head *bitmap_bh;
struct super_block *sb = ar->inode->i_sb;
ext4_group_t group;
ext4_grpblk_t blkoff;
int i;
ext4_fsblk_t goal, block;
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
goal = ar->goal;
if (goal < le32_to_cpu(es->s_first_data_block) ||
goal >= ext4_blocks_count(es))
goal = le32_to_cpu(es->s_first_data_block);
ar->len = 0;
ext4_get_group_no_and_offset(sb, goal, &group, &blkoff);
for (; group < ext4_get_groups_count(sb); group++) {
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh)) {
*errp = PTR_ERR(bitmap_bh);
pr_warn("Failed to read block bitmap\n");
return 0;
}
ext4_get_group_no_and_offset(sb,
max(ext4_group_first_block_no(sb, group), goal),
NULL, &blkoff);
i = mb_find_next_zero_bit(bitmap_bh->b_data, sb->s_blocksize,
blkoff);
brelse(bitmap_bh);
if (i >= sb->s_blocksize)
continue;
if (ext4_fc_replay_check_excluded(sb,
ext4_group_first_block_no(sb, group) + i))
continue;
break;
}
if (group >= ext4_get_groups_count(sb) && i >= sb->s_blocksize)
return 0;
block = ext4_group_first_block_no(sb, group) + i;
ext4_mb_mark_bb(sb, block, 1, 1);
ar->len = 1;
return block;
}
static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
unsigned long count)
{
struct buffer_head *bitmap_bh;
struct super_block *sb = inode->i_sb;
struct ext4_group_desc *gdp;
struct buffer_head *gdp_bh;
ext4_group_t group;
ext4_grpblk_t blkoff;
int already_freed = 0, err, i;
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh)) {
err = PTR_ERR(bitmap_bh);
pr_warn("Failed to read block bitmap\n");
return;
}
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
if (!gdp)
return;
for (i = 0; i < count; i++) {
if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
already_freed++;
}
mb_clear_bits(bitmap_bh->b_data, blkoff, count);
err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
if (err)
return;
ext4_free_group_clusters_set(
sb, gdp, ext4_free_group_clusters(sb, gdp) +
count - already_freed);
ext4_block_bitmap_csum_set(sb, group, gdp, bitmap_bh);
ext4_group_desc_csum_set(sb, group, gdp);
ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
sync_dirty_buffer(bitmap_bh);
sync_dirty_buffer(gdp_bh);
brelse(bitmap_bh);
}
/**
* ext4_free_blocks() -- Free given blocks and update quota
* @handle: handle for this transaction
@ -5104,6 +5277,13 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
int err = 0;
int ret;
sbi = EXT4_SB(sb);
if (sbi->s_mount_state & EXT4_FC_REPLAY) {
ext4_free_blocks_simple(inode, block, count);
return;
}
might_sleep();
if (bh) {
if (block)
@ -5112,7 +5292,6 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
block = bh->b_blocknr;
}
sbi = EXT4_SB(sb);
if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
!ext4_inode_block_valid(inode, block, count)) {
ext4_error(sb, "Freeing blocks not in datazone - "

View File

@ -85,15 +85,11 @@ static int read_mmp_block(struct super_block *sb, struct buffer_head **bh,
}
}
get_bh(*bh);
lock_buffer(*bh);
(*bh)->b_end_io = end_buffer_read_sync;
submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, *bh);
wait_on_buffer(*bh);
if (!buffer_uptodate(*bh)) {
ret = -EIO;
ret = ext4_read_bh(*bh, REQ_META | REQ_PRIO, NULL);
if (ret)
goto warn_exit;
}
mmp = (struct mmp_struct *)((*bh)->b_data);
if (le32_to_cpu(mmp->mmp_magic) != EXT4_MMP_MAGIC) {
ret = -EFSCORRUPTED;

View File

@ -215,7 +215,7 @@ mext_page_mkuptodate(struct page *page, unsigned from, unsigned to)
for (i = 0; i < nr; i++) {
bh = arr[i];
if (!bh_uptodate_or_lock(bh)) {
err = bh_submit_read(bh);
err = ext4_read_bh(bh, 0, NULL);
if (err)
return err;
}

View File

@ -2553,7 +2553,7 @@ static int ext4_delete_entry(handle_t *handle,
* for checking S_ISDIR(inode) (since the INODE_INDEX feature will not be set
* on regular files) and to avoid creating huge/slow non-HTREE directories.
*/
static void ext4_inc_count(handle_t *handle, struct inode *inode)
static void ext4_inc_count(struct inode *inode)
{
inc_nlink(inode);
if (is_dx(inode) &&
@ -2565,7 +2565,7 @@ static void ext4_inc_count(handle_t *handle, struct inode *inode)
* If a directory had nlink == 1, then we should let it be 1. This indicates
* directory has >EXT4_LINK_MAX subdirs.
*/
static void ext4_dec_count(handle_t *handle, struct inode *inode)
static void ext4_dec_count(struct inode *inode)
{
if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
drop_nlink(inode);
@ -2610,7 +2610,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
bool excl)
{
handle_t *handle;
struct inode *inode;
struct inode *inode, *inode_save;
int err, credits, retries = 0;
err = dquot_initialize(dir);
@ -2628,7 +2628,11 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
ext4_set_aops(inode);
inode_save = inode;
ihold(inode_save);
err = ext4_add_nondir(handle, dentry, &inode);
ext4_fc_track_create(inode_save, dentry);
iput(inode_save);
}
if (handle)
ext4_journal_stop(handle);
@ -2643,7 +2647,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
umode_t mode, dev_t rdev)
{
handle_t *handle;
struct inode *inode;
struct inode *inode, *inode_save;
int err, credits, retries = 0;
err = dquot_initialize(dir);
@ -2660,7 +2664,12 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
if (!IS_ERR(inode)) {
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = &ext4_special_inode_operations;
inode_save = inode;
ihold(inode_save);
err = ext4_add_nondir(handle, dentry, &inode);
if (!err)
ext4_fc_track_create(inode_save, dentry);
iput(inode_save);
}
if (handle)
ext4_journal_stop(handle);
@ -2739,7 +2748,7 @@ struct ext4_dir_entry_2 *ext4_init_dot_dotdot(struct inode *inode,
return ext4_next_entry(de, blocksize);
}
static int ext4_init_new_dir(handle_t *handle, struct inode *dir,
int ext4_init_new_dir(handle_t *handle, struct inode *dir,
struct inode *inode)
{
struct buffer_head *dir_block = NULL;
@ -2824,7 +2833,9 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
iput(inode);
goto out_retry;
}
ext4_inc_count(handle, dir);
ext4_fc_track_create(inode, dentry);
ext4_inc_count(dir);
ext4_update_dx_flag(dir);
err = ext4_mark_inode_dirty(handle, dir);
if (err)
@ -3162,8 +3173,9 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
retval = ext4_mark_inode_dirty(handle, inode);
if (retval)
goto end_rmdir;
ext4_dec_count(handle, dir);
ext4_dec_count(dir);
ext4_update_dx_flag(dir);
ext4_fc_track_unlink(inode, dentry);
retval = ext4_mark_inode_dirty(handle, dir);
#ifdef CONFIG_UNICODE
@ -3184,42 +3196,32 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
return retval;
}
static int ext4_unlink(struct inode *dir, struct dentry *dentry)
int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
struct inode *inode)
{
int retval;
struct inode *inode;
int retval = -ENOENT;
struct buffer_head *bh;
struct ext4_dir_entry_2 *de;
handle_t *handle = NULL;
int skip_remove_dentry = 0;
if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
return -EIO;
bh = ext4_find_entry(dir, d_name, &de, NULL);
if (IS_ERR(bh))
return PTR_ERR(bh);
trace_ext4_unlink_enter(dir, dentry);
/* Initialize quotas before so that eventual writes go
* in separate transaction */
retval = dquot_initialize(dir);
if (retval)
goto out_trace;
retval = dquot_initialize(d_inode(dentry));
if (retval)
goto out_trace;
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
if (IS_ERR(bh)) {
retval = PTR_ERR(bh);
goto out_trace;
}
if (!bh) {
retval = -ENOENT;
goto out_trace;
}
inode = d_inode(dentry);
if (!bh)
return -ENOENT;
if (le32_to_cpu(de->inode) != inode->i_ino) {
retval = -EFSCORRUPTED;
goto out_bh;
/*
* It's okay if we find dont find dentry which matches
* the inode. That's because it might have gotten
* renamed to a different inode number
*/
if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
skip_remove_dentry = 1;
else
goto out_bh;
}
handle = ext4_journal_start(dir, EXT4_HT_DIR,
@ -3232,17 +3234,21 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
if (IS_DIRSYNC(dir))
ext4_handle_sync(handle);
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
goto out_handle;
dir->i_ctime = dir->i_mtime = current_time(dir);
ext4_update_dx_flag(dir);
retval = ext4_mark_inode_dirty(handle, dir);
if (retval)
goto out_handle;
if (!skip_remove_dentry) {
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
goto out_handle;
dir->i_ctime = dir->i_mtime = current_time(dir);
ext4_update_dx_flag(dir);
retval = ext4_mark_inode_dirty(handle, dir);
if (retval)
goto out_handle;
} else {
retval = 0;
}
if (inode->i_nlink == 0)
ext4_warning_inode(inode, "Deleting file '%.*s' with no links",
dentry->d_name.len, dentry->d_name.name);
d_name->len, d_name->name);
else
drop_nlink(inode);
if (!inode->i_nlink)
@ -3250,6 +3256,35 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
inode->i_ctime = current_time(inode);
retval = ext4_mark_inode_dirty(handle, inode);
out_handle:
ext4_journal_stop(handle);
out_bh:
brelse(bh);
return retval;
}
static int ext4_unlink(struct inode *dir, struct dentry *dentry)
{
int retval;
if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
return -EIO;
trace_ext4_unlink_enter(dir, dentry);
/*
* Initialize quotas before so that eventual writes go
* in separate transaction
*/
retval = dquot_initialize(dir);
if (retval)
goto out_trace;
retval = dquot_initialize(d_inode(dentry));
if (retval)
goto out_trace;
retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry));
if (!retval)
ext4_fc_track_unlink(d_inode(dentry), dentry);
#ifdef CONFIG_UNICODE
/* VFS negative dentries are incompatible with Encoding and
* Case-insensitiveness. Eventually we'll want avoid
@ -3261,10 +3296,6 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
d_invalidate(dentry);
#endif
out_handle:
ext4_journal_stop(handle);
out_bh:
brelse(bh);
out_trace:
trace_ext4_unlink_exit(dentry, retval);
return retval;
@ -3345,7 +3376,8 @@ static int ext4_symlink(struct inode *dir,
*/
drop_nlink(inode);
err = ext4_orphan_add(handle, inode);
ext4_journal_stop(handle);
if (handle)
ext4_journal_stop(handle);
handle = NULL;
if (err)
goto err_drop_inode;
@ -3399,29 +3431,10 @@ static int ext4_symlink(struct inode *dir,
return err;
}
static int ext4_link(struct dentry *old_dentry,
struct inode *dir, struct dentry *dentry)
int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
{
handle_t *handle;
struct inode *inode = d_inode(old_dentry);
int err, retries = 0;
if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;
err = fscrypt_prepare_link(old_dentry, dir, dentry);
if (err)
return err;
if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
(!projid_eq(EXT4_I(dir)->i_projid,
EXT4_I(old_dentry->d_inode)->i_projid)))
return -EXDEV;
err = dquot_initialize(dir);
if (err)
return err;
retry:
handle = ext4_journal_start(dir, EXT4_HT_DIR,
(EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
@ -3433,11 +3446,12 @@ static int ext4_link(struct dentry *old_dentry,
ext4_handle_sync(handle);
inode->i_ctime = current_time(inode);
ext4_inc_count(handle, inode);
ext4_inc_count(inode);
ihold(inode);
err = ext4_add_entry(handle, dentry, inode);
if (!err) {
ext4_fc_track_link(inode, dentry);
err = ext4_mark_inode_dirty(handle, inode);
/* this can happen only for tmpfile being
* linked the first time
@ -3455,6 +3469,29 @@ static int ext4_link(struct dentry *old_dentry,
return err;
}
static int ext4_link(struct dentry *old_dentry,
struct inode *dir, struct dentry *dentry)
{
struct inode *inode = d_inode(old_dentry);
int err;
if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;
err = fscrypt_prepare_link(old_dentry, dir, dentry);
if (err)
return err;
if ((ext4_test_inode_flag(dir, EXT4_INODE_PROJINHERIT)) &&
(!projid_eq(EXT4_I(dir)->i_projid,
EXT4_I(old_dentry->d_inode)->i_projid)))
return -EXDEV;
err = dquot_initialize(dir);
if (err)
return err;
return __ext4_link(dir, inode, dentry);
}
/*
* Try to find buffer head where contains the parent block.
@ -3630,9 +3667,9 @@ static void ext4_update_dir_count(handle_t *handle, struct ext4_renament *ent)
{
if (ent->dir_nlink_delta) {
if (ent->dir_nlink_delta == -1)
ext4_dec_count(handle, ent->dir);
ext4_dec_count(ent->dir);
else
ext4_inc_count(handle, ent->dir);
ext4_inc_count(ent->dir);
ext4_mark_inode_dirty(handle, ent->dir);
}
}
@ -3844,7 +3881,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
}
if (new.inode) {
ext4_dec_count(handle, new.inode);
ext4_dec_count(new.inode);
new.inode->i_ctime = current_time(new.inode);
}
old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
@ -3854,14 +3891,14 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
if (retval)
goto end_rename;
ext4_dec_count(handle, old.dir);
ext4_dec_count(old.dir);
if (new.inode) {
/* checked ext4_empty_dir above, can't have another
* parent, ext4_dec_count() won't work for many-linked
* dirs */
clear_nlink(new.inode);
} else {
ext4_inc_count(handle, new.dir);
ext4_inc_count(new.dir);
ext4_update_dx_flag(new.dir);
retval = ext4_mark_inode_dirty(handle, new.dir);
if (unlikely(retval))
@ -3871,6 +3908,22 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = ext4_mark_inode_dirty(handle, old.dir);
if (unlikely(retval))
goto end_rename;
if (S_ISDIR(old.inode->i_mode)) {
/*
* We disable fast commits here that's because the
* replay code is not yet capable of changing dot dot
* dirents in directories.
*/
ext4_fc_mark_ineligible(old.inode->i_sb,
EXT4_FC_REASON_RENAME_DIR);
} else {
if (new.inode)
ext4_fc_track_unlink(new.inode, new.dentry);
ext4_fc_track_link(old.inode, new.dentry);
ext4_fc_track_unlink(old.inode, old.dentry);
}
if (new.inode) {
retval = ext4_mark_inode_dirty(handle, new.inode);
if (unlikely(retval))
@ -4014,7 +4067,8 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
retval = ext4_mark_inode_dirty(handle, new.inode);
if (unlikely(retval))
goto end_rename;
ext4_fc_mark_ineligible(new.inode->i_sb,
EXT4_FC_REASON_CROSS_RENAME);
if (old.dir_bh) {
retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
if (retval)

View File

@ -843,8 +843,10 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
BUFFER_TRACE(dind, "get_write_access");
err = ext4_journal_get_write_access(handle, dind);
if (unlikely(err))
if (unlikely(err)) {
ext4_std_error(sb, err);
goto errout;
}
/* ext4_reserve_inode_write() gets a reference on the iloc */
err = ext4_reserve_inode_write(handle, inode, &iloc);
@ -1243,7 +1245,7 @@ static struct buffer_head *ext4_get_bitmap(struct super_block *sb, __u64 block)
if (unlikely(!bh))
return NULL;
if (!bh_uptodate_or_lock(bh)) {
if (bh_submit_read(bh) < 0) {
if (ext4_read_bh(bh, 0, NULL) < 0) {
brelse(bh);
return NULL;
}
@ -1806,8 +1808,8 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
o_blocks_count + add, add);
/* See if the device is actually as big as what was requested */
bh = sb_bread(sb, o_blocks_count + add - 1);
if (!bh) {
bh = ext4_sb_bread(sb, o_blocks_count + add - 1, 0);
if (IS_ERR(bh)) {
ext4_warning(sb, "can't read last block, resize aborted");
return -ENOSPC;
}
@ -1932,8 +1934,8 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
int meta_bg;
/* See if the device is actually as big as what was requested */
bh = sb_bread(sb, n_blocks_count - 1);
if (!bh) {
bh = ext4_sb_bread(sb, n_blocks_count - 1, 0);
if (IS_ERR(bh)) {
ext4_warning(sb, "can't read last block, resize aborted");
return -ENOSPC;
}

View File

@ -141,27 +141,115 @@ MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
#define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
static inline void __ext4_read_bh(struct buffer_head *bh, int op_flags,
bh_end_io_t *end_io)
{
/*
* buffer's verified bit is no longer valid after reading from
* disk again due to write out error, clear it to make sure we
* recheck the buffer contents.
*/
clear_buffer_verified(bh);
bh->b_end_io = end_io ? end_io : end_buffer_read_sync;
get_bh(bh);
submit_bh(REQ_OP_READ, op_flags, bh);
}
void ext4_read_bh_nowait(struct buffer_head *bh, int op_flags,
bh_end_io_t *end_io)
{
BUG_ON(!buffer_locked(bh));
if (ext4_buffer_uptodate(bh)) {
unlock_buffer(bh);
return;
}
__ext4_read_bh(bh, op_flags, end_io);
}
int ext4_read_bh(struct buffer_head *bh, int op_flags, bh_end_io_t *end_io)
{
BUG_ON(!buffer_locked(bh));
if (ext4_buffer_uptodate(bh)) {
unlock_buffer(bh);
return 0;
}
__ext4_read_bh(bh, op_flags, end_io);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return 0;
return -EIO;
}
int ext4_read_bh_lock(struct buffer_head *bh, int op_flags, bool wait)
{
if (trylock_buffer(bh)) {
if (wait)
return ext4_read_bh(bh, op_flags, NULL);
ext4_read_bh_nowait(bh, op_flags, NULL);
return 0;
}
if (wait) {
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return 0;
return -EIO;
}
return 0;
}
/*
* This works like sb_bread() except it uses ERR_PTR for error
* This works like __bread_gfp() except it uses ERR_PTR for error
* returns. Currently with sb_bread it's impossible to distinguish
* between ENOMEM and EIO situations (since both result in a NULL
* return.
*/
struct buffer_head *
ext4_sb_bread(struct super_block *sb, sector_t block, int op_flags)
static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
sector_t block, int op_flags,
gfp_t gfp)
{
struct buffer_head *bh = sb_getblk(sb, block);
struct buffer_head *bh;
int ret;
bh = sb_getblk_gfp(sb, block, gfp);
if (bh == NULL)
return ERR_PTR(-ENOMEM);
if (ext4_buffer_uptodate(bh))
return bh;
ll_rw_block(REQ_OP_READ, REQ_META | op_flags, 1, &bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
put_bh(bh);
return ERR_PTR(-EIO);
ret = ext4_read_bh_lock(bh, REQ_META | op_flags, true);
if (ret) {
put_bh(bh);
return ERR_PTR(ret);
}
return bh;
}
struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
int op_flags)
{
return __ext4_sb_bread_gfp(sb, block, op_flags, __GFP_MOVABLE);
}
struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
sector_t block)
{
return __ext4_sb_bread_gfp(sb, block, 0, 0);
}
void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
{
struct buffer_head *bh = sb_getblk_gfp(sb, block, 0);
if (likely(bh)) {
ext4_read_bh_lock(bh, REQ_RAHEAD, false);
brelse(bh);
}
}
static int ext4_verify_csum_type(struct super_block *sb,
@ -201,7 +289,18 @@ void ext4_superblock_csum_set(struct super_block *sb)
if (!ext4_has_metadata_csum(sb))
return;
/*
* Locking the superblock prevents the scenario
* where:
* 1) a first thread pauses during checksum calculation.
* 2) a second thread updates the superblock, recalculates
* the checksum, and updates s_checksum
* 3) the first thread resumes and finishes its checksum calculation
* and updates s_checksum with a potentially stale or torn value.
*/
lock_buffer(EXT4_SB(sb)->s_sbh);
es->s_checksum = ext4_superblock_csum(sb, es);
unlock_buffer(EXT4_SB(sb)->s_sbh);
}
ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
@ -472,6 +571,89 @@ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
spin_unlock(&sbi->s_md_lock);
}
/*
* This writepage callback for write_cache_pages()
* takes care of a few cases after page cleaning.
*
* write_cache_pages() already checks for dirty pages
* and calls clear_page_dirty_for_io(), which we want,
* to write protect the pages.
*
* However, we may have to redirty a page (see below.)
*/
static int ext4_journalled_writepage_callback(struct page *page,
struct writeback_control *wbc,
void *data)
{
transaction_t *transaction = (transaction_t *) data;
struct buffer_head *bh, *head;
struct journal_head *jh;
bh = head = page_buffers(page);
do {
/*
* We have to redirty a page in these cases:
* 1) If buffer is dirty, it means the page was dirty because it
* contains a buffer that needs checkpointing. So the dirty bit
* needs to be preserved so that checkpointing writes the buffer
* properly.
* 2) If buffer is not part of the committing transaction
* (we may have just accidentally come across this buffer because
* inode range tracking is not exact) or if the currently running
* transaction already contains this buffer as well, dirty bit
* needs to be preserved so that the buffer gets writeprotected
* properly on running transaction's commit.
*/
jh = bh2jh(bh);
if (buffer_dirty(bh) ||
(jh && (jh->b_transaction != transaction ||
jh->b_next_transaction))) {
redirty_page_for_writepage(wbc, page);
goto out;
}
} while ((bh = bh->b_this_page) != head);
out:
return AOP_WRITEPAGE_ACTIVATE;
}
static int ext4_journalled_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
.range_start = jinode->i_dirty_start,
.range_end = jinode->i_dirty_end,
};
return write_cache_pages(mapping, &wbc,
ext4_journalled_writepage_callback,
jinode->i_transaction);
}
static int ext4_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
int ret;
if (ext4_should_journal_data(jinode->i_vfs_inode))
ret = ext4_journalled_submit_inode_data_buffers(jinode);
else
ret = jbd2_journal_submit_inode_data_buffers(jinode);
return ret;
}
static int ext4_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
{
int ret = 0;
if (!ext4_should_journal_data(jinode->i_vfs_inode))
ret = jbd2_journal_finish_inode_data_buffers(jinode);
return ret;
}
static bool system_going_down(void)
{
return system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF
@ -939,10 +1121,10 @@ static void ext4_blkdev_put(struct block_device *bdev)
static void ext4_blkdev_remove(struct ext4_sb_info *sbi)
{
struct block_device *bdev;
bdev = sbi->journal_bdev;
bdev = sbi->s_journal_bdev;
if (bdev) {
ext4_blkdev_put(bdev);
sbi->journal_bdev = NULL;
sbi->s_journal_bdev = NULL;
}
}
@ -1073,14 +1255,14 @@ static void ext4_put_super(struct super_block *sb)
sync_blockdev(sb->s_bdev);
invalidate_bdev(sb->s_bdev);
if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
if (sbi->s_journal_bdev && sbi->s_journal_bdev != sb->s_bdev) {
/*
* Invalidate the journal device's buffers. We don't want them
* floating about in memory - the physical journal device may
* hotswapped, and it breaks the `ro-after' testing code.
*/
sync_blockdev(sbi->journal_bdev);
invalidate_bdev(sbi->journal_bdev);
sync_blockdev(sbi->s_journal_bdev);
invalidate_bdev(sbi->s_journal_bdev);
ext4_blkdev_remove(sbi);
}
@ -1149,6 +1331,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
ei->i_datasync_tid = 0;
atomic_set(&ei->i_unwritten, 0);
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
ext4_fc_init_inode(&ei->vfs_inode);
mutex_init(&ei->i_fc_lock);
return &ei->vfs_inode;
}
@ -1166,6 +1350,10 @@ static int ext4_drop_inode(struct inode *inode)
static void ext4_free_in_core_inode(struct inode *inode)
{
fscrypt_free_inode(inode);
if (!list_empty(&(EXT4_I(inode)->i_fc_list))) {
pr_warn("%s: inode %ld still in fc list",
__func__, inode->i_ino);
}
kmem_cache_free(ext4_inode_cachep, EXT4_I(inode));
}
@ -1191,6 +1379,7 @@ static void init_once(void *foo)
init_rwsem(&ei->i_data_sem);
init_rwsem(&ei->i_mmap_sem);
inode_init_once(&ei->vfs_inode);
ext4_fc_init_inode(&ei->vfs_inode);
}
static int __init init_inodecache(void)
@ -1219,6 +1408,7 @@ static void destroy_inodecache(void)
void ext4_clear_inode(struct inode *inode)
{
ext4_fc_del(inode);
invalidate_inode_buffers(inode);
clear_inode(inode);
ext4_discard_preallocations(inode, 0);
@ -1526,7 +1716,11 @@ enum {
Opt_dioread_nolock, Opt_dioread_lock,
Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
Opt_prefetch_block_bitmaps,
Opt_prefetch_block_bitmaps, Opt_no_fc,
#ifdef CONFIG_EXT4_DEBUG
Opt_fc_debug_max_replay,
#endif
Opt_fc_debug_force
};
static const match_table_t tokens = {
@ -1613,6 +1807,11 @@ static const match_table_t tokens = {
{Opt_init_itable, "init_itable=%u"},
{Opt_init_itable, "init_itable"},
{Opt_noinit_itable, "noinit_itable"},
{Opt_no_fc, "no_fc"},
{Opt_fc_debug_force, "fc_debug_force"},
#ifdef CONFIG_EXT4_DEBUG
{Opt_fc_debug_max_replay, "fc_debug_max_replay=%u"},
#endif
{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
{Opt_test_dummy_encryption, "test_dummy_encryption=%s"},
{Opt_test_dummy_encryption, "test_dummy_encryption"},
@ -1739,6 +1938,7 @@ static int clear_qf_name(struct super_block *sb, int qtype)
#define MOPT_EXT4_ONLY (MOPT_NO_EXT2 | MOPT_NO_EXT3)
#define MOPT_STRING 0x0400
#define MOPT_SKIP 0x0800
#define MOPT_2 0x1000
static const struct mount_opts {
int token;
@ -1839,6 +2039,13 @@ static const struct mount_opts {
{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
{Opt_prefetch_block_bitmaps, EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS,
MOPT_SET},
{Opt_no_fc, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
MOPT_CLEAR | MOPT_2 | MOPT_EXT4_ONLY},
{Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
#ifdef CONFIG_EXT4_DEBUG
{Opt_fc_debug_max_replay, 0, MOPT_GTE0},
#endif
{Opt_err, 0, 0}
};
@ -2048,6 +2255,10 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
sbi->s_li_wait_mult = arg;
} else if (token == Opt_max_dir_size_kb) {
sbi->s_max_dir_size_kb = arg;
#ifdef CONFIG_EXT4_DEBUG
} else if (token == Opt_fc_debug_max_replay) {
sbi->s_fc_debug_max_replay = arg;
#endif
} else if (token == Opt_stripe) {
sbi->s_stripe = arg;
} else if (token == Opt_resuid) {
@ -2216,10 +2427,17 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
WARN_ON(1);
return -1;
}
if (arg != 0)
sbi->s_mount_opt |= m->mount_opt;
else
sbi->s_mount_opt &= ~m->mount_opt;
if (m->flags & MOPT_2) {
if (arg != 0)
sbi->s_mount_opt2 |= m->mount_opt;
else
sbi->s_mount_opt2 &= ~m->mount_opt;
} else {
if (arg != 0)
sbi->s_mount_opt |= m->mount_opt;
else
sbi->s_mount_opt &= ~m->mount_opt;
}
}
return 1;
}
@ -2436,6 +2654,9 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PUTS("dax=inode");
}
if (test_opt2(sb, JOURNAL_FAST_COMMIT))
SEQ_OPTS_PUTS("fast_commit");
ext4_show_quota_options(seq, sb);
return 0;
}
@ -3754,7 +3975,7 @@ int ext4_calculate_overhead(struct super_block *sb)
* Add the internal journal blocks whether the journal has been
* loaded or not
*/
if (sbi->s_journal && !sbi->journal_bdev)
if (sbi->s_journal && !sbi->s_journal_bdev)
overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_maxlen);
else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) {
/* j_inum for internal journal is non-zero */
@ -3868,8 +4089,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
logical_sb_block = sb_block;
}
if (!(bh = sb_bread_unmovable(sb, logical_sb_block))) {
bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
if (IS_ERR(bh)) {
ext4_msg(sb, KERN_ERR, "unable to read superblock");
ret = PTR_ERR(bh);
bh = NULL;
goto out_fail;
}
/*
@ -3936,6 +4160,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
#ifdef CONFIG_EXT4_FS_POSIX_ACL
set_opt(sb, POSIX_ACL);
#endif
if (ext4_has_feature_fast_commit(sb))
set_opt2(sb, JOURNAL_FAST_COMMIT);
/* don't forget to enable journal_csum when metadata_csum is enabled. */
if (ext4_has_metadata_csum(sb))
set_opt(sb, JOURNAL_CHECKSUM);
@ -4265,10 +4491,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
brelse(bh);
logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = do_div(logical_sb_block, blocksize);
bh = sb_bread_unmovable(sb, logical_sb_block);
if (!bh) {
bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
if (IS_ERR(bh)) {
ext4_msg(sb, KERN_ERR,
"Can't read superblock on 2nd try");
ret = PTR_ERR(bh);
bh = NULL;
goto failed_mount;
}
es = (struct ext4_super_block *)(bh->b_data + offset);
@ -4480,18 +4708,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
/* Pre-read the descriptors into the buffer cache */
for (i = 0; i < db_count; i++) {
block = descriptor_loc(sb, logical_sb_block, i);
sb_breadahead_unmovable(sb, block);
ext4_sb_breadahead_unmovable(sb, block);
}
for (i = 0; i < db_count; i++) {
struct buffer_head *bh;
block = descriptor_loc(sb, logical_sb_block, i);
bh = sb_bread_unmovable(sb, block);
if (!bh) {
bh = ext4_sb_bread_unmovable(sb, block);
if (IS_ERR(bh)) {
ext4_msg(sb, KERN_ERR,
"can't read group descriptor %d", i);
db_count = i;
ret = PTR_ERR(bh);
bh = NULL;
goto failed_mount2;
}
rcu_read_lock();
@ -4539,6 +4769,26 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
/* Initialize fast commit stuff */
atomic_set(&sbi->s_fc_subtid, 0);
atomic_set(&sbi->s_fc_ineligible_updates, 0);
INIT_LIST_HEAD(&sbi->s_fc_q[FC_Q_MAIN]);
INIT_LIST_HEAD(&sbi->s_fc_q[FC_Q_STAGING]);
INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_MAIN]);
INIT_LIST_HEAD(&sbi->s_fc_dentry_q[FC_Q_STAGING]);
sbi->s_fc_bytes = 0;
sbi->s_mount_state &= ~EXT4_FC_INELIGIBLE;
sbi->s_mount_state &= ~EXT4_FC_COMMITTING;
spin_lock_init(&sbi->s_fc_lock);
memset(&sbi->s_fc_stats, 0, sizeof(sbi->s_fc_stats));
sbi->s_fc_replay_state.fc_regions = NULL;
sbi->s_fc_replay_state.fc_regions_size = 0;
sbi->s_fc_replay_state.fc_regions_used = 0;
sbi->s_fc_replay_state.fc_regions_valid = 0;
sbi->s_fc_replay_state.fc_modified_inodes = NULL;
sbi->s_fc_replay_state.fc_modified_inodes_size = 0;
sbi->s_fc_replay_state.fc_modified_inodes_used = 0;
sb->s_root = NULL;
needs_recovery = (es->s_last_orphan != 0 ||
@ -4588,6 +4838,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM;
clear_opt(sb, JOURNAL_CHECKSUM);
clear_opt(sb, DATA_FLAGS);
clear_opt2(sb, JOURNAL_FAST_COMMIT);
sbi->s_journal = NULL;
needs_recovery = 0;
goto no_journal;
@ -4646,6 +4897,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
sbi->s_journal->j_commit_callback = ext4_journal_commit_callback;
sbi->s_journal->j_submit_inode_data_buffers =
ext4_journal_submit_inode_data_buffers;
sbi->s_journal->j_finish_inode_data_buffers =
ext4_journal_finish_inode_data_buffers;
no_journal:
if (!test_opt(sb, NO_MBCACHE)) {
@ -4748,6 +5003,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount4a;
}
}
ext4_fc_replay_cleanup(sb);
ext4_ext_init(sb);
err = ext4_mb_init(sb);
@ -4814,9 +5070,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
* used to detect the metadata async write error.
*/
spin_lock_init(&sbi->s_bdev_wb_lock);
if (!sb_rdonly(sb))
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
ext4_orphan_cleanup(sb, es);
@ -4872,6 +5127,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
failed_mount8:
ext4_unregister_sysfs(sb);
kobject_put(&sbi->s_kobj);
failed_mount7:
ext4_unregister_li_request(sb);
failed_mount6:
@ -4960,6 +5216,7 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
journal->j_commit_interval = sbi->s_commit_interval;
journal->j_min_batch_time = sbi->s_min_batch_time;
journal->j_max_batch_time = sbi->s_max_batch_time;
ext4_fc_init(sb, journal);
write_lock(&journal->j_state_lock);
if (test_opt(sb, BARRIER))
@ -5102,9 +5359,7 @@ static journal_t *ext4_get_dev_journal(struct super_block *sb,
goto out_bdev;
}
journal->j_private = sb;
ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &journal->j_sb_buffer);
wait_on_buffer(journal->j_sb_buffer);
if (!buffer_uptodate(journal->j_sb_buffer)) {
if (ext4_read_bh_lock(journal->j_sb_buffer, REQ_META | REQ_PRIO, true)) {
ext4_msg(sb, KERN_ERR, "I/O error on journal device");
goto out_journal;
}
@ -5114,7 +5369,7 @@ static journal_t *ext4_get_dev_journal(struct super_block *sb,
be32_to_cpu(journal->j_superblock->s_nr_users));
goto out_journal;
}
EXT4_SB(sb)->journal_bdev = bdev;
EXT4_SB(sb)->s_journal_bdev = bdev;
ext4_init_journal_params(sb, journal);
return journal;
@ -5707,14 +5962,6 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}
/*
* Update the original bdev mapping's wb_err value
* which could be used to detect the metadata async
* write error.
*/
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
/*
* Mounting a RDONLY partition read-write, so reread
* and store the current valid flag. (It may have
@ -5760,7 +6007,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
* Releasing of existing data is done when we are sure remount will
* succeed.
*/
if (test_opt(sb, BLOCK_VALIDITY) && !sbi->system_blks) {
if (test_opt(sb, BLOCK_VALIDITY) && !sbi->s_system_blks) {
err = ext4_setup_system_zone(sb);
if (err)
goto restore_opts;
@ -5786,7 +6033,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
}
}
#endif
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
/*
@ -5809,7 +6056,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
sbi->s_commit_interval = old_opts.s_commit_interval;
sbi->s_min_batch_time = old_opts.s_min_batch_time;
sbi->s_max_batch_time = old_opts.s_max_batch_time;
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks)
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
#ifdef CONFIG_QUOTA
sbi->s_jquota_fmt = old_opts.s_jquota_fmt;
@ -6042,6 +6289,11 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
/* Quotafile not on the same filesystem? */
if (path->dentry->d_sb != sb)
return -EXDEV;
/* Quota already enabled for this file? */
if (IS_NOQUOTA(d_inode(path->dentry)))
return -EBUSY;
/* Journaling quota? */
if (EXT4_SB(sb)->s_qf_names[type]) {
/* Quotafile not in fs root? */
@ -6309,6 +6561,10 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
brelse(bh);
out:
if (inode->i_size < off + len) {
ext4_fc_track_range(inode,
(inode->i_size > 0 ? inode->i_size - 1 : 0)
>> inode->i_sb->s_blocksize_bits,
(off + len) >> inode->i_sb->s_blocksize_bits);
i_size_write(inode, off + len);
EXT4_I(inode)->i_disksize = inode->i_size;
err2 = ext4_mark_inode_dirty(handle, inode);
@ -6437,6 +6693,11 @@ static int __init ext4_init_fs(void)
err = init_inodecache();
if (err)
goto out1;
err = ext4_fc_init_dentry_cache();
if (err)
goto out05;
register_as_ext3();
register_as_ext2();
err = register_filesystem(&ext4_fs_type);
@ -6447,6 +6708,7 @@ static int __init ext4_init_fs(void)
out:
unregister_as_ext2();
unregister_as_ext3();
out05:
destroy_inodecache();
out1:
ext4_exit_mballoc();

View File

@ -521,6 +521,8 @@ int ext4_register_sysfs(struct super_block *sb)
proc_create_single_data("es_shrinker_info", S_IRUGO,
sbi->s_proc, ext4_seq_es_shrinker_info_show,
sb);
proc_create_single_data("fc_info", 0444, sbi->s_proc,
ext4_fc_info_show, sb);
proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc,
&ext4_mb_seq_groups_ops, sb);
}

View File

@ -2419,6 +2419,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (IS_SYNC(inode))
ext4_handle_sync(handle);
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
cleanup:
brelse(is.iloc.bh);
@ -2496,6 +2497,7 @@ ext4_xattr_set(struct inode *inode, int name_index, const char *name,
if (error == 0)
error = error2;
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
return error;
}
@ -2928,6 +2930,7 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
error);
goto cleanup;
}
ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR);
}
error = 0;
cleanup:

View File

@ -187,21 +187,49 @@ static int journal_wait_on_commit_record(journal_t *journal,
* use writepages() because with delayed allocation we may be doing
* block allocation in writepages().
*/
static int journal_submit_inode_data_buffers(struct address_space *mapping,
loff_t dirty_start, loff_t dirty_end)
int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
{
int ret;
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = mapping->nrpages * 2,
.range_start = dirty_start,
.range_end = dirty_end,
.range_start = jinode->i_dirty_start,
.range_end = jinode->i_dirty_end,
};
ret = generic_writepages(mapping, &wbc);
return ret;
/*
* submit the inode data buffers. We use writepage
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
*/
return generic_writepages(mapping, &wbc);
}
/* Send all the data buffers related to an inode */
int jbd2_submit_inode_data(struct jbd2_inode *jinode)
{
if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
return 0;
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
return jbd2_journal_submit_inode_data_buffers(jinode);
}
EXPORT_SYMBOL(jbd2_submit_inode_data);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
!jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
return 0;
return filemap_fdatawait_range_keep_errors(
jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
jinode->i_dirty_end);
}
EXPORT_SYMBOL(jbd2_wait_inode_data);
/*
* Submit all the data buffers of inode associated with the transaction to
* disk.
@ -215,29 +243,20 @@ static int journal_submit_data_buffers(journal_t *journal,
{
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;
spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
loff_t dirty_start = jinode->i_dirty_start;
loff_t dirty_end = jinode->i_dirty_end;
if (!(jinode->i_flags & JI_WRITE_DATA))
continue;
mapping = jinode->i_vfs_inode->i_mapping;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
/*
* submit the inode data buffers. We use writepage
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
*/
/* submit the inode data buffers. */
trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
err = journal_submit_inode_data_buffers(mapping, dirty_start,
dirty_end);
if (!ret)
ret = err;
if (journal->j_submit_inode_data_buffers) {
err = journal->j_submit_inode_data_buffers(jinode);
if (!ret)
ret = err;
}
spin_lock(&journal->j_list_lock);
J_ASSERT(jinode->i_transaction == commit_transaction);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
@ -248,6 +267,15 @@ static int journal_submit_data_buffers(journal_t *journal,
return ret;
}
int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;
return filemap_fdatawait_range_keep_errors(mapping,
jinode->i_dirty_start,
jinode->i_dirty_end);
}
/*
* Wait for data submitted for writeout, refile inodes to proper
* transaction if needed.
@ -262,18 +290,16 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
/* For locking, see the comment in journal_submit_data_buffers() */
spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
loff_t dirty_start = jinode->i_dirty_start;
loff_t dirty_end = jinode->i_dirty_end;
if (!(jinode->i_flags & JI_WAIT_DATA))
continue;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
err = filemap_fdatawait_range_keep_errors(
jinode->i_vfs_inode->i_mapping, dirty_start,
dirty_end);
if (!ret)
ret = err;
/* wait for the inode data buffers writeout. */
if (journal->j_finish_inode_data_buffers) {
err = journal->j_finish_inode_data_buffers(jinode);
if (!ret)
ret = err;
}
spin_lock(&journal->j_list_lock);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
smp_mb();
@ -413,6 +439,20 @@ void jbd2_journal_commit_transaction(journal_t *journal)
J_ASSERT(journal->j_running_transaction != NULL);
J_ASSERT(journal->j_committing_transaction == NULL);
write_lock(&journal->j_state_lock);
journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) {
DEFINE_WAIT(wait);
prepare_to_wait(&journal->j_fc_wait, &wait,
TASK_UNINTERRUPTIBLE);
write_unlock(&journal->j_state_lock);
schedule();
write_lock(&journal->j_state_lock);
finish_wait(&journal->j_fc_wait, &wait);
}
write_unlock(&journal->j_state_lock);
commit_transaction = journal->j_running_transaction;
trace_jbd2_start_commit(journal, commit_transaction);
@ -420,6 +460,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
commit_transaction->t_tid);
write_lock(&journal->j_state_lock);
journal->j_fc_off = 0;
J_ASSERT(commit_transaction->t_state == T_RUNNING);
commit_transaction->t_state = T_LOCKED;
@ -1119,12 +1160,16 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (journal->j_commit_callback)
journal->j_commit_callback(journal, commit_transaction);
if (journal->j_fc_cleanup_callback)
journal->j_fc_cleanup_callback(journal, 1);
trace_jbd2_end_commit(journal, commit_transaction);
jbd_debug(1, "JBD2: commit %d complete, head %d\n",
journal->j_commit_sequence, journal->j_tail_sequence);
write_lock(&journal->j_state_lock);
journal->j_flags &= ~JBD2_FULL_COMMIT_ONGOING;
journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING;
spin_lock(&journal->j_list_lock);
commit_transaction->t_state = T_FINISHED;
/* Check if the transaction can be dropped now that we are finished */
@ -1136,6 +1181,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
spin_unlock(&journal->j_list_lock);
write_unlock(&journal->j_state_lock);
wake_up(&journal->j_wait_done_commit);
wake_up(&journal->j_fc_wait);
/*
* Calculate overall stats

View File

@ -91,6 +91,8 @@ EXPORT_SYMBOL(jbd2_journal_try_to_free_buffers);
EXPORT_SYMBOL(jbd2_journal_force_commit);
EXPORT_SYMBOL(jbd2_journal_inode_ranged_write);
EXPORT_SYMBOL(jbd2_journal_inode_ranged_wait);
EXPORT_SYMBOL(jbd2_journal_submit_inode_data_buffers);
EXPORT_SYMBOL(jbd2_journal_finish_inode_data_buffers);
EXPORT_SYMBOL(jbd2_journal_init_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_release_jbd_inode);
EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);
@ -157,7 +159,9 @@ static void commit_timeout(struct timer_list *t)
*
* 1) COMMIT: Every so often we need to commit the current state of the
* filesystem to disk. The journal thread is responsible for writing
* all of the metadata buffers to disk.
* all of the metadata buffers to disk. If a fast commit is ongoing
* journal thread waits until it's done and then continues from
* there on.
*
* 2) CHECKPOINT: We cannot reuse a used section of the log file until all
* of the data in that part of the log has been rewritten elsewhere on
@ -714,6 +718,75 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid)
return err;
}
/*
* Start a fast commit. If there's an ongoing fast or full commit wait for
* it to complete. Returns 0 if a new fast commit was started. Returns -EALREADY
* if a fast commit is not needed, either because there's an already a commit
* going on or this tid has already been committed. Returns -EINVAL if no jbd2
* commit has yet been performed.
*/
int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
{
/*
* Fast commits only allowed if at least one full commit has
* been processed.
*/
if (!journal->j_stats.ts_tid)
return -EINVAL;
if (tid <= journal->j_commit_sequence)
return -EALREADY;
write_lock(&journal->j_state_lock);
if (journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
(journal->j_flags & JBD2_FAST_COMMIT_ONGOING)) {
DEFINE_WAIT(wait);
prepare_to_wait(&journal->j_fc_wait, &wait,
TASK_UNINTERRUPTIBLE);
write_unlock(&journal->j_state_lock);
schedule();
finish_wait(&journal->j_fc_wait, &wait);
return -EALREADY;
}
journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
write_unlock(&journal->j_state_lock);
return 0;
}
EXPORT_SYMBOL(jbd2_fc_begin_commit);
/*
* Stop a fast commit. If fallback is set, this function starts commit of
* TID tid before any other fast commit can start.
*/
static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
{
if (journal->j_fc_cleanup_callback)
journal->j_fc_cleanup_callback(journal, 0);
write_lock(&journal->j_state_lock);
journal->j_flags &= ~JBD2_FAST_COMMIT_ONGOING;
if (fallback)
journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
write_unlock(&journal->j_state_lock);
wake_up(&journal->j_fc_wait);
if (fallback)
return jbd2_complete_transaction(journal, tid);
return 0;
}
int jbd2_fc_end_commit(journal_t *journal)
{
return __jbd2_fc_end_commit(journal, 0, 0);
}
EXPORT_SYMBOL(jbd2_fc_end_commit);
int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid)
{
return __jbd2_fc_end_commit(journal, tid, 1);
}
EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
/* Return 1 when transaction with given tid has already committed. */
int jbd2_transaction_committed(journal_t *journal, tid_t tid)
{
@ -782,6 +855,110 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp)
return jbd2_journal_bmap(journal, blocknr, retp);
}
/* Map one fast commit buffer for use by the file system */
int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
{
unsigned long long pblock;
unsigned long blocknr;
int ret = 0;
struct buffer_head *bh;
int fc_off;
*bh_out = NULL;
write_lock(&journal->j_state_lock);
if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) {
fc_off = journal->j_fc_off;
blocknr = journal->j_fc_first + fc_off;
journal->j_fc_off++;
} else {
ret = -EINVAL;
}
write_unlock(&journal->j_state_lock);
if (ret)
return ret;
ret = jbd2_journal_bmap(journal, blocknr, &pblock);
if (ret)
return ret;
bh = __getblk(journal->j_dev, pblock, journal->j_blocksize);
if (!bh)
return -ENOMEM;
lock_buffer(bh);
clear_buffer_uptodate(bh);
set_buffer_dirty(bh);
unlock_buffer(bh);
journal->j_fc_wbuf[fc_off] = bh;
*bh_out = bh;
return 0;
}
EXPORT_SYMBOL(jbd2_fc_get_buf);
/*
* Wait on fast commit buffers that were allocated by jbd2_fc_get_buf
* for completion.
*/
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks)
{
struct buffer_head *bh;
int i, j_fc_off;
read_lock(&journal->j_state_lock);
j_fc_off = journal->j_fc_off;
read_unlock(&journal->j_state_lock);
/*
* Wait in reverse order to minimize chances of us being woken up before
* all IOs have completed
*/
for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) {
bh = journal->j_fc_wbuf[i];
wait_on_buffer(bh);
put_bh(bh);
journal->j_fc_wbuf[i] = NULL;
if (unlikely(!buffer_uptodate(bh)))
return -EIO;
}
return 0;
}
EXPORT_SYMBOL(jbd2_fc_wait_bufs);
/*
* Wait on fast commit buffers that were allocated by jbd2_fc_get_buf
* for completion.
*/
int jbd2_fc_release_bufs(journal_t *journal)
{
struct buffer_head *bh;
int i, j_fc_off;
read_lock(&journal->j_state_lock);
j_fc_off = journal->j_fc_off;
read_unlock(&journal->j_state_lock);
/*
* Wait in reverse order to minimize chances of us being woken up before
* all IOs have completed
*/
for (i = j_fc_off - 1; i >= 0; i--) {
bh = journal->j_fc_wbuf[i];
if (!bh)
break;
put_bh(bh);
journal->j_fc_wbuf[i] = NULL;
}
return 0;
}
EXPORT_SYMBOL(jbd2_fc_release_bufs);
/*
* Conversion of logical to physical block numbers for the journal
*
@ -1140,6 +1317,7 @@ static journal_t *journal_init_common(struct block_device *bdev,
init_waitqueue_head(&journal->j_wait_commit);
init_waitqueue_head(&journal->j_wait_updates);
init_waitqueue_head(&journal->j_wait_reserved);
init_waitqueue_head(&journal->j_fc_wait);
mutex_init(&journal->j_abort_mutex);
mutex_init(&journal->j_barrier);
mutex_init(&journal->j_checkpoint_mutex);
@ -1179,6 +1357,14 @@ static journal_t *journal_init_common(struct block_device *bdev,
if (!journal->j_wbuf)
goto err_cleanup;
if (journal->j_fc_wbufsize > 0) {
journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
sizeof(struct buffer_head *),
GFP_KERNEL);
if (!journal->j_fc_wbuf)
goto err_cleanup;
}
bh = getblk_unmovable(journal->j_dev, start, journal->j_blocksize);
if (!bh) {
pr_err("%s: Cannot get buffer for journal superblock\n",
@ -1192,11 +1378,23 @@ static journal_t *journal_init_common(struct block_device *bdev,
err_cleanup:
kfree(journal->j_wbuf);
kfree(journal->j_fc_wbuf);
jbd2_journal_destroy_revoke(journal);
kfree(journal);
return NULL;
}
int jbd2_fc_init(journal_t *journal, int num_fc_blks)
{
journal->j_fc_wbufsize = num_fc_blks;
journal->j_fc_wbuf = kmalloc_array(journal->j_fc_wbufsize,
sizeof(struct buffer_head *), GFP_KERNEL);
if (!journal->j_fc_wbuf)
return -ENOMEM;
return 0;
}
EXPORT_SYMBOL(jbd2_fc_init);
/* jbd2_journal_init_dev and jbd2_journal_init_inode:
*
* Create a journal structure assigned some fixed set of disk blocks to
@ -1314,11 +1512,20 @@ static int journal_reset(journal_t *journal)
}
journal->j_first = first;
journal->j_last = last;
journal->j_head = first;
journal->j_tail = first;
journal->j_free = last - first;
if (jbd2_has_feature_fast_commit(journal) &&
journal->j_fc_wbufsize > 0) {
journal->j_fc_last = last;
journal->j_last = last - journal->j_fc_wbufsize;
journal->j_fc_first = journal->j_last + 1;
journal->j_fc_off = 0;
} else {
journal->j_last = last;
}
journal->j_head = journal->j_first;
journal->j_tail = journal->j_first;
journal->j_free = journal->j_last - journal->j_first;
journal->j_tail_sequence = journal->j_transaction_sequence;
journal->j_commit_sequence = journal->j_transaction_sequence - 1;
@ -1464,6 +1671,7 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
{
journal_superblock_t *sb = journal->j_superblock;
bool had_fast_commit = false;
BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
lock_buffer(journal->j_sb_buffer);
@ -1477,9 +1685,20 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
sb->s_start = cpu_to_be32(0);
if (jbd2_has_feature_fast_commit(journal)) {
/*
* When journal is clean, no need to commit fast commit flag and
* make file system incompatible with older kernels.
*/
jbd2_clear_feature_fast_commit(journal);
had_fast_commit = true;
}
jbd2_write_superblock(journal, write_op);
if (had_fast_commit)
jbd2_set_feature_fast_commit(journal);
/* Log is no longer empty */
write_lock(&journal->j_state_lock);
journal->j_flags |= JBD2_FLUSHED;
@ -1663,9 +1882,18 @@ static int load_superblock(journal_t *journal)
journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
journal->j_tail = be32_to_cpu(sb->s_start);
journal->j_first = be32_to_cpu(sb->s_first);
journal->j_last = be32_to_cpu(sb->s_maxlen);
journal->j_errno = be32_to_cpu(sb->s_errno);
if (jbd2_has_feature_fast_commit(journal) &&
journal->j_fc_wbufsize > 0) {
journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
journal->j_last = journal->j_fc_last - journal->j_fc_wbufsize;
journal->j_fc_first = journal->j_last + 1;
journal->j_fc_off = 0;
} else {
journal->j_last = be32_to_cpu(sb->s_maxlen);
}
return 0;
}
@ -1726,6 +1954,9 @@ int jbd2_journal_load(journal_t *journal)
*/
journal->j_flags &= ~JBD2_ABORT;
if (journal->j_fc_wbufsize > 0)
jbd2_journal_set_features(journal, 0, 0,
JBD2_FEATURE_INCOMPAT_FAST_COMMIT);
/* OK, we've finished with the dynamic journal bits:
* reinitialise the dynamic contents of the superblock in memory
* and reset them on disk. */
@ -1809,6 +2040,8 @@ int jbd2_journal_destroy(journal_t *journal)
jbd2_journal_destroy_revoke(journal);
if (journal->j_chksum_driver)
crypto_free_shash(journal->j_chksum_driver);
if (journal->j_fc_wbufsize > 0)
kfree(journal->j_fc_wbuf);
kfree(journal->j_wbuf);
kfree(journal);

View File

@ -35,7 +35,6 @@ struct recovery_info
int nr_revoke_hits;
};
enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
static int do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass);
static int scan_revoke_records(journal_t *, struct buffer_head *,
@ -225,10 +224,51 @@ static int count_tags(journal_t *journal, struct buffer_head *bh)
/* Make sure we wrap around the log correctly! */
#define wrap(journal, var) \
do { \
if (var >= (journal)->j_last) \
var -= ((journal)->j_last - (journal)->j_first); \
unsigned long _wrap_last = \
jbd2_has_feature_fast_commit(journal) ? \
(journal)->j_fc_last : (journal)->j_last; \
\
if (var >= _wrap_last) \
var -= (_wrap_last - (journal)->j_first); \
} while (0)
static int fc_do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass)
{
unsigned int expected_commit_id = info->end_transaction;
unsigned long next_fc_block;
struct buffer_head *bh;
int err = 0;
next_fc_block = journal->j_fc_first;
if (!journal->j_fc_replay_callback)
return 0;
while (next_fc_block <= journal->j_fc_last) {
jbd_debug(3, "Fast commit replay: next block %ld",
next_fc_block);
err = jread(&bh, journal, next_fc_block);
if (err) {
jbd_debug(3, "Fast commit replay: read error");
break;
}
jbd_debug(3, "Processing fast commit blk with seq %d");
err = journal->j_fc_replay_callback(journal, bh, pass,
next_fc_block - journal->j_fc_first,
expected_commit_id);
next_fc_block++;
if (err < 0 || err == JBD2_FC_REPLAY_STOP)
break;
err = 0;
}
if (err)
jbd_debug(3, "Fast commit replay failed, err = %d\n", err);
return err;
}
/**
* jbd2_journal_recover - recovers a on-disk journal
* @journal: the journal to recover
@ -428,6 +468,8 @@ static int do_one_pass(journal_t *journal,
__u32 crc32_sum = ~0; /* Transactional Checksums */
int descr_csum_size = 0;
int block_error = 0;
bool need_check_commit_time = false;
__u64 last_trans_commit_time = 0, commit_time;
/*
* First thing is to establish what we expect to find in the log
@ -470,7 +512,9 @@ static int do_one_pass(journal_t *journal,
break;
jbd_debug(2, "Scanning for sequence ID %u at %lu/%lu\n",
next_commit_ID, next_log_block, journal->j_last);
next_commit_ID, next_log_block,
jbd2_has_feature_fast_commit(journal) ?
journal->j_fc_last : journal->j_last);
/* Skip over each chunk of the transaction looking
* either the next descriptor block or the final commit
@ -520,12 +564,21 @@ static int do_one_pass(journal_t *journal,
if (descr_csum_size > 0 &&
!jbd2_descriptor_block_csum_verify(journal,
bh->b_data)) {
printk(KERN_ERR "JBD2: Invalid checksum "
"recovering block %lu in log\n",
next_log_block);
err = -EFSBADCRC;
brelse(bh);
goto failed;
/*
* PASS_SCAN can see stale blocks due to lazy
* journal init. Don't error out on those yet.
*/
if (pass != PASS_SCAN) {
pr_err("JBD2: Invalid checksum recovering block %lu in log\n",
next_log_block);
err = -EFSBADCRC;
brelse(bh);
goto failed;
}
need_check_commit_time = true;
jbd_debug(1,
"invalid descriptor block found in %lu\n",
next_log_block);
}
/* If it is a valid descriptor block, replay it
@ -535,6 +588,7 @@ static int do_one_pass(journal_t *journal,
if (pass != PASS_REPLAY) {
if (pass == PASS_SCAN &&
jbd2_has_feature_checksum(journal) &&
!need_check_commit_time &&
!info->end_transaction) {
if (calc_chksums(journal, bh,
&next_log_block,
@ -683,11 +737,41 @@ static int do_one_pass(journal_t *journal,
* mentioned conditions. Hence assume
* "Interrupted Commit".)
*/
commit_time = be64_to_cpu(
((struct commit_header *)bh->b_data)->h_commit_sec);
/*
* If need_check_commit_time is set, it means we are in
* PASS_SCAN and csum verify failed before. If
* commit_time is increasing, it's the same journal,
* otherwise it is stale journal block, just end this
* recovery.
*/
if (need_check_commit_time) {
if (commit_time >= last_trans_commit_time) {
pr_err("JBD2: Invalid checksum found in transaction %u\n",
next_commit_ID);
err = -EFSBADCRC;
brelse(bh);
goto failed;
}
ignore_crc_mismatch:
/*
* It likely does not belong to same journal,
* just end this recovery with success.
*/
jbd_debug(1, "JBD2: Invalid checksum ignored in transaction %u, likely stale data\n",
next_commit_ID);
err = 0;
brelse(bh);
goto done;
}
/* Found an expected commit block: if checksums
* are present verify them in PASS_SCAN; else not
/*
* Found an expected commit block: if checksums
* are present, verify them in PASS_SCAN; else not
* much to do other than move on to the next sequence
* number. */
* number.
*/
if (pass == PASS_SCAN &&
jbd2_has_feature_checksum(journal)) {
struct commit_header *cbh =
@ -719,6 +803,8 @@ static int do_one_pass(journal_t *journal,
!jbd2_commit_block_csum_verify(journal,
bh->b_data)) {
chksum_error:
if (commit_time < last_trans_commit_time)
goto ignore_crc_mismatch;
info->end_transaction = next_commit_ID;
if (!jbd2_has_feature_async_commit(journal)) {
@ -728,11 +814,24 @@ static int do_one_pass(journal_t *journal,
break;
}
}
if (pass == PASS_SCAN)
last_trans_commit_time = commit_time;
brelse(bh);
next_commit_ID++;
continue;
case JBD2_REVOKE_BLOCK:
/*
* Check revoke block crc in pass_scan, if csum verify
* failed, check commit block time later.
*/
if (pass == PASS_SCAN &&
!jbd2_descriptor_block_csum_verify(journal,
bh->b_data)) {
jbd_debug(1, "JBD2: invalid revoke block found in %lu\n",
next_log_block);
need_check_commit_time = true;
}
/* If we aren't in the REVOKE pass, then we can
* just skip over this block. */
if (pass != PASS_REVOKE) {
@ -777,6 +876,13 @@ static int do_one_pass(journal_t *journal,
success = -EIO;
}
}
if (jbd2_has_feature_fast_commit(journal) && pass != PASS_REVOKE) {
err = fc_do_one_pass(journal, info, pass);
if (err)
success = err;
}
if (block_error && success == 0)
success = -EIO;
return success;
@ -800,9 +906,6 @@ static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
offset = sizeof(jbd2_journal_revoke_header_t);
rcount = be32_to_cpu(header->r_count);
if (!jbd2_descriptor_block_csum_verify(journal, header))
return -EFSBADCRC;
if (jbd2_journal_has_csum_v2or3(journal))
csum_size = sizeof(struct jbd2_journal_block_tail);
if (rcount > journal->j_blocksize - csum_size)

View File

@ -883,6 +883,10 @@ int ocfs2_journal_init(struct ocfs2_journal *journal, int *dirty)
OCFS2_JOURNAL_DIRTY_FL);
journal->j_journal = j_journal;
journal->j_journal->j_submit_inode_data_buffers =
jbd2_journal_submit_inode_data_buffers;
journal->j_journal->j_finish_inode_data_buffers =
jbd2_journal_finish_inode_data_buffers;
journal->j_inode = inode;
journal->j_bh = bh;

View File

@ -289,6 +289,7 @@ typedef struct journal_superblock_s
#define JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT 0x00000004
#define JBD2_FEATURE_INCOMPAT_CSUM_V2 0x00000008
#define JBD2_FEATURE_INCOMPAT_CSUM_V3 0x00000010
#define JBD2_FEATURE_INCOMPAT_FAST_COMMIT 0x00000020
/* See "journal feature predicate functions" below */
@ -299,7 +300,8 @@ typedef struct journal_superblock_s
JBD2_FEATURE_INCOMPAT_64BIT | \
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT | \
JBD2_FEATURE_INCOMPAT_CSUM_V2 | \
JBD2_FEATURE_INCOMPAT_CSUM_V3)
JBD2_FEATURE_INCOMPAT_CSUM_V3 | \
JBD2_FEATURE_INCOMPAT_FAST_COMMIT)
#ifdef __KERNEL__
@ -452,8 +454,8 @@ struct jbd2_inode {
struct jbd2_revoke_table_s;
/**
* struct handle_s - The handle_s type is the concrete type associated with
* handle_t.
* struct jbd2_journal_handle - The jbd2_journal_handle type is the concrete
* type associated with handle_t.
* @h_transaction: Which compound transaction is this update a part of?
* @h_journal: Which journal handle belongs to - used iff h_reserved set.
* @h_rsv_handle: Handle reserved for finishing the logical operation.
@ -629,7 +631,9 @@ struct transaction_s
struct journal_head *t_shadow_list;
/*
* List of inodes whose data we've modified in data=ordered mode.
* List of inodes associated with the transaction; e.g., ext4 uses
* this to track inodes in data=ordered and data=journal mode that
* need special handling on transaction commit; also used by ocfs2.
* [j_list_lock]
*/
struct list_head t_inode_list;
@ -747,6 +751,11 @@ jbd2_time_diff(unsigned long start, unsigned long end)
#define JBD2_NR_BATCH 64
enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
#define JBD2_FC_REPLAY_STOP 0
#define JBD2_FC_REPLAY_CONTINUE 1
/**
* struct journal_s - The journal_s type is the concrete type associated with
* journal_t.
@ -857,6 +866,13 @@ struct journal_s
*/
wait_queue_head_t j_wait_reserved;
/**
* @j_fc_wait:
*
* Wait queue to wait for completion of async fast commits.
*/
wait_queue_head_t j_fc_wait;
/**
* @j_checkpoint_mutex:
*
@ -914,6 +930,30 @@ struct journal_s
*/
unsigned long j_last;
/**
* @j_fc_first:
*
* The block number of the first fast commit block in the journal
* [j_state_lock].
*/
unsigned long j_fc_first;
/**
* @j_fc_off:
*
* Number of fast commit blocks currently allocated.
* [j_state_lock].
*/
unsigned long j_fc_off;
/**
* @j_fc_last:
*
* The block number one beyond the last fast commit block in the journal
* [j_state_lock].
*/
unsigned long j_fc_last;
/**
* @j_dev: Device where we store the journal.
*/
@ -1064,6 +1104,12 @@ struct journal_s
*/
struct buffer_head **j_wbuf;
/**
* @j_fc_wbuf: Array of fast commit bhs for
* jbd2_journal_commit_transaction.
*/
struct buffer_head **j_fc_wbuf;
/**
* @j_wbufsize:
*
@ -1071,6 +1117,13 @@ struct journal_s
*/
int j_wbufsize;
/**
* @j_fc_wbufsize:
*
* Size of @j_fc_wbuf array.
*/
int j_fc_wbufsize;
/**
* @j_last_sync_writer:
*
@ -1111,6 +1164,27 @@ struct journal_s
void (*j_commit_callback)(journal_t *,
transaction_t *);
/**
* @j_submit_inode_data_buffers:
*
* This function is called for all inodes associated with the
* committing transaction marked with JI_WRITE_DATA flag
* before we start to write out the transaction to the journal.
*/
int (*j_submit_inode_data_buffers)
(struct jbd2_inode *);
/**
* @j_finish_inode_data_buffers:
*
* This function is called for all inodes associated with the
* committing transaction marked with JI_WAIT_DATA flag
* after we have written the transaction to the journal
* but before we write out the commit block.
*/
int (*j_finish_inode_data_buffers)
(struct jbd2_inode *);
/*
* Journal statistics
*/
@ -1170,6 +1244,30 @@ struct journal_s
*/
struct lockdep_map j_trans_commit_map;
#endif
/**
* @j_fc_cleanup_callback:
*
* Clean-up after fast commit or full commit. JBD2 calls this function
* after every commit operation.
*/
void (*j_fc_cleanup_callback)(struct journal_s *journal, int);
/*
* @j_fc_replay_callback:
*
* File-system specific function that performs replay of a fast
* commit. JBD2 calls this function for each fast commit block found in
* the journal. This function should return JBD2_FC_REPLAY_CONTINUE
* to indicate that the block was processed correctly and more fast
* commit replay should continue. Return value of JBD2_FC_REPLAY_STOP
* indicates the end of replay (no more blocks remaining). A negative
* return value indicates error.
*/
int (*j_fc_replay_callback)(struct journal_s *journal,
struct buffer_head *bh,
enum passtype pass, int off,
tid_t expected_commit_id);
};
#define jbd2_might_wait_for_commit(j) \
@ -1240,6 +1338,7 @@ JBD2_FEATURE_INCOMPAT_FUNCS(64bit, 64BIT)
JBD2_FEATURE_INCOMPAT_FUNCS(async_commit, ASYNC_COMMIT)
JBD2_FEATURE_INCOMPAT_FUNCS(csum2, CSUM_V2)
JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit, FAST_COMMIT)
/*
* Journal flag definitions
@ -1253,6 +1352,8 @@ JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM_V3)
#define JBD2_ABORT_ON_SYNCDATA_ERR 0x040 /* Abort the journal on file
* data write error in ordered
* mode */
#define JBD2_FAST_COMMIT_ONGOING 0x100 /* Fast commit is ongoing */
#define JBD2_FULL_COMMIT_ONGOING 0x200 /* Full commit is ongoing */
/*
* Function declarations for the journaling transaction and buffer
@ -1421,6 +1522,10 @@ extern int jbd2_journal_inode_ranged_write(handle_t *handle,
extern int jbd2_journal_inode_ranged_wait(handle_t *handle,
struct jbd2_inode *inode, loff_t start_byte,
loff_t length);
extern int jbd2_journal_submit_inode_data_buffers(
struct jbd2_inode *jinode);
extern int jbd2_journal_finish_inode_data_buffers(
struct jbd2_inode *jinode);
extern int jbd2_journal_begin_ordered_truncate(journal_t *journal,
struct jbd2_inode *inode, loff_t new_size);
extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode);
@ -1505,6 +1610,17 @@ void __jbd2_log_wait_for_space(journal_t *journal);
extern void __jbd2_journal_drop_transaction(journal_t *, transaction_t *);
extern int jbd2_cleanup_journal_tail(journal_t *);
/* Fast commit related APIs */
int jbd2_fc_init(journal_t *journal, int num_fc_blks);
int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
int jbd2_fc_end_commit(journal_t *journal);
int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid);
int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
int jbd2_submit_inode_data(struct jbd2_inode *jinode);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
int jbd2_fc_release_bufs(journal_t *journal);
/*
* is_journal_abort
*

View File

@ -95,6 +95,16 @@ TRACE_DEFINE_ENUM(ES_REFERENCED_B);
{ FALLOC_FL_COLLAPSE_RANGE, "COLLAPSE_RANGE"}, \
{ FALLOC_FL_ZERO_RANGE, "ZERO_RANGE"})
#define show_fc_reason(reason) \
__print_symbolic(reason, \
{ EXT4_FC_REASON_XATTR, "XATTR"}, \
{ EXT4_FC_REASON_CROSS_RENAME, "CROSS_RENAME"}, \
{ EXT4_FC_REASON_JOURNAL_FLAG_CHANGE, "JOURNAL_FLAG_CHANGE"}, \
{ EXT4_FC_REASON_MEM, "NO_MEM"}, \
{ EXT4_FC_REASON_SWAP_BOOT, "SWAP_BOOT"}, \
{ EXT4_FC_REASON_RESIZE, "RESIZE"}, \
{ EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
{ EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"})
TRACE_EVENT(ext4_other_inode_update_time,
TP_PROTO(struct inode *inode, ino_t orig_ino),
@ -1766,9 +1776,9 @@ TRACE_EVENT(ext4_ext_load_extent,
);
TRACE_EVENT(ext4_load_inode,
TP_PROTO(struct inode *inode),
TP_PROTO(struct super_block *sb, unsigned long ino),
TP_ARGS(inode),
TP_ARGS(sb, ino),
TP_STRUCT__entry(
__field( dev_t, dev )
@ -1776,8 +1786,8 @@ TRACE_EVENT(ext4_load_inode,
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->dev = sb->s_dev;
__entry->ino = ino;
),
TP_printk("dev %d,%d ino %ld",
@ -2791,6 +2801,216 @@ TRACE_EVENT(ext4_lazy_itable_init,
MAJOR(__entry->dev), MINOR(__entry->dev), __entry->group)
);
TRACE_EVENT(ext4_fc_replay_scan,
TP_PROTO(struct super_block *sb, int error, int off),
TP_ARGS(sb, error, off),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(int, error)
__field(int, off)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->error = error;
__entry->off = off;
),
TP_printk("FC scan pass on dev %d,%d: error %d, off %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->error, __entry->off)
);
TRACE_EVENT(ext4_fc_replay,
TP_PROTO(struct super_block *sb, int tag, int ino, int priv1, int priv2),
TP_ARGS(sb, tag, ino, priv1, priv2),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(int, tag)
__field(int, ino)
__field(int, priv1)
__field(int, priv2)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->tag = tag;
__entry->ino = ino;
__entry->priv1 = priv1;
__entry->priv2 = priv2;
),
TP_printk("FC Replay %d,%d: tag %d, ino %d, data1 %d, data2 %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->tag, __entry->ino, __entry->priv1, __entry->priv2)
);
TRACE_EVENT(ext4_fc_commit_start,
TP_PROTO(struct super_block *sb),
TP_ARGS(sb),
TP_STRUCT__entry(
__field(dev_t, dev)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
),
TP_printk("fast_commit started on dev %d,%d",
MAJOR(__entry->dev), MINOR(__entry->dev))
);
TRACE_EVENT(ext4_fc_commit_stop,
TP_PROTO(struct super_block *sb, int nblks, int reason),
TP_ARGS(sb, nblks, reason),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(int, nblks)
__field(int, reason)
__field(int, num_fc)
__field(int, num_fc_ineligible)
__field(int, nblks_agg)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->nblks = nblks;
__entry->reason = reason;
__entry->num_fc = EXT4_SB(sb)->s_fc_stats.fc_num_commits;
__entry->num_fc_ineligible =
EXT4_SB(sb)->s_fc_stats.fc_ineligible_commits;
__entry->nblks_agg = EXT4_SB(sb)->s_fc_stats.fc_numblks;
),
TP_printk("fc on [%d,%d] nblks %d, reason %d, fc = %d, ineligible = %d, agg_nblks %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->nblks, __entry->reason, __entry->num_fc,
__entry->num_fc_ineligible, __entry->nblks_agg)
);
#define FC_REASON_NAME_STAT(reason) \
show_fc_reason(reason), \
__entry->sbi->s_fc_stats.fc_ineligible_reason_count[reason]
TRACE_EVENT(ext4_fc_stats,
TP_PROTO(struct super_block *sb),
TP_ARGS(sb),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(struct ext4_sb_info *, sbi)
__field(int, count)
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->sbi = EXT4_SB(sb);
),
TP_printk("dev %d:%d fc ineligible reasons:\n"
"%s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s:%d, %s,%d; "
"num_commits:%ld, ineligible: %ld, numblks: %ld",
MAJOR(__entry->dev), MINOR(__entry->dev),
FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
FC_REASON_NAME_STAT(EXT4_FC_REASON_CROSS_RENAME),
FC_REASON_NAME_STAT(EXT4_FC_REASON_JOURNAL_FLAG_CHANGE),
FC_REASON_NAME_STAT(EXT4_FC_REASON_MEM),
FC_REASON_NAME_STAT(EXT4_FC_REASON_SWAP_BOOT),
FC_REASON_NAME_STAT(EXT4_FC_REASON_RESIZE),
FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR),
FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
__entry->sbi->s_fc_stats.fc_num_commits,
__entry->sbi->s_fc_stats.fc_ineligible_commits,
__entry->sbi->s_fc_stats.fc_numblks)
);
#define DEFINE_TRACE_DENTRY_EVENT(__type) \
TRACE_EVENT(ext4_fc_track_##__type, \
TP_PROTO(struct inode *inode, struct dentry *dentry, int ret), \
\
TP_ARGS(inode, dentry, ret), \
\
TP_STRUCT__entry( \
__field(dev_t, dev) \
__field(int, ino) \
__field(int, error) \
), \
\
TP_fast_assign( \
__entry->dev = inode->i_sb->s_dev; \
__entry->ino = inode->i_ino; \
__entry->error = ret; \
), \
\
TP_printk("dev %d:%d, inode %d, error %d, fc_%s", \
MAJOR(__entry->dev), MINOR(__entry->dev), \
__entry->ino, __entry->error, \
#__type) \
)
DEFINE_TRACE_DENTRY_EVENT(create);
DEFINE_TRACE_DENTRY_EVENT(link);
DEFINE_TRACE_DENTRY_EVENT(unlink);
TRACE_EVENT(ext4_fc_track_inode,
TP_PROTO(struct inode *inode, int ret),
TP_ARGS(inode, ret),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(int, ino)
__field(int, error)
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->error = ret;
),
TP_printk("dev %d:%d, inode %d, error %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino, __entry->error)
);
TRACE_EVENT(ext4_fc_track_range,
TP_PROTO(struct inode *inode, long start, long end, int ret),
TP_ARGS(inode, start, end, ret),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(int, ino)
__field(long, start)
__field(long, end)
__field(int, error)
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->start = start;
__entry->end = end;
__entry->error = ret;
),
TP_printk("dev %d:%d, inode %d, error %d, start %ld, end %ld",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino, __entry->error, __entry->start,
__entry->end)
);
#endif /* _TRACE_EXT4_H */
/* This part must be outside protection */