Changes for Linux 5.2:

 - Fix some more buffer deadlocks when performing an unmount after a hard
   shutdown.
 - Fix some minor space accounting issues.
 - Fix some use-after-free problems.
 - Make the (undocumented) FITRIM behavior consistent with other filesystems.
 - Embiggen the xfs geometry ioctl's data structure.
 - Introduce a new AG geometry ioctl.
 - Introduce a new online health reporting infrastructure and ioctl for
   userspace to query a filesystem's health status.
 - Enhance online scrub and repair to update the health reports.
 - Reduce thundering herd problems when writeback I/O completes.
 - Fix some transaction reservation type errors.
 - Fix integer overflow problems with delayed alloc reservation counters.
 - Fix some problems where we would exit to userspace without unlocking.
 - Fix inconsistent behavior when finishing deferred ops fails.
 - Strengthen scrub to check incore data against ondisk metadata.
 - Remove long-broken mntpt mount option.
 - Add an online scrub function for the filesystem summary counters,
   which should make online metadata scrub more or less feature complete
   for now.
 - Various cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlzMUEwACgkQ+H93GTRK
 tOvvHw//bou5YL/gMrsxCbg2b7rpsNG3TIOz5Kq52V3JMtyqFWzArpCBEskVZXD0
 J4ZXZMv/VSvI2DgV22w8/vlsjPJiODPu5mIqmcyQmZK8eDg4sL7EKVa601F57kyj
 QPrdT1AnxUl+n1gM4XrV57xmsNwLYMVHKQC9e5MS6LSu7+1sarw7HFxSY1AMG9Ys
 skEvwe762LCbMsnBBLCs/ZeqHWlqDok9HiKZNCj35aRrLV9dA97mjznlBFUJbhbw
 kAG2jeAaG0LQnlnCzPRd3HJqQXlGL4044gx73RRY3+/POVYupKiC9KSlImq/cbLd
 n7UWHVDieWoOuLKverUICw4UuqtkAXurUCW7w91ipEmZUlYKNrMNNXiEm7pfgJU3
 A2KK1R14UYKJ3zX6xPz4mdlYhh0KB/xlN01Rdzhrhk9XKfL92/YyjpyjcTIeUZm8
 RNKLAoWRpJZPou3RPpfZLFTSmtYIcTB92kYV6XpQ3DRrJLjlaHbu9VJaipbZGhxY
 rdF+Rtk8EjKMFP0bixDHePWCu7317vMy1lbpO5UipxyC9eTwry54EaCxP03CI7YO
 OAsqCdf8HYlGqEjWKprkCczMYkDRDT0p4bS27Rdzc1D5lUj6/g5hF7RD0MYc1/eA
 ZDQUqVgBTmAQp+tKPTHhuSTWyZ8IIt0kdg5Z8IRVWxd+SmwwGoo=
 =d2sO
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Here's a big pile of new stuff for XFS for 5.2. XFS has grown the
  ability to report metadata health status to userspace after online
  fsck checks the filesystem. The online metadata checking code is (I
  really hope) feature complete with the addition of checks for the
  global fs counters, though it'll remain EXPERIMENTAL for now.

  There are also fixes for thundering herds of writeback completions and
  some other deadlocks, fixes for theoretical integer overflow attacks
  on space accounting, and removal of the long-defunct 'mntpt' option
  which was deprecated in the mid-2000s and (it turns out) totally
  broken since 2011 (and nobody complained...).

  Summary:

   - Fix some more buffer deadlocks when performing an unmount after a
     hard shutdown.

   - Fix some minor space accounting issues.

   - Fix some use-after-free problems.

   - Make the (undocumented) FITRIM behavior consistent with other
     filesystems.

   - Embiggen the xfs geometry ioctl's data structure.

   - Introduce a new AG geometry ioctl.

   - Introduce a new online health reporting infrastructure and ioctl
     for userspace to query a filesystem's health status.

   - Enhance online scrub and repair to update the health reports.

   - Reduce thundering herd problems when writeback I/O completes.

   - Fix some transaction reservation type errors.

   - Fix integer overflow problems with delayed alloc reservation
     counters.

   - Fix some problems where we would exit to userspace without
     unlocking.

   - Fix inconsistent behavior when finishing deferred ops fails.

   - Strengthen scrub to check incore data against ondisk metadata.

   - Remove long-broken mntpt mount option.

   - Add an online scrub function for the filesystem summary counters,
     which should make online metadata scrub more or less feature
     complete for now.

   - Various cleanups"

* tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (38 commits)
  xfs: change some error-less functions to void types
  xfs: add online scrub for superblock counters
  xfs: don't parse the mtpt mount option
  xfs: always rejoin held resources during defer roll
  xfs: add missing error check in xfs_prepare_shift()
  xfs: scrub should check incore counters against ondisk headers
  xfs: allow scrubbers to pause background reclaim
  xfs: rename the speculative block allocation reclaim toggle functions
  xfs: track delayed allocation reservations across the filesystem
  xfs: fix broken bhold behavior in xrep_roll_ag_trans
  xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction
  xfs: kill the xfs_dqtrx_t typedef
  xfs: widen inode delalloc block counter to 64-bits
  xfs: widen quota block counters to 64-bit integers
  xfs: abort unaligned nowait directio early
  xfs: assert that we don't enter agfl freeing with a non-permanent transaction
  xfs: make tr_growdata a permanent transaction
  xfs: merge adjacent io completions of the same type
  xfs: remove unused m_data_workqueue
  xfs: implement per-inode writeback completion queues
  ...
Linus Torvalds 2019-05-07 11:46:56 -07:00
commit aa26690fab
59 changed files with 2084 additions and 312 deletions

View File

@@ -73,6 +73,7 @@ xfs-y += xfs_aops.o \
 	xfs_fsmap.o \
 	xfs_fsops.o \
 	xfs_globals.o \
+	xfs_health.o \
 	xfs_icache.o \
 	xfs_ioctl.o \
 	xfs_iomap.o \
@@ -142,6 +143,8 @@ xfs-y += $(addprefix scrub/, \
 	common.o \
 	dabtree.o \
 	dir.o \
+	fscounters.o \
+	health.o \
 	ialloc.o \
 	inode.o \
 	parent.o \

View File

@@ -19,6 +19,8 @@
 #include "xfs_ialloc.h"
 #include "xfs_rmap.h"
 #include "xfs_ag.h"
+#include "xfs_ag_resv.h"
+#include "xfs_health.h"

 static struct xfs_buf *
 xfs_get_aghdr_buf(
@@ -461,3 +463,55 @@ xfs_ag_extend_space(
 			len, &XFS_RMAP_OINFO_SKIP_UPDATE,
 			XFS_AG_RESV_NONE);
 }
+
+/* Retrieve AG geometry. */
+int
+xfs_ag_get_geometry(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct xfs_ag_geometry	*ageo)
+{
+	struct xfs_buf		*agi_bp;
+	struct xfs_buf		*agf_bp;
+	struct xfs_agi		*agi;
+	struct xfs_agf		*agf;
+	struct xfs_perag	*pag;
+	unsigned int		freeblks;
+	int			error;
+
+	if (agno >= mp->m_sb.sb_agcount)
+		return -EINVAL;
+
+	/* Lock the AG headers. */
+	error = xfs_ialloc_read_agi(mp, NULL, agno, &agi_bp);
+	if (error)
+		return error;
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agf_bp);
+	if (error)
+		goto out_agi;
+	pag = xfs_perag_get(mp, agno);
+
+	/* Fill out form. */
+	memset(ageo, 0, sizeof(*ageo));
+	ageo->ag_number = agno;
+
+	agi = XFS_BUF_TO_AGI(agi_bp);
+	ageo->ag_icount = be32_to_cpu(agi->agi_count);
+	ageo->ag_ifree = be32_to_cpu(agi->agi_freecount);
+
+	agf = XFS_BUF_TO_AGF(agf_bp);
+	ageo->ag_length = be32_to_cpu(agf->agf_length);
+	freeblks = pag->pagf_freeblks +
+		   pag->pagf_flcount +
+		   pag->pagf_btreeblks -
+		   xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE);
+	ageo->ag_freeblks = freeblks;
+	xfs_ag_geom_health(pag, ageo);
+
+	/* Release resources. */
+	xfs_perag_put(pag);
+	xfs_buf_relse(agf_bp);
+out_agi:
+	xfs_buf_relse(agi_bp);
+	return error;
+}
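The ag_freeblks value reported above folds several in-core counters into one number: blocks tracked by the free-space btrees, blocks parked on the AGFL, blocks held inside AG btrees, minus whatever the per-AG reservations have set aside. A standalone model of just that arithmetic (hypothetical helper name, kernel types simplified; not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the ag_freeblks computation in xfs_ag_get_geometry():
 * free-space btree blocks + AGFL blocks + AG btree blocks, minus the
 * per-AG reservation.  Hypothetical standalone helper for illustration.
 */
static uint32_t ag_geom_freeblks(uint32_t pagf_freeblks, uint32_t pagf_flcount,
				 uint32_t pagf_btreeblks, uint32_t resv_needed)
{
	return pagf_freeblks + pagf_flcount + pagf_btreeblks - resv_needed;
}
```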

View File

@@ -26,5 +26,7 @@ struct aghdr_init_data {
 int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
 int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
 			struct aghdr_init_data *id, xfs_extlen_t len);
+int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
+			struct xfs_ag_geometry *ageo);

 #endif	/* __LIBXFS_AG_H */

View File

@@ -2042,6 +2042,7 @@ xfs_alloc_space_available(
 	xfs_extlen_t		alloc_len, longest;
 	xfs_extlen_t		reservation; /* blocks that are still reserved */
 	int			available;
+	xfs_extlen_t		agflcount;

 	if (flags & XFS_ALLOC_FLAG_FREEING)
 		return true;
@@ -2054,8 +2055,13 @@ xfs_alloc_space_available(
 	if (longest < alloc_len)
 		return false;

-	/* do we have enough free space remaining for the allocation? */
-	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
+	/*
+	 * Do we have enough free space remaining for the allocation?  Don't
+	 * account extra agfl blocks because we are about to defer free them,
+	 * making them unavailable until the current transaction commits.
+	 */
+	agflcount = min_t(xfs_extlen_t, pag->pagf_flcount, min_free);
+	available = (int)(pag->pagf_freeblks + agflcount -
 			  reservation - min_free - args->minleft);
 	if (available < (int)max(args->total, alloc_len))
 		return false;
@@ -2237,6 +2243,9 @@ xfs_alloc_fix_freelist(
 	xfs_extlen_t		need;	/* total blocks needed in freelist */
 	int			error = 0;

+	/* deferred ops (AGFL block frees) require permanent transactions */
+	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+
 	if (!pag->pagf_init) {
 		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
 		if (error)
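The clamped AGFL term can be modeled in isolation to see why it matters: only up to min_free AGFL blocks may count as available, because any extras are about to be defer-freed and stay unusable until the transaction commits. A hedged sketch (hypothetical helper name, kernel types simplified):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t xfs_extlen_t;

/*
 * Model of the fixed availability check: clamp the AGFL contribution to
 * min_free, since surplus AGFL blocks are about to be defer-freed and
 * cannot satisfy this allocation.  Hypothetical standalone helper.
 */
static int space_available(xfs_extlen_t freeblks, xfs_extlen_t flcount,
			   xfs_extlen_t reservation, xfs_extlen_t min_free,
			   xfs_extlen_t minleft, xfs_extlen_t want)
{
	xfs_extlen_t agflcount = flcount < min_free ? flcount : min_free;
	int available = (int)(freeblks + agflcount -
			      reservation - min_free - minleft);

	return available >= (int)want;
}
```

With 10 free blocks, 8 AGFL blocks, and min_free of 4, the old unclamped formula would have reported 14 blocks available; the clamped version correctly reports only 10.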

View File

@@ -224,10 +224,10 @@ xfs_attr_try_sf_addname(
  */
 int
 xfs_attr_set_args(
-	struct xfs_da_args	*args,
-	struct xfs_buf		**leaf_bp)
+	struct xfs_da_args	*args)
 {
 	struct xfs_inode	*dp = args->dp;
+	struct xfs_buf		*leaf_bp = NULL;
 	int			error;

 	/*
@@ -255,7 +255,7 @@ xfs_attr_set_args(
 		 * It won't fit in the shortform, transform to a leaf block.
 		 * GROT: another possible req'mt for a double-split btree op.
 		 */
-		error = xfs_attr_shortform_to_leaf(args, leaf_bp);
+		error = xfs_attr_shortform_to_leaf(args, &leaf_bp);
 		if (error)
 			return error;

@@ -263,23 +263,16 @@ xfs_attr_set_args(
 		 * Prevent the leaf buffer from being unlocked so that a
 		 * concurrent AIL push cannot grab the half-baked leaf
 		 * buffer and run into problems with the write verifier.
+		 * Once we're done rolling the transaction we can release
+		 * the hold and add the attr to the leaf.
 		 */
-		xfs_trans_bhold(args->trans, *leaf_bp);
+		xfs_trans_bhold(args->trans, leaf_bp);
 		error = xfs_defer_finish(&args->trans);
-		if (error)
+		xfs_trans_bhold_release(args->trans, leaf_bp);
+		if (error) {
+			xfs_trans_brelse(args->trans, leaf_bp);
 			return error;
-
-		/*
-		 * Commit the leaf transformation.  We'll need another
-		 * (linked) transaction to add the new attribute to the
-		 * leaf.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
-		xfs_trans_bjoin(args->trans, *leaf_bp);
-		*leaf_bp = NULL;
+		}
 	}

 	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
@@ -322,7 +315,6 @@ xfs_attr_set(
 	int			flags)
 {
 	struct xfs_mount	*mp = dp->i_mount;
-	struct xfs_buf		*leaf_bp = NULL;
 	struct xfs_da_args	args;
 	struct xfs_trans_res	tres;
 	int			rsvd = (flags & ATTR_ROOT) != 0;
@@ -381,9 +373,9 @@ xfs_attr_set(
 		goto out_trans_cancel;

 	xfs_trans_ijoin(args.trans, dp, 0);
-	error = xfs_attr_set_args(&args, &leaf_bp);
+	error = xfs_attr_set_args(&args);
 	if (error)
-		goto out_release_leaf;
+		goto out_trans_cancel;
 	if (!args.trans) {
 		/* shortform attribute has already been committed */
 		goto out_unlock;
@@ -408,9 +400,6 @@ xfs_attr_set(
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return error;

-out_release_leaf:
-	if (leaf_bp)
-		xfs_trans_brelse(args.trans, leaf_bp);
 out_trans_cancel:
 	if (args.trans)
 		xfs_trans_cancel(args.trans);

View File

@@ -140,7 +140,7 @@ int xfs_attr_get(struct xfs_inode *ip, const unsigned char *name,
 		 unsigned char *value, int *valuelenp, int flags);
 int xfs_attr_set(struct xfs_inode *dp, const unsigned char *name,
 		 unsigned char *value, int valuelen, int flags);
-int xfs_attr_set_args(struct xfs_da_args *args, struct xfs_buf **leaf_bp);
+int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_attr_remove(struct xfs_inode *dp, const unsigned char *name, int flags);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,

View File

@@ -2009,6 +2009,9 @@ xfs_bmap_add_extent_delay_real(
 		goto done;
 	}

+	if (da_new != da_old)
+		xfs_mod_delalloc(mp, (int64_t)da_new - da_old);
+
 	if (bma->cur) {
 		da_new += bma->cur->bc_private.b.allocated;
 		bma->cur->bc_private.b.allocated = 0;
@@ -2640,6 +2643,7 @@ xfs_bmap_add_extent_hole_delay(
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
+		xfs_mod_delalloc(ip->i_mount, (int64_t)newlen - oldlen);
 	}
 }

@@ -3352,8 +3356,10 @@ xfs_bmap_btalloc_accounting(
 		 * already have quota reservation and there's nothing to do
 		 * yet.
 		 */
-		if (ap->wasdel)
+		if (ap->wasdel) {
+			xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len);
 			return;
+		}

 		/*
 		 * Otherwise, we've allocated blocks in a hole. The transaction
@@ -3372,8 +3378,10 @@ xfs_bmap_btalloc_accounting(
 	/* data/attr fork only */
 	ap->ip->i_d.di_nblocks += args->len;
 	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
-	if (ap->wasdel)
+	if (ap->wasdel) {
 		ap->ip->i_delayed_blks -= args->len;
+		xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)args->len);
+	}
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip,
 		ap->wasdel ? XFS_TRANS_DQ_DELBCOUNT : XFS_TRANS_DQ_BCOUNT,
 		args->len);
@@ -3969,6 +3977,7 @@ xfs_bmapi_reserve_delalloc(

 	ip->i_delayed_blks += alen;
+	xfs_mod_delalloc(ip->i_mount, alen + indlen);

 	got->br_startoff = aoff;
 	got->br_startblock = nullstartblock(indlen);
@@ -4840,8 +4849,10 @@ xfs_bmap_del_extent_delay(
 	da_diff = da_old - da_new;
 	if (!isrt)
 		da_diff += del->br_blockcount;
-	if (da_diff)
+	if (da_diff) {
 		xfs_mod_fdblocks(mp, da_diff, false);
+		xfs_mod_delalloc(mp, -da_diff);
+	}
 	return error;
 }

View File

@@ -274,13 +274,15 @@ xfs_defer_trans_roll(

 	trace_xfs_defer_trans_roll(tp, _RET_IP_);

-	/* Roll the transaction. */
+	/*
+	 * Roll the transaction.  Rolling always gives a new transaction (even
+	 * if committing the old one fails!) to hand back to the caller, so we
+	 * join the held resources to the new transaction so that we always
+	 * return with the held resources joined to @tpp, no matter what
+	 * happened.
+	 */
 	error = xfs_trans_roll(tpp);
 	tp = *tpp;
-	if (error) {
-		trace_xfs_defer_trans_roll_error(tp, error);
-		return error;
-	}

 	/* Rejoin the joined inodes. */
 	for (i = 0; i < ipcount; i++)
@@ -292,6 +294,8 @@ xfs_defer_trans_roll(
 		xfs_trans_bhold(tp, bplist[i]);
 	}

+	if (error)
+		trace_xfs_defer_trans_roll_error(tp, error);
+
 	return error;
 }

View File

@@ -110,7 +110,7 @@ xfs_dqblk_verify(
 /*
  * Do some primitive error checking on ondisk dquot data structures.
  */
-int
+void
 xfs_dqblk_repair(
 	struct xfs_mount	*mp,
 	struct xfs_dqblk	*dqb,
@@ -133,8 +133,6 @@ xfs_dqblk_repair(
 	xfs_update_cksum((char *)dqb, sizeof(struct xfs_dqblk),
 			 XFS_DQUOT_CRC_OFF);
 	}
-
-	return 0;
 }

 STATIC bool
STATIC bool STATIC bool

View File

@@ -124,7 +124,7 @@ typedef struct xfs_flock64 {
 /*
  * Output for XFS_IOC_FSGEOMETRY_V1
  */
-typedef struct xfs_fsop_geom_v1 {
+struct xfs_fsop_geom_v1 {
 	__u32		blocksize;	/* filesystem (data) block size */
 	__u32		rtextsize;	/* realtime extent size */
 	__u32		agblocks;	/* fsblocks in an AG */
@@ -145,12 +145,39 @@ typedef struct xfs_fsop_geom_v1 {
 	__u32		logsectsize;	/* log sector size, bytes */
 	__u32		rtsectsize;	/* realtime sector size, bytes */
 	__u32		dirblocksize;	/* directory block size, bytes */
-} xfs_fsop_geom_v1_t;
+};
+
+/*
+ * Output for XFS_IOC_FSGEOMETRY_V4
+ */
+struct xfs_fsop_geom_v4 {
+	__u32		blocksize;	/* filesystem (data) block size */
+	__u32		rtextsize;	/* realtime extent size */
+	__u32		agblocks;	/* fsblocks in an AG */
+	__u32		agcount;	/* number of allocation groups */
+	__u32		logblocks;	/* fsblocks in the log */
+	__u32		sectsize;	/* (data) sector size, bytes */
+	__u32		inodesize;	/* inode size in bytes */
+	__u32		imaxpct;	/* max allowed inode space(%) */
+	__u64		datablocks;	/* fsblocks in data subvolume */
+	__u64		rtblocks;	/* fsblocks in realtime subvol */
+	__u64		rtextents;	/* rt extents in realtime subvol*/
+	__u64		logstart;	/* starting fsblock of the log */
+	unsigned char	uuid[16];	/* unique id of the filesystem */
+	__u32		sunit;		/* stripe unit, fsblocks */
+	__u32		swidth;		/* stripe width, fsblocks */
+	__s32		version;	/* structure version */
+	__u32		flags;		/* superblock version flags */
+	__u32		logsectsize;	/* log sector size, bytes */
+	__u32		rtsectsize;	/* realtime sector size, bytes */
+	__u32		dirblocksize;	/* directory block size, bytes */
+	__u32		logsunit;	/* log stripe unit, bytes */
+};

 /*
  * Output for XFS_IOC_FSGEOMETRY
  */
-typedef struct xfs_fsop_geom {
+struct xfs_fsop_geom {
 	__u32		blocksize;	/* filesystem (data) block size */
 	__u32		rtextsize;	/* realtime extent size */
 	__u32		agblocks;	/* fsblocks in an AG */
@@ -171,8 +198,18 @@ typedef struct xfs_fsop_geom {
 	__u32		logsectsize;	/* log sector size, bytes */
 	__u32		rtsectsize;	/* realtime sector size, bytes */
 	__u32		dirblocksize;	/* directory block size, bytes */
 	__u32		logsunit;	/* log stripe unit, bytes */
-} xfs_fsop_geom_t;
+	uint32_t	sick;		/* o: unhealthy fs & rt metadata */
+	uint32_t	checked;	/* o: checked fs & rt metadata */
+	__u64		reserved[17];	/* reserved space */
+};
+
+#define XFS_FSOP_GEOM_SICK_COUNTERS	(1 << 0)  /* summary counters */
+#define XFS_FSOP_GEOM_SICK_UQUOTA	(1 << 1)  /* user quota */
+#define XFS_FSOP_GEOM_SICK_GQUOTA	(1 << 2)  /* group quota */
+#define XFS_FSOP_GEOM_SICK_PQUOTA	(1 << 3)  /* project quota */
+#define XFS_FSOP_GEOM_SICK_RT_BITMAP	(1 << 4)  /* realtime bitmap */
+#define XFS_FSOP_GEOM_SICK_RT_SUMMARY	(1 << 5)  /* realtime summary */

 /* Output for XFS_FS_COUNTS */
 typedef struct xfs_fsop_counts {
@@ -188,28 +225,30 @@ typedef struct xfs_fsop_resblks {
 	__u64  resblks_avail;
 } xfs_fsop_resblks_t;

 #define XFS_FSOP_GEOM_VERSION		0
+#define XFS_FSOP_GEOM_VERSION_V5	5

-#define XFS_FSOP_GEOM_FLAGS_ATTR	0x0001	/* attributes in use */
-#define XFS_FSOP_GEOM_FLAGS_NLINK	0x0002	/* 32-bit nlink values */
-#define XFS_FSOP_GEOM_FLAGS_QUOTA	0x0004	/* quotas enabled */
-#define XFS_FSOP_GEOM_FLAGS_IALIGN	0x0008	/* inode alignment */
-#define XFS_FSOP_GEOM_FLAGS_DALIGN	0x0010	/* large data alignment */
-#define XFS_FSOP_GEOM_FLAGS_SHARED	0x0020	/* read-only shared */
-#define XFS_FSOP_GEOM_FLAGS_EXTFLG	0x0040	/* special extent flag */
-#define XFS_FSOP_GEOM_FLAGS_DIRV2	0x0080	/* directory version 2 */
-#define XFS_FSOP_GEOM_FLAGS_LOGV2	0x0100	/* log format version 2 */
-#define XFS_FSOP_GEOM_FLAGS_SECTOR	0x0200	/* sector sizes >1BB */
-#define XFS_FSOP_GEOM_FLAGS_ATTR2	0x0400	/* inline attributes rework */
-#define XFS_FSOP_GEOM_FLAGS_PROJID32	0x0800	/* 32-bit project IDs */
-#define XFS_FSOP_GEOM_FLAGS_DIRV2CI	0x1000	/* ASCII only CI names */
-#define XFS_FSOP_GEOM_FLAGS_LAZYSB	0x4000	/* lazy superblock counters */
-#define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
-#define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
-#define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
-#define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks */
-#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
-#define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
+#define XFS_FSOP_GEOM_FLAGS_ATTR	(1 << 0)  /* attributes in use */
+#define XFS_FSOP_GEOM_FLAGS_NLINK	(1 << 1)  /* 32-bit nlink values */
+#define XFS_FSOP_GEOM_FLAGS_QUOTA	(1 << 2)  /* quotas enabled */
+#define XFS_FSOP_GEOM_FLAGS_IALIGN	(1 << 3)  /* inode alignment */
+#define XFS_FSOP_GEOM_FLAGS_DALIGN	(1 << 4)  /* large data alignment */
+#define XFS_FSOP_GEOM_FLAGS_SHARED	(1 << 5)  /* read-only shared */
+#define XFS_FSOP_GEOM_FLAGS_EXTFLG	(1 << 6)  /* special extent flag */
+#define XFS_FSOP_GEOM_FLAGS_DIRV2	(1 << 7)  /* directory version 2 */
+#define XFS_FSOP_GEOM_FLAGS_LOGV2	(1 << 8)  /* log format version 2 */
+#define XFS_FSOP_GEOM_FLAGS_SECTOR	(1 << 9)  /* sector sizes >1BB */
+#define XFS_FSOP_GEOM_FLAGS_ATTR2	(1 << 10) /* inline attributes rework */
+#define XFS_FSOP_GEOM_FLAGS_PROJID32	(1 << 11) /* 32-bit project IDs */
+#define XFS_FSOP_GEOM_FLAGS_DIRV2CI	(1 << 12) /* ASCII only CI names */
+/*  -- Do not use --		(1 << 13)  SGI parent pointers */
+#define XFS_FSOP_GEOM_FLAGS_LAZYSB	(1 << 14) /* lazy superblock counters */
+#define XFS_FSOP_GEOM_FLAGS_V5SB	(1 << 15) /* version 5 superblock */
+#define XFS_FSOP_GEOM_FLAGS_FTYPE	(1 << 16) /* inode directory types */
+#define XFS_FSOP_GEOM_FLAGS_FINOBT	(1 << 17) /* free inode btree */
+#define XFS_FSOP_GEOM_FLAGS_SPINODES	(1 << 18) /* sparse inode chunks */
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */

 /*
  * Minimum and maximum sizes need for growth checks.
@@ -237,6 +276,31 @@ typedef struct xfs_fsop_resblks {
 #define XFS_MIN_DBLOCKS(s) ((xfs_rfsblock_t)((s)->sb_agcount - 1) * \
 			    (s)->sb_agblocks + XFS_MIN_AG_BLOCKS)

+/*
+ * Output for XFS_IOC_AG_GEOMETRY
+ */
+struct xfs_ag_geometry {
+	uint32_t	ag_number;	/* i/o: AG number */
+	uint32_t	ag_length;	/* o: length in blocks */
+	uint32_t	ag_freeblks;	/* o: free space */
+	uint32_t	ag_icount;	/* o: inodes allocated */
+	uint32_t	ag_ifree;	/* o: inodes free */
+	uint32_t	ag_sick;	/* o: sick things in ag */
+	uint32_t	ag_checked;	/* o: checked metadata in ag */
+	uint32_t	ag_reserved32;	/* o: zero */
+	uint64_t	ag_reserved[12];/* o: zero */
+};
+#define XFS_AG_GEOM_SICK_SB	(1 << 0)  /* superblock */
+#define XFS_AG_GEOM_SICK_AGF	(1 << 1)  /* AGF header */
+#define XFS_AG_GEOM_SICK_AGFL	(1 << 2)  /* AGFL header */
+#define XFS_AG_GEOM_SICK_AGI	(1 << 3)  /* AGI header */
+#define XFS_AG_GEOM_SICK_BNOBT	(1 << 4)  /* free space by block */
+#define XFS_AG_GEOM_SICK_CNTBT	(1 << 5)  /* free space by length */
+#define XFS_AG_GEOM_SICK_INOBT	(1 << 6)  /* inode index */
+#define XFS_AG_GEOM_SICK_FINOBT	(1 << 7)  /* free inode index */
+#define XFS_AG_GEOM_SICK_RMAPBT	(1 << 8)  /* reverse mappings */
+#define XFS_AG_GEOM_SICK_REFCNTBT (1 << 9)  /* reference counts */
+
 /*
  * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
  */
@@ -285,13 +349,25 @@ typedef struct xfs_bstat {
 #define	bs_projid	bs_projid_lo	/* (previously just bs_projid) */
 	__u16		bs_forkoff;	/* inode fork offset in bytes */
 	__u16		bs_projid_hi;	/* higher part of project id */
-	unsigned char	bs_pad[6];	/* pad space, unused */
+	uint16_t	bs_sick;	/* sick inode metadata */
+	uint16_t	bs_checked;	/* checked inode metadata */
+	unsigned char	bs_pad[2];	/* pad space, unused */
 	__u32		bs_cowextsize;	/* cow extent size */
 	__u32		bs_dmevmask;	/* DMIG event mask */
 	__u16		bs_dmstate;	/* DMIG state info */
 	__u16		bs_aextents;	/* attribute number of extents */
 } xfs_bstat_t;

+/* bs_sick flags */
+#define XFS_BS_SICK_INODE	(1 << 0)  /* inode core */
+#define XFS_BS_SICK_BMBTD	(1 << 1)  /* data fork */
+#define XFS_BS_SICK_BMBTA	(1 << 2)  /* attr fork */
+#define XFS_BS_SICK_BMBTC	(1 << 3)  /* cow fork */
+#define XFS_BS_SICK_DIR		(1 << 4)  /* directory */
+#define XFS_BS_SICK_XATTR	(1 << 5)  /* extended attributes */
+#define XFS_BS_SICK_SYMLINK	(1 << 6)  /* symbolic link remote target */
+#define XFS_BS_SICK_PARENT	(1 << 7)  /* parent pointers */
+
 /*
  * Project quota id helpers (previously projid was 16bit only
  * and using two 16bit values to hold new 32bit projid was choosen
@@ -502,9 +578,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_UQUOTA	21	/* user quotas */
 #define XFS_SCRUB_TYPE_GQUOTA	22	/* group quotas */
 #define XFS_SCRUB_TYPE_PQUOTA	23	/* project quotas */
+#define XFS_SCRUB_TYPE_FSCOUNTERS 24	/* fs summary counters */

 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	24
+#define XFS_SCRUB_TYPE_NR	25

 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1 << 0)
@@ -590,6 +667,7 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59 */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
+#define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)

 /*
  * ioctl commands that replace IRIX syssgi()'s
@@ -620,8 +698,9 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FSSETDM_BY_HANDLE    _IOW ('X', 121, struct xfs_fsop_setdm_handlereq)
 #define XFS_IOC_ATTRLIST_BY_HANDLE   _IOW ('X', 122, struct xfs_fsop_attrlist_handlereq)
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
-#define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
+#define XFS_IOC_FSGEOMETRY_V4	     _IOR ('X', 124, struct xfs_fsop_geom_v4)
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
+#define XFS_IOC_FSGEOMETRY	     _IOR ('X', 126, struct xfs_fsop_geom)

 /*	XFS_IOC_GETFSUUID ---------- deprecated 140 */

View File: fs/xfs/libxfs/xfs_health.h (new file, 190 lines)

@@ -0,0 +1,190 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright (C) 2019 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
#ifndef __XFS_HEALTH_H__
#define __XFS_HEALTH_H__
/*
* In-Core Filesystem Health Assessments
* =====================================
*
* We'd like to be able to summarize the current health status of the
* filesystem so that the administrator knows when it's necessary to schedule
* some downtime for repairs. Until then, we would also like to avoid abrupt
* shutdowns due to corrupt metadata.
*
* The online scrub feature evaluates the health of all filesystem metadata.
* When scrub detects corruption in a piece of metadata it will set the
* corresponding sickness flag, and repair will clear it if successful. If
* problems remain at unmount time, we can also request manual intervention by
* logging a notice to run xfs_repair.
*
* Each health tracking group uses a pair of fields for reporting. The
* "checked" field tell us if a given piece of metadata has ever been examined,
* and the "sick" field tells us if that piece was found to need repairs.
* Therefore we can conclude that for a given sick flag value:
*
* - checked && sick => metadata needs repair
* - checked && !sick => metadata is ok
* - !checked => has not been examined since mount
*/
struct xfs_mount;
struct xfs_perag;
struct xfs_inode;
struct xfs_fsop_geom;
/* Observable health issues for metadata spanning the entire filesystem. */
#define XFS_SICK_FS_COUNTERS (1 << 0) /* summary counters */
#define XFS_SICK_FS_UQUOTA (1 << 1) /* user quota */
#define XFS_SICK_FS_GQUOTA (1 << 2) /* group quota */
#define XFS_SICK_FS_PQUOTA (1 << 3) /* project quota */
/* Observable health issues for realtime volume metadata. */
#define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */
#define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */
/* Observable health issues for AG metadata. */
#define XFS_SICK_AG_SB (1 << 0) /* superblock */
#define XFS_SICK_AG_AGF (1 << 1) /* AGF header */
#define XFS_SICK_AG_AGFL (1 << 2) /* AGFL header */
#define XFS_SICK_AG_AGI (1 << 3) /* AGI header */
#define XFS_SICK_AG_BNOBT (1 << 4) /* free space by block */
#define XFS_SICK_AG_CNTBT (1 << 5) /* free space by length */
#define XFS_SICK_AG_INOBT (1 << 6) /* inode index */
#define XFS_SICK_AG_FINOBT (1 << 7) /* free inode index */
#define XFS_SICK_AG_RMAPBT (1 << 8) /* reverse mappings */
#define XFS_SICK_AG_REFCNTBT (1 << 9) /* reference counts */
/* Observable health issues for inode metadata. */
#define XFS_SICK_INO_CORE (1 << 0) /* inode core */
#define XFS_SICK_INO_BMBTD (1 << 1) /* data fork */
#define XFS_SICK_INO_BMBTA (1 << 2) /* attr fork */
#define XFS_SICK_INO_BMBTC (1 << 3) /* cow fork */
#define XFS_SICK_INO_DIR (1 << 4) /* directory */
#define XFS_SICK_INO_XATTR (1 << 5) /* extended attributes */
#define XFS_SICK_INO_SYMLINK (1 << 6) /* symbolic link remote target */
#define XFS_SICK_INO_PARENT (1 << 7) /* parent pointers */
/* Primary evidence of health problems in a given group. */
#define XFS_SICK_FS_PRIMARY	(XFS_SICK_FS_COUNTERS | \
				 XFS_SICK_FS_UQUOTA | \
				 XFS_SICK_FS_GQUOTA | \
				 XFS_SICK_FS_PQUOTA)

#define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
				 XFS_SICK_RT_SUMMARY)

#define XFS_SICK_AG_PRIMARY	(XFS_SICK_AG_SB | \
				 XFS_SICK_AG_AGF | \
				 XFS_SICK_AG_AGFL | \
				 XFS_SICK_AG_AGI | \
				 XFS_SICK_AG_BNOBT | \
				 XFS_SICK_AG_CNTBT | \
				 XFS_SICK_AG_INOBT | \
				 XFS_SICK_AG_FINOBT | \
				 XFS_SICK_AG_RMAPBT | \
				 XFS_SICK_AG_REFCNTBT)

#define XFS_SICK_INO_PRIMARY	(XFS_SICK_INO_CORE | \
				 XFS_SICK_INO_BMBTD | \
				 XFS_SICK_INO_BMBTA | \
				 XFS_SICK_INO_BMBTC | \
				 XFS_SICK_INO_DIR | \
				 XFS_SICK_INO_XATTR | \
				 XFS_SICK_INO_SYMLINK | \
				 XFS_SICK_INO_PARENT)
/* These functions must be provided by the xfs implementation. */
void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask);
void xfs_fs_mark_healthy(struct xfs_mount *mp, unsigned int mask);
void xfs_fs_measure_sickness(struct xfs_mount *mp, unsigned int *sick,
		unsigned int *checked);

void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask);
void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask);
void xfs_rt_measure_sickness(struct xfs_mount *mp, unsigned int *sick,
		unsigned int *checked);

void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask);
void xfs_ag_mark_healthy(struct xfs_perag *pag, unsigned int mask);
void xfs_ag_measure_sickness(struct xfs_perag *pag, unsigned int *sick,
		unsigned int *checked);

void xfs_inode_mark_sick(struct xfs_inode *ip, unsigned int mask);
void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
void xfs_inode_measure_sickness(struct xfs_inode *ip, unsigned int *sick,
		unsigned int *checked);

void xfs_health_unmount(struct xfs_mount *mp);
/* Now some helpers. */
static inline bool
xfs_fs_has_sickness(struct xfs_mount *mp, unsigned int mask)
{
	unsigned int	sick, checked;

	xfs_fs_measure_sickness(mp, &sick, &checked);
	return sick & mask;
}

static inline bool
xfs_rt_has_sickness(struct xfs_mount *mp, unsigned int mask)
{
	unsigned int	sick, checked;

	xfs_rt_measure_sickness(mp, &sick, &checked);
	return sick & mask;
}

static inline bool
xfs_ag_has_sickness(struct xfs_perag *pag, unsigned int mask)
{
	unsigned int	sick, checked;

	xfs_ag_measure_sickness(pag, &sick, &checked);
	return sick & mask;
}

static inline bool
xfs_inode_has_sickness(struct xfs_inode *ip, unsigned int mask)
{
	unsigned int	sick, checked;

	xfs_inode_measure_sickness(ip, &sick, &checked);
	return sick & mask;
}

static inline bool
xfs_fs_is_healthy(struct xfs_mount *mp)
{
	return !xfs_fs_has_sickness(mp, -1U);
}

static inline bool
xfs_rt_is_healthy(struct xfs_mount *mp)
{
	return !xfs_rt_has_sickness(mp, -1U);
}

static inline bool
xfs_ag_is_healthy(struct xfs_perag *pag)
{
	return !xfs_ag_has_sickness(pag, -1U);
}

static inline bool
xfs_inode_is_healthy(struct xfs_inode *ip)
{
	return !xfs_inode_has_sickness(ip, -1U);
}
void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo);
void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bstat *bs);
#endif /* __XFS_HEALTH_H__ */
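The helper layering above (measure, then test a mask, then test an all-ones mask) composes as in this standalone sketch; `struct health` is a hypothetical stand-in for the real per-mount/AG/inode state:

```c
#include <stdbool.h>

/* Stand-in for the sick/checked state tracked per mount, AG, and inode. */
struct health {
	unsigned int	sick;
	unsigned int	checked;
};

/* Mirrors the has_sickness helpers: test a mask against the sick bits. */
static bool
health_has_sickness(const struct health *h, unsigned int mask)
{
	return h->sick & mask;
}

/* Mirrors the is_healthy helpers: -1U is an all-ones mask covering every flag. */
static bool
health_is_healthy(const struct health *h)
{
	return !health_has_sickness(h, -1U);
}
```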


@@ -142,7 +142,7 @@ extern xfs_failaddr_t xfs_dquot_verify(struct xfs_mount *mp,
 extern xfs_failaddr_t xfs_dqblk_verify(struct xfs_mount *mp,
 		struct xfs_dqblk *dqb, xfs_dqid_t id, uint type);
 extern int xfs_calc_dquots_per_chunk(unsigned int nbblks);
-extern int xfs_dqblk_repair(struct xfs_mount *mp, struct xfs_dqblk *dqb,
+extern void xfs_dqblk_repair(struct xfs_mount *mp, struct xfs_dqblk *dqb,
 		xfs_dqid_t id, uint type);

 #endif	/* __XFS_QUOTA_H__ */


@@ -30,6 +30,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
+#include "xfs_health.h"

 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -905,7 +906,7 @@ xfs_initialize_perag_data(
 	/*
 	 * If the new summary counts are obviously incorrect, fail the
 	 * mount operation because that implies the AGFs are also corrupt.
-	 * Clear BAD_SUMMARY so that we don't unmount with a dirty log, which
+	 * Clear FS_COUNTERS so that we don't unmount with a dirty log, which
 	 * will prevent xfs_repair from fixing anything.
 	 */
 	if (fdblocks > sbp->sb_dblocks || ifree > ialloc) {
@@ -923,7 +924,7 @@ xfs_initialize_perag_data(
 	xfs_reinit_percpu_counters(mp);
 out:
-	mp->m_flags &= ~XFS_MOUNT_BAD_SUMMARY;
+	xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS);
 	return error;
 }
@@ -1084,7 +1085,7 @@ xfs_sync_sb_buf(
 	return error;
 }

-int
+void
 xfs_fs_geometry(
 	struct xfs_sb		*sbp,
 	struct xfs_fsop_geom	*geo,
@@ -1108,13 +1109,13 @@ xfs_fs_geometry(
 	memcpy(geo->uuid, &sbp->sb_uuid, sizeof(sbp->sb_uuid));

 	if (struct_version < 2)
-		return 0;
+		return;

 	geo->sunit = sbp->sb_unit;
 	geo->swidth = sbp->sb_width;

 	if (struct_version < 3)
-		return 0;
+		return;

 	geo->version = XFS_FSOP_GEOM_VERSION;
 	geo->flags = XFS_FSOP_GEOM_FLAGS_NLINK |
@@ -1158,14 +1159,17 @@ xfs_fs_geometry(
 	geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp);

 	if (struct_version < 4)
-		return 0;
+		return;

 	if (xfs_sb_version_haslogv2(sbp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_LOGV2;

 	geo->logsunit = sbp->sb_logsunit;

-	return 0;
+	if (struct_version < 5)
+		return;
+
+	geo->version = XFS_FSOP_GEOM_VERSION_V5;
 }

 /* Read a secondary superblock. */


@@ -33,7 +33,7 @@ extern void xfs_sb_quota_from_disk(struct xfs_sb *sbp);
 extern int xfs_update_secondary_sbs(struct xfs_mount *mp);

 #define XFS_FS_GEOM_MAX_STRUCT_VER	(4)
-extern int xfs_fs_geometry(struct xfs_sb *sbp, struct xfs_fsop_geom *geo,
+extern void xfs_fs_geometry(struct xfs_sb *sbp, struct xfs_fsop_geom *geo,
 		int struct_version);
 extern int xfs_sb_read_secondary(struct xfs_mount *mp,
 		struct xfs_trans *tp, xfs_agnumber_t agno,


@@ -876,9 +876,13 @@ xfs_trans_resv_calc(
 	resp->tr_sb.tr_logres = xfs_calc_sb_reservation(mp);
 	resp->tr_sb.tr_logcount = XFS_DEFAULT_LOG_COUNT;

+	/* growdata requires permanent res; it can free space to the last AG */
+	resp->tr_growdata.tr_logres = xfs_calc_growdata_reservation(mp);
+	resp->tr_growdata.tr_logcount = XFS_DEFAULT_PERM_LOG_COUNT;
+	resp->tr_growdata.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
 	/* The following transaction are logged in logical format */
 	resp->tr_ichange.tr_logres = xfs_calc_ichange_reservation(mp);
-	resp->tr_growdata.tr_logres = xfs_calc_growdata_reservation(mp);
 	resp->tr_fsyncts.tr_logres = xfs_calc_swrite_reservation(mp);
 	resp->tr_writeid.tr_logres = xfs_calc_writeid_reservation(mp);
 	resp->tr_attrsetrt.tr_logres = xfs_calc_attrsetrt_reservation(mp);


@@ -185,7 +185,7 @@ xfs_verify_rtbno(
 }

 /* Calculate the range of valid icount values. */
-static void
+void
 xfs_icount_range(
 	struct xfs_mount	*mp,
 	unsigned long long	*min,


@@ -191,5 +191,7 @@ bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino);
 bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno);
 bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount);
 bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off);
+void xfs_icount_range(struct xfs_mount *mp, unsigned long long *min,
+		unsigned long long *max);

 #endif	/* __XFS_TYPES_H__ */


@@ -514,6 +514,7 @@ xchk_agf(
 {
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_agf		*agf;
+	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
 	xfs_agblock_t		eoag;
@@ -586,6 +587,16 @@ xchk_agf(
 	if (agfl_count != 0 && fl_count != agfl_count)
 		xchk_block_set_corrupt(sc, sc->sa.agf_bp);

+	/* Do the incore counters match? */
+	pag = xfs_perag_get(mp, agno);
+	if (pag->pagf_freeblks != be32_to_cpu(agf->agf_freeblks))
+		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
+	if (pag->pagf_flcount != be32_to_cpu(agf->agf_flcount))
+		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
+	if (pag->pagf_btreeblks != be32_to_cpu(agf->agf_btreeblks))
+		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
+	xfs_perag_put(pag);
+
 	xchk_agf_xref(sc);
 out:
 	return error;
@@ -811,6 +822,7 @@ xchk_agi(
 {
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_agi		*agi;
+	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
 	xfs_agblock_t		eoag;
@@ -881,6 +893,14 @@ xchk_agi(
 	if (agi->agi_pad32 != cpu_to_be32(0))
 		xchk_block_set_corrupt(sc, sc->sa.agi_bp);

+	/* Do the incore counters match? */
+	pag = xfs_perag_get(mp, agno);
+	if (pag->pagi_count != be32_to_cpu(agi->agi_count))
+		xchk_block_set_corrupt(sc, sc->sa.agi_bp);
+	if (pag->pagi_freecount != be32_to_cpu(agi->agi_freecount))
+		xchk_block_set_corrupt(sc, sc->sa.agi_bp);
+	xfs_perag_put(pag);
+
 	xchk_agi_xref(sc);
 out:
 	return error;


@@ -38,6 +38,7 @@
 #include "scrub/trace.h"
 #include "scrub/btree.h"
 #include "scrub/repair.h"
+#include "scrub/health.h"

 /* Common code for the metadata scrubbers. */

@@ -208,6 +209,15 @@ xchk_ino_set_preen(
 	trace_xchk_ino_preen(sc, ino, __return_address);
 }

+/* Record something being wrong with the filesystem primary superblock. */
+void
+xchk_set_corrupt(
+	struct xfs_scrub	*sc)
+{
+	sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
+	trace_xchk_fs_error(sc, 0, __return_address);
+}
+
 /* Record a corrupt block. */
 void
 xchk_block_set_corrupt(
@@ -458,13 +468,18 @@ xchk_ag_btcur_init(
 	struct xfs_mount	*mp = sc->mp;
 	xfs_agnumber_t		agno = sa->agno;

-	if (sa->agf_bp) {
+	xchk_perag_get(sc->mp, sa);
+	if (sa->agf_bp &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_BNO)) {
 		/* Set up a bnobt cursor for cross-referencing. */
 		sa->bno_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
 				agno, XFS_BTNUM_BNO);
 		if (!sa->bno_cur)
 			goto err;
+	}

+	if (sa->agf_bp &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_CNT)) {
 		/* Set up a cntbt cursor for cross-referencing. */
 		sa->cnt_cur = xfs_allocbt_init_cursor(mp, sc->tp, sa->agf_bp,
 				agno, XFS_BTNUM_CNT);
@@ -473,7 +488,8 @@ xchk_ag_btcur_init(
 	}

 	/* Set up a inobt cursor for cross-referencing. */
-	if (sa->agi_bp) {
+	if (sa->agi_bp &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_INO)) {
 		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
 				agno, XFS_BTNUM_INO);
 		if (!sa->ino_cur)
@@ -481,7 +497,8 @@ xchk_ag_btcur_init(
 	}

 	/* Set up a finobt cursor for cross-referencing. */
-	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb)) {
+	if (sa->agi_bp && xfs_sb_version_hasfinobt(&mp->m_sb) &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_FINO)) {
 		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
 				agno, XFS_BTNUM_FINO);
 		if (!sa->fino_cur)
@@ -489,7 +506,8 @@ xchk_ag_btcur_init(
 	}

 	/* Set up a rmapbt cursor for cross-referencing. */
-	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+	if (sa->agf_bp && xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_RMAP)) {
 		sa->rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, sa->agf_bp,
 				agno);
 		if (!sa->rmap_cur)
@@ -497,7 +515,8 @@ xchk_ag_btcur_init(
 	}

 	/* Set up a refcountbt cursor for cross-referencing. */
-	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb)) {
+	if (sa->agf_bp && xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_REFC)) {
 		sa->refc_cur = xfs_refcountbt_init_cursor(mp, sc->tp,
 				sa->agf_bp, agno);
 		if (!sa->refc_cur)
@@ -884,3 +903,21 @@ xchk_ilock_inverted(
 	}
 	return -EDEADLOCK;
 }
+
+/* Pause background reaping of resources. */
+void
+xchk_stop_reaping(
+	struct xfs_scrub	*sc)
+{
+	sc->flags |= XCHK_REAPING_DISABLED;
+	xfs_stop_block_reaping(sc->mp);
+}
+
+/* Restart background reaping of resources. */
+void
+xchk_start_reaping(
+	struct xfs_scrub	*sc)
+{
+	xfs_start_block_reaping(sc->mp);
+	sc->flags &= ~XCHK_REAPING_DISABLED;
+}


@@ -39,6 +39,7 @@ void xchk_block_set_preen(struct xfs_scrub *sc,
 		struct xfs_buf *bp);
 void xchk_ino_set_preen(struct xfs_scrub *sc, xfs_ino_t ino);

+void xchk_set_corrupt(struct xfs_scrub *sc);
 void xchk_block_set_corrupt(struct xfs_scrub *sc,
 		struct xfs_buf *bp);
 void xchk_ino_set_corrupt(struct xfs_scrub *sc, xfs_ino_t ino);
@@ -105,6 +106,7 @@ xchk_setup_quota(struct xfs_scrub *sc, struct xfs_inode *ip)
 	return -ENOENT;
 }
 #endif
+int xchk_setup_fscounters(struct xfs_scrub *sc, struct xfs_inode *ip);

 void xchk_ag_free(struct xfs_scrub *sc, struct xchk_ag *sa);
 int xchk_ag_init(struct xfs_scrub *sc, xfs_agnumber_t agno,
@@ -137,5 +139,7 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)

 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
+void xchk_stop_reaping(struct xfs_scrub *sc);
+void xchk_start_reaping(struct xfs_scrub *sc);

 #endif	/* __XFS_SCRUB_COMMON_H__ */

fs/xfs/scrub/fscounters.c Normal file

@@ -0,0 +1,366 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright (C) 2019 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_alloc.h"
#include "xfs_ialloc.h"
#include "xfs_rmap.h"
#include "xfs_error.h"
#include "xfs_errortag.h"
#include "xfs_icache.h"
#include "xfs_health.h"
#include "xfs_bmap.h"
#include "scrub/xfs_scrub.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
/*
* FS Summary Counters
* ===================
*
* The basics of filesystem summary counter checking are that we iterate the
* AGs counting the number of free blocks, free space btree blocks, per-AG
* reservations, inodes, delayed allocation reservations, and free inodes.
* Then we compare what we computed against the in-core counters.
*
* However, the reality is that summary counters are a tricky beast to check.
* While we /could/ freeze the filesystem and scramble around the AGs counting
 * the free blocks, in practice we prefer not to do that for a scan because
* freezing is costly. To get around this, we added a per-cpu counter of the
* delalloc reservations so that we can rotor around the AGs relatively
* quickly, and we allow the counts to be slightly off because we're not taking
* any locks while we do this.
*
* So the first thing we do is warm up the buffer cache in the setup routine by
* walking all the AGs to make sure the incore per-AG structure has been
* initialized. The expected value calculation then iterates the incore per-AG
* structures as quickly as it can. We snapshot the percpu counters before and
* after this operation and use the difference in counter values to guess at
* our tolerance for mismatch between expected and actual counter values.
*/
/*
* Since the expected value computation is lockless but only browses incore
* values, the percpu counters should be fairly close to each other. However,
* we'll allow ourselves to be off by at least this (arbitrary) amount.
*/
#define XCHK_FSCOUNT_MIN_VARIANCE (512)
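The before-and-after snapshot tolerance test described above can be distilled as follows. This is a simplified model with hypothetical names (`fscount_within_range`, `FSCOUNT_MIN_VARIANCE`), not the kernel's `xchk_fscount_within_range()`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors XCHK_FSCOUNT_MIN_VARIANCE: tolerated drift between snapshots. */
#define FSCOUNT_MIN_VARIANCE	512

/*
 * The expected value was computed locklessly, so accept it if it falls
 * between the counter sums taken before and after the aggregation.  A
 * drift of FSCOUNT_MIN_VARIANCE or more means the fs is too busy to
 * judge; the real code reports that as INCOMPLETE rather than corrupt.
 */
static bool
fscount_within_range(int64_t old_sum, int64_t curr_sum, int64_t expected)
{
	int64_t min_value = old_sum < curr_sum ? old_sum : curr_sum;
	int64_t max_value = old_sum < curr_sum ? curr_sum : old_sum;

	return expected >= min_value && expected <= max_value;
}
```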
/*
* Make sure the per-AG structure has been initialized from the on-disk header
* contents and trust that the incore counters match the ondisk counters. (The
* AGF and AGI scrubbers check them, and a normal xfs_scrub run checks the
* summary counters after checking all AG headers). Do this from the setup
* function so that the inner AG aggregation loop runs as quickly as possible.
*
* This function runs during the setup phase /before/ we start checking any
* metadata.
*/
STATIC int
xchk_fscount_warmup(
	struct xfs_scrub	*sc)
{
	struct xfs_mount	*mp = sc->mp;
	struct xfs_buf		*agi_bp = NULL;
	struct xfs_buf		*agf_bp = NULL;
	struct xfs_perag	*pag = NULL;
	xfs_agnumber_t		agno;
	int			error = 0;

	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
		pag = xfs_perag_get(mp, agno);

		if (pag->pagi_init && pag->pagf_init)
			goto next_loop_perag;

		/* Lock both AG headers. */
		error = xfs_ialloc_read_agi(mp, sc->tp, agno, &agi_bp);
		if (error)
			break;
		error = xfs_alloc_read_agf(mp, sc->tp, agno, 0, &agf_bp);
		if (error)
			break;
		error = -ENOMEM;
		if (!agf_bp || !agi_bp)
			break;

		/*
		 * These are supposed to be initialized by the header read
		 * function.
		 */
		error = -EFSCORRUPTED;
		if (!pag->pagi_init || !pag->pagf_init)
			break;

		xfs_buf_relse(agf_bp);
		agf_bp = NULL;
		xfs_buf_relse(agi_bp);
		agi_bp = NULL;
next_loop_perag:
		xfs_perag_put(pag);
		pag = NULL;
		error = 0;
		if (fatal_signal_pending(current))
			break;
	}

	if (agf_bp)
		xfs_buf_relse(agf_bp);
	if (agi_bp)
		xfs_buf_relse(agi_bp);
	if (pag)
		xfs_perag_put(pag);
	return error;
}
int
xchk_setup_fscounters(
	struct xfs_scrub	*sc,
	struct xfs_inode	*ip)
{
	struct xchk_fscounters	*fsc;
	int			error;

	sc->buf = kmem_zalloc(sizeof(struct xchk_fscounters), KM_SLEEP);
	if (!sc->buf)
		return -ENOMEM;
	fsc = sc->buf;

	xfs_icount_range(sc->mp, &fsc->icount_min, &fsc->icount_max);

	/* We must get the incore counters set up before we can proceed. */
	error = xchk_fscount_warmup(sc);
	if (error)
		return error;

	/*
	 * Pause background reclaim while we're scrubbing to reduce the
	 * likelihood of background perturbations to the counters throwing off
	 * our calculations.
	 */
	xchk_stop_reaping(sc);

	return xchk_trans_alloc(sc, 0);
}
/*
* Calculate what the global in-core counters ought to be from the incore
* per-AG structure. Callers can compare this to the actual in-core counters
* to estimate by how much both in-core and on-disk counters need to be
* adjusted.
*/
STATIC int
xchk_fscount_aggregate_agcounts(
	struct xfs_scrub	*sc,
	struct xchk_fscounters	*fsc)
{
	struct xfs_mount	*mp = sc->mp;
	struct xfs_perag	*pag;
	uint64_t		delayed;
	xfs_agnumber_t		agno;
	int			tries = 8;

retry:
	fsc->icount = 0;
	fsc->ifree = 0;
	fsc->fdblocks = 0;

	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
		pag = xfs_perag_get(mp, agno);

		/* This somehow got unset since the warmup? */
		if (!pag->pagi_init || !pag->pagf_init) {
			xfs_perag_put(pag);
			return -EFSCORRUPTED;
		}

		/* Count all the inodes */
		fsc->icount += pag->pagi_count;
		fsc->ifree += pag->pagi_freecount;

		/* Add up the free/freelist/bnobt/cntbt blocks */
		fsc->fdblocks += pag->pagf_freeblks;
		fsc->fdblocks += pag->pagf_flcount;
		fsc->fdblocks += pag->pagf_btreeblks;

		/*
		 * Per-AG reservations are taken out of the incore counters,
		 * so they must be left out of the free blocks computation.
		 */
		fsc->fdblocks -= pag->pag_meta_resv.ar_reserved;
		fsc->fdblocks -= pag->pag_rmapbt_resv.ar_orig_reserved;

		xfs_perag_put(pag);

		if (fatal_signal_pending(current))
			break;
	}

	/*
	 * The global incore space reservation is taken from the incore
	 * counters, so leave that out of the computation.
	 */
	fsc->fdblocks -= mp->m_resblks_avail;

	/*
	 * Delayed allocation reservations are taken out of the incore counters
	 * but not recorded on disk, so leave them and their indlen blocks out
	 * of the computation.
	 */
	delayed = percpu_counter_sum(&mp->m_delalloc_blks);
	fsc->fdblocks -= delayed;

	trace_xchk_fscounters_calc(mp, fsc->icount, fsc->ifree, fsc->fdblocks,
			delayed);

	/* Bail out if the values we compute are totally nonsense. */
	if (fsc->icount < fsc->icount_min || fsc->icount > fsc->icount_max ||
	    fsc->fdblocks > mp->m_sb.sb_dblocks ||
	    fsc->ifree > fsc->icount_max)
		return -EFSCORRUPTED;

	/*
	 * If ifree > icount then we probably had some perturbation in the
	 * counters while we were calculating things.  We'll try a few times
	 * to maintain ifree <= icount before giving up.
	 */
	if (fsc->ifree > fsc->icount) {
		if (tries--)
			goto retry;
		xchk_set_incomplete(sc);
		return 0;
	}

	return 0;
}
/*
* Is the @counter reasonably close to the @expected value?
*
* We neither locked nor froze anything in the filesystem while aggregating the
* per-AG data to compute the @expected value, which means that the counter
* could have changed. We know the @old_value of the summation of the counter
* before the aggregation, and we re-sum the counter now. If the expected
* value falls between the two summations, we're ok.
*
* Otherwise, we /might/ have a problem. If the change in the summations is
* more than we want to tolerate, the filesystem is probably busy and we should
* just send back INCOMPLETE and see if userspace will try again.
*/
static inline bool
xchk_fscount_within_range(
	struct xfs_scrub	*sc,
	const int64_t		old_value,
	struct percpu_counter	*counter,
	uint64_t		expected)
{
	int64_t			min_value, max_value;
	int64_t			curr_value = percpu_counter_sum(counter);

	trace_xchk_fscounters_within_range(sc->mp, expected, curr_value,
			old_value);

	/* Negative values are always wrong. */
	if (curr_value < 0)
		return false;

	/* Exact matches are always ok. */
	if (curr_value == expected)
		return true;

	min_value = min(old_value, curr_value);
	max_value = max(old_value, curr_value);

	/* Within the before-and-after range is ok. */
	if (expected >= min_value && expected <= max_value)
		return true;

	/*
	 * If the difference between the two summations is too large, the fs
	 * might just be busy and so we'll mark the scrub incomplete.  Return
	 * true here so that we don't mark the counter corrupt.
	 *
	 * XXX: In the future when userspace can grant scrub permission to
	 * quiesce the filesystem to solve the outsized variance problem, this
	 * check should be moved up and the return code changed to signal to
	 * userspace that we need quiesce permission.
	 */
	if (max_value - min_value >= XCHK_FSCOUNT_MIN_VARIANCE) {
		xchk_set_incomplete(sc);
		return true;
	}

	return false;
}
/* Check the superblock counters. */
int
xchk_fscounters(
	struct xfs_scrub	*sc)
{
	struct xfs_mount	*mp = sc->mp;
	struct xchk_fscounters	*fsc = sc->buf;
	int64_t			icount, ifree, fdblocks;
	int			error;

	/* Snapshot the percpu counters. */
	icount = percpu_counter_sum(&mp->m_icount);
	ifree = percpu_counter_sum(&mp->m_ifree);
	fdblocks = percpu_counter_sum(&mp->m_fdblocks);

	/* No negative values, please! */
	if (icount < 0 || ifree < 0 || fdblocks < 0)
		xchk_set_corrupt(sc);

	/* See if icount is obviously wrong. */
	if (icount < fsc->icount_min || icount > fsc->icount_max)
		xchk_set_corrupt(sc);

	/* See if fdblocks is obviously wrong. */
	if (fdblocks > mp->m_sb.sb_dblocks)
		xchk_set_corrupt(sc);

	/*
	 * If ifree exceeds icount by more than the minimum variance then
	 * something's probably wrong with the counters.
	 */
	if (ifree > icount && ifree - icount > XCHK_FSCOUNT_MIN_VARIANCE)
		xchk_set_corrupt(sc);

	/* Walk the incore AG headers to calculate the expected counters. */
	error = xchk_fscount_aggregate_agcounts(sc, fsc);
	if (!xchk_process_error(sc, 0, XFS_SB_BLOCK(mp), &error))
		return error;
	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
		return 0;

	/* Compare the in-core counters with whatever we counted. */
	if (!xchk_fscount_within_range(sc, icount, &mp->m_icount, fsc->icount))
		xchk_set_corrupt(sc);

	if (!xchk_fscount_within_range(sc, ifree, &mp->m_ifree, fsc->ifree))
		xchk_set_corrupt(sc);

	if (!xchk_fscount_within_range(sc, fdblocks, &mp->m_fdblocks,
			fsc->fdblocks))
		xchk_set_corrupt(sc);

	return 0;
}

fs/xfs/scrub/health.c Normal file

@@ -0,0 +1,237 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright (C) 2019 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_btree.h"
#include "xfs_bit.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_sb.h"
#include "xfs_inode.h"
#include "xfs_health.h"
#include "scrub/scrub.h"
#include "scrub/health.h"
/*
* Scrub and In-Core Filesystem Health Assessments
* ===============================================
*
* Online scrub and repair have the time and the ability to perform stronger
* checks than we can do from the metadata verifiers, because they can
* cross-reference records between data structures. Therefore, scrub is in a
* good position to update the online filesystem health assessments to reflect
* the good/bad state of the data structure.
*
* We therefore extend scrub in the following ways to achieve this:
*
* 1. Create a "sick_mask" field in the scrub context. When we're setting up a
* scrub call, set this to the default XFS_SICK_* flag(s) for the selected
* scrub type (call it A). Scrub and repair functions can override the default
* sick_mask value if they choose.
*
* 2. If the scrubber returns a runtime error code, we exit making no changes
* to the incore sick state.
*
* 3. If the scrubber finds that A is clean, use sick_mask to clear the incore
* sick flags before exiting.
*
* 4. If the scrubber finds that A is corrupt, use sick_mask to set the incore
* sick flags. If the user didn't want to repair then we exit, leaving the
* metadata structure unfixed and the sick flag set.
*
* 5. Now we know that A is corrupt and the user wants to repair, so run the
* repairer. If the repairer returns an error code, we exit with that error
* code, having made no further changes to the incore sick state.
*
* 6. If repair rebuilds A correctly and the subsequent re-scrub of A is clean,
* use sick_mask to clear the incore sick flags. This should have the effect
* that A is no longer marked sick.
*
* 7. If repair rebuilds A incorrectly, the re-scrub will find it corrupt and
* use sick_mask to set the incore sick flags. This should have no externally
* visible effect since we already set them in step (4).
*
* There are some complications to this story, however. For certain types of
* complementary metadata indices (e.g. inobt/finobt), it is easier to rebuild
* both structures at the same time. The following principles apply to this
* type of repair strategy:
*
* 8. Any repair function that rebuilds multiple structures should update
* sick_mask_visible to reflect whatever other structures are rebuilt, and
* verify that all the rebuilt structures can pass a scrub check. The outcomes
* of 5-7 still apply, but with a sick_mask that covers everything being
* rebuilt.
*/
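Steps 2-4 of the scheme above reduce to a small state update: a runtime error changes nothing, a corrupt result sets the mask, and a clean result clears it. The following is an illustrative distillation with hypothetical names, not the actual `xchk_update_health()`:

```c
#include <stdbool.h>

/*
 * On a runtime error, leave the incore state untouched (step 2);
 * otherwise set the sick_mask bits on corruption (step 4) or clear
 * them on a clean result (step 3).
 */
static void
update_sick_state(unsigned int *sick, unsigned int sick_mask,
		int error, bool corrupt)
{
	if (error)
		return;
	if (corrupt)
		*sick |= sick_mask;
	else
		*sick &= ~sick_mask;
}
```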
/* Map our scrub type to a sick mask and a set of health update functions. */
enum xchk_health_group {
	XHG_FS = 1,
	XHG_RT,
	XHG_AG,
	XHG_INO,
};

struct xchk_health_map {
	enum xchk_health_group	group;
	unsigned int		sick_mask;
};
static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
	[XFS_SCRUB_TYPE_SB]		= { XHG_AG,  XFS_SICK_AG_SB },
	[XFS_SCRUB_TYPE_AGF]		= { XHG_AG,  XFS_SICK_AG_AGF },
	[XFS_SCRUB_TYPE_AGFL]		= { XHG_AG,  XFS_SICK_AG_AGFL },
	[XFS_SCRUB_TYPE_AGI]		= { XHG_AG,  XFS_SICK_AG_AGI },
	[XFS_SCRUB_TYPE_BNOBT]		= { XHG_AG,  XFS_SICK_AG_BNOBT },
	[XFS_SCRUB_TYPE_CNTBT]		= { XHG_AG,  XFS_SICK_AG_CNTBT },
	[XFS_SCRUB_TYPE_INOBT]		= { XHG_AG,  XFS_SICK_AG_INOBT },
	[XFS_SCRUB_TYPE_FINOBT]		= { XHG_AG,  XFS_SICK_AG_FINOBT },
	[XFS_SCRUB_TYPE_RMAPBT]		= { XHG_AG,  XFS_SICK_AG_RMAPBT },
	[XFS_SCRUB_TYPE_REFCNTBT]	= { XHG_AG,  XFS_SICK_AG_REFCNTBT },
	[XFS_SCRUB_TYPE_INODE]		= { XHG_INO, XFS_SICK_INO_CORE },
	[XFS_SCRUB_TYPE_BMBTD]		= { XHG_INO, XFS_SICK_INO_BMBTD },
	[XFS_SCRUB_TYPE_BMBTA]		= { XHG_INO, XFS_SICK_INO_BMBTA },
	[XFS_SCRUB_TYPE_BMBTC]		= { XHG_INO, XFS_SICK_INO_BMBTC },
	[XFS_SCRUB_TYPE_DIR]		= { XHG_INO, XFS_SICK_INO_DIR },
	[XFS_SCRUB_TYPE_XATTR]		= { XHG_INO, XFS_SICK_INO_XATTR },
	[XFS_SCRUB_TYPE_SYMLINK]	= { XHG_INO, XFS_SICK_INO_SYMLINK },
	[XFS_SCRUB_TYPE_PARENT]		= { XHG_INO, XFS_SICK_INO_PARENT },
	[XFS_SCRUB_TYPE_RTBITMAP]	= { XHG_RT,  XFS_SICK_RT_BITMAP },
	[XFS_SCRUB_TYPE_RTSUM]		= { XHG_RT,  XFS_SICK_RT_SUMMARY },
	[XFS_SCRUB_TYPE_UQUOTA]		= { XHG_FS,  XFS_SICK_FS_UQUOTA },
	[XFS_SCRUB_TYPE_GQUOTA]		= { XHG_FS,  XFS_SICK_FS_GQUOTA },
	[XFS_SCRUB_TYPE_PQUOTA]		= { XHG_FS,  XFS_SICK_FS_PQUOTA },
	[XFS_SCRUB_TYPE_FSCOUNTERS]	= { XHG_FS,  XFS_SICK_FS_COUNTERS },
};
/* Return the health status mask for this scrub type. */
unsigned int
xchk_health_mask_for_scrub_type(
__u32 scrub_type)
{
return type_to_health_flag[scrub_type].sick_mask;
}
/*
* Update filesystem health assessments based on what we found and did.
*
* If the scrubber finds errors, we mark sick whatever's mentioned in
* sick_mask, no matter whether this is a first scan or an
* evaluation of repair effectiveness.
*
* Otherwise, no direct corruption was found, so mark whatever's in
* sick_mask as healthy.
*/
void
xchk_update_health(
        struct xfs_scrub        *sc)
{
        struct xfs_perag        *pag;
        bool                    bad;

        if (!sc->sick_mask)
                return;

        bad = (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT);
        switch (type_to_health_flag[sc->sm->sm_type].group) {
        case XHG_AG:
                pag = xfs_perag_get(sc->mp, sc->sm->sm_agno);
                if (bad)
                        xfs_ag_mark_sick(pag, sc->sick_mask);
                else
                        xfs_ag_mark_healthy(pag, sc->sick_mask);
                xfs_perag_put(pag);
                break;
        case XHG_INO:
                if (!sc->ip)
                        return;
                if (bad)
                        xfs_inode_mark_sick(sc->ip, sc->sick_mask);
                else
                        xfs_inode_mark_healthy(sc->ip, sc->sick_mask);
                break;
        case XHG_FS:
                if (bad)
                        xfs_fs_mark_sick(sc->mp, sc->sick_mask);
                else
                        xfs_fs_mark_healthy(sc->mp, sc->sick_mask);
                break;
        case XHG_RT:
                if (bad)
                        xfs_rt_mark_sick(sc->mp, sc->sick_mask);
                else
                        xfs_rt_mark_healthy(sc->mp, sc->sick_mask);
                break;
        default:
                ASSERT(0);
                break;
        }
}
/* Is the given per-AG btree healthy enough for scanning? */
bool
xchk_ag_btree_healthy_enough(
        struct xfs_scrub        *sc,
        struct xfs_perag        *pag,
        xfs_btnum_t             btnum)
{
        unsigned int            mask = 0;

        /*
         * We always want the cursor if it's the same type as whatever we're
         * scrubbing, even if we already know the structure is corrupt.
         *
         * Otherwise, we're only interested in the btree for cross-referencing.
         * If we know the btree is bad then don't bother, just set XFAIL.
         */
        switch (btnum) {
        case XFS_BTNUM_BNO:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_BNOBT)
                        return true;
                mask = XFS_SICK_AG_BNOBT;
                break;
        case XFS_BTNUM_CNT:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_CNTBT)
                        return true;
                mask = XFS_SICK_AG_CNTBT;
                break;
        case XFS_BTNUM_INO:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_INOBT)
                        return true;
                mask = XFS_SICK_AG_INOBT;
                break;
        case XFS_BTNUM_FINO:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_FINOBT)
                        return true;
                mask = XFS_SICK_AG_FINOBT;
                break;
        case XFS_BTNUM_RMAP:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_RMAPBT)
                        return true;
                mask = XFS_SICK_AG_RMAPBT;
                break;
        case XFS_BTNUM_REFC:
                if (sc->sm->sm_type == XFS_SCRUB_TYPE_REFCNTBT)
                        return true;
                mask = XFS_SICK_AG_REFCNTBT;
                break;
        default:
                ASSERT(0);
                return true;
        }

        if (xfs_ag_has_sickness(pag, mask)) {
                sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
                return false;
        }

        return true;
}

fs/xfs/scrub/health.h (new file, 14 lines)
@@ -0,0 +1,14 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright (C) 2019 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
#ifndef __XFS_SCRUB_HEALTH_H__
#define __XFS_SCRUB_HEALTH_H__
unsigned int xchk_health_mask_for_scrub_type(__u32 scrub_type);
void xchk_update_health(struct xfs_scrub *sc);
bool xchk_ag_btree_healthy_enough(struct xfs_scrub *sc, struct xfs_perag *pag,
                xfs_btnum_t btnum);
#endif /* __XFS_SCRUB_HEALTH_H__ */

fs/xfs/scrub/ialloc.c

@@ -39,7 +39,7 @@ xchk_setup_ag_iallocbt(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*ip)
 {
-	return xchk_setup_ag_btree(sc, ip, sc->try_harder);
+	return xchk_setup_ag_btree(sc, ip, sc->flags & XCHK_TRY_HARDER);
 }
 
 /* Inode btree scrubber. */
@@ -185,7 +185,7 @@ xchk_iallocbt_check_cluster_ifree(
 	if (error == -ENODATA) {
 		/* Not cached, just read the disk buffer */
 		freemask_ok = irec_free ^ !!(dip->di_mode);
-		if (!bs->sc->try_harder && !freemask_ok)
+		if (!(bs->sc->flags & XCHK_TRY_HARDER) && !freemask_ok)
 			return -EDEADLOCK;
 	} else if (error < 0) {
 		/*

fs/xfs/scrub/parent.c

@@ -320,7 +320,7 @@ xchk_parent(
 	 * If we failed to lock the parent inode even after a retry, just mark
 	 * this scrub incomplete and return.
 	 */
-	if (sc->try_harder && error == -EDEADLOCK) {
+	if ((sc->flags & XCHK_TRY_HARDER) && error == -EDEADLOCK) {
 		error = 0;
 		xchk_set_incomplete(sc);
 	}

fs/xfs/scrub/quota.c

@@ -60,7 +60,7 @@ xchk_setup_quota(
 	dqtype = xchk_quota_to_dqtype(sc);
 	if (dqtype == 0)
 		return -EINVAL;
-	sc->has_quotaofflock = true;
+	sc->flags |= XCHK_HAS_QUOTAOFFLOCK;
 	mutex_lock(&sc->mp->m_quotainfo->qi_quotaofflock);
 	if (!xfs_this_quota_on(sc->mp, dqtype))
 		return -ENOENT;

fs/xfs/scrub/repair.c

@@ -46,8 +46,7 @@
 int
 xrep_attempt(
 	struct xfs_inode	*ip,
-	struct xfs_scrub	*sc,
-	bool			*fixed)
+	struct xfs_scrub	*sc)
 {
 	int			error = 0;
 
@@ -66,13 +65,13 @@ xrep_attempt(
 		 * scrub so that we can tell userspace if we fixed the problem.
 		 */
 		sc->sm->sm_flags &= ~XFS_SCRUB_FLAGS_OUT;
-		*fixed = true;
+		sc->flags |= XREP_ALREADY_FIXED;
 		return -EAGAIN;
 	case -EDEADLOCK:
 	case -EAGAIN:
 		/* Tell the caller to try again having grabbed all the locks. */
-		if (!sc->try_harder) {
-			sc->try_harder = true;
+		if (!(sc->flags & XCHK_TRY_HARDER)) {
+			sc->flags |= XCHK_TRY_HARDER;
 			return -EAGAIN;
 		}
 		/*
@@ -137,10 +136,16 @@ xrep_roll_ag_trans(
 	if (sc->sa.agfl_bp)
 		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
 
-	/* Roll the transaction. */
+	/*
+	 * Roll the transaction.  We still own the buffer and the buffer lock
+	 * regardless of whether or not the roll succeeds.  If the roll fails,
+	 * the buffers will be released during teardown on our way out of the
+	 * kernel.  If it succeeds, we join them to the new transaction and
+	 * move on.
+	 */
 	error = xfs_trans_roll(&sc->tp);
 	if (error)
-		goto out_release;
+		return error;
 
 	/* Join AG headers to the new transaction. */
 	if (sc->sa.agi_bp)
@@ -151,21 +156,6 @@ xrep_roll_ag_trans(
 		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
 
 	return 0;
-
-out_release:
-	/*
-	 * Rolling failed, so release the hold on the buffers.  The
-	 * buffers will be released during teardown on our way out
-	 * of the kernel.
-	 */
-	if (sc->sa.agi_bp)
-		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
-	if (sc->sa.agf_bp)
-		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
-	if (sc->sa.agfl_bp)
-		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
-	return error;
 }
 
 /*

fs/xfs/scrub/repair.h

@@ -15,7 +15,7 @@ static inline int xrep_notsupported(struct xfs_scrub *sc)
 
 /* Repair helpers */
 
-int xrep_attempt(struct xfs_inode *ip, struct xfs_scrub *sc, bool *fixed);
+int xrep_attempt(struct xfs_inode *ip, struct xfs_scrub *sc);
 void xrep_failure(struct xfs_mount *mp);
 int xrep_roll_ag_trans(struct xfs_scrub *sc);
 bool xrep_ag_has_space(struct xfs_perag *pag, xfs_extlen_t nr_blocks,
@@ -64,8 +64,7 @@ int xrep_agi(struct xfs_scrub *sc);
 static inline int xrep_attempt(
 	struct xfs_inode	*ip,
-	struct xfs_scrub	*sc,
-	bool			*fixed)
+	struct xfs_scrub	*sc)
 {
 	return -EOPNOTSUPP;
 }

fs/xfs/scrub/scrub.c

@@ -40,6 +40,7 @@
 #include "scrub/trace.h"
 #include "scrub/btree.h"
 #include "scrub/repair.h"
+#include "scrub/health.h"
 
 /*
  * Online Scrub and Repair
@@ -186,8 +187,12 @@ xchk_teardown(
 		xfs_irele(sc->ip);
 		sc->ip = NULL;
 	}
-	if (sc->has_quotaofflock)
+	if (sc->flags & XCHK_REAPING_DISABLED)
+		xchk_start_reaping(sc);
+	if (sc->flags & XCHK_HAS_QUOTAOFFLOCK) {
 		mutex_unlock(&sc->mp->m_quotainfo->qi_quotaofflock);
+		sc->flags &= ~XCHK_HAS_QUOTAOFFLOCK;
+	}
 	if (sc->buf) {
 		kmem_free(sc->buf);
 		sc->buf = NULL;
@@ -347,6 +352,12 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.scrub	= xchk_quota,
 		.repair	= xrep_notsupported,
 	},
+	[XFS_SCRUB_TYPE_FSCOUNTERS] = {	/* fs summary counters */
+		.type	= ST_FS,
+		.setup	= xchk_setup_fscounters,
+		.scrub	= xchk_fscounters,
+		.repair	= xrep_notsupported,
+	},
 };
 
 /* This isn't a stable feature, warn once per day. */
@@ -466,10 +477,14 @@ xfs_scrub_metadata(
 	struct xfs_inode		*ip,
 	struct xfs_scrub_metadata	*sm)
 {
-	struct xfs_scrub		sc;
+	struct xfs_scrub		sc = {
+		.mp			= ip->i_mount,
+		.sm			= sm,
+		.sa			= {
+			.agno		= NULLAGNUMBER,
+		},
+	};
 	struct xfs_mount		*mp = ip->i_mount;
-	bool				try_harder = false;
-	bool				already_fixed = false;
 	int				error = 0;
 
 	BUILD_BUG_ON(sizeof(meta_scrub_ops) !=
@@ -491,21 +506,17 @@ xfs_scrub_metadata(
 	xchk_experimental_warning(mp);
 
+	sc.ops = &meta_scrub_ops[sm->sm_type];
+	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
 retry_op:
 	/* Set up for the operation. */
-	memset(&sc, 0, sizeof(sc));
-	sc.mp = ip->i_mount;
-	sc.sm = sm;
-	sc.ops = &meta_scrub_ops[sm->sm_type];
-	sc.try_harder = try_harder;
-	sc.sa.agno = NULLAGNUMBER;
 	error = sc.ops->setup(&sc, ip);
 	if (error)
 		goto out_teardown;
 
 	/* Scrub for errors. */
 	error = sc.ops->scrub(&sc);
-	if (!try_harder && error == -EDEADLOCK) {
+	if (!(sc.flags & XCHK_TRY_HARDER) && error == -EDEADLOCK) {
 		/*
 		 * Scrubbers return -EDEADLOCK to mean 'try harder'.
 		 * Tear down everything we hold, then set up again with
@@ -514,12 +525,15 @@ xfs_scrub_metadata(
 		error = xchk_teardown(&sc, ip, 0);
 		if (error)
 			goto out;
-		try_harder = true;
+		sc.flags |= XCHK_TRY_HARDER;
 		goto retry_op;
 	} else if (error)
 		goto out_teardown;
 
-	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
+	xchk_update_health(&sc);
+
+	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
+	    !(sc.flags & XREP_ALREADY_FIXED)) {
 		bool needs_fix;
 
 		/* Let debug users force us into the repair routines. */
@@ -542,10 +556,13 @@ xfs_scrub_metadata(
 		 * If it's broken, userspace wants us to fix it, and we haven't
 		 * already tried to fix it, then attempt a repair.
 		 */
-		error = xrep_attempt(ip, &sc, &already_fixed);
+		error = xrep_attempt(ip, &sc);
 		if (error == -EAGAIN) {
-			if (sc.try_harder)
-				try_harder = true;
+			/*
+			 * Either the repair function succeeded or it couldn't
+			 * get all the resources it needs; either way, we go
+			 * back to the beginning and call the scrub function.
+			 */
 			error = xchk_teardown(&sc, ip, 0);
 			if (error) {
 				xrep_failure(mp);

fs/xfs/scrub/scrub.h

@@ -62,13 +62,27 @@ struct xfs_scrub {
 	struct xfs_inode		*ip;
 	void				*buf;
 	uint				ilock_flags;
-	bool				try_harder;
-	bool				has_quotaofflock;
+
+	/* See the XCHK/XREP state flags below. */
+	unsigned int			flags;
+
+	/*
+	 * The XFS_SICK_* flags that correspond to the metadata being scrubbed
+	 * or repaired.  We will use this mask to update the in-core fs health
+	 * status with whatever we find.
+	 */
+	unsigned int			sick_mask;
 
 	/* State tracking for single-AG operations. */
 	struct xchk_ag			sa;
 };
 
+/* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */
+#define XCHK_TRY_HARDER		(1 << 0)  /* can't get resources, try again */
+#define XCHK_HAS_QUOTAOFFLOCK	(1 << 1)  /* we hold the quotaoff lock */
+#define XCHK_REAPING_DISABLED	(1 << 2)  /* background block reaping paused */
+#define XREP_ALREADY_FIXED	(1 << 31) /* checking our repair work */
+
 /* Metadata scrubbers */
 int xchk_tester(struct xfs_scrub *sc);
 int xchk_superblock(struct xfs_scrub *sc);
@@ -113,6 +127,7 @@ xchk_quota(struct xfs_scrub *sc)
 	return -ENOENT;
 }
 #endif
+int xchk_fscounters(struct xfs_scrub *sc);
 
 /* cross-referencing helpers */
 void xchk_xref_is_used_space(struct xfs_scrub *sc, xfs_agblock_t agbno,
@@ -138,4 +153,12 @@ void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno,
 # define xchk_xref_is_used_rt_space(sc, rtbno, len)	do { } while (0)
 #endif
 
+struct xchk_fscounters {
+	uint64_t		icount;
+	uint64_t		ifree;
+	uint64_t		fdblocks;
+	unsigned long long	icount_min;
+	unsigned long long	icount_max;
+};
+
 #endif /* __XFS_SCRUB_SCRUB_H__ */

fs/xfs/scrub/trace.h

@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTSUM);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_UQUOTA);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_GQUOTA);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA);
+TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_FSCOUNTERS);
 
 #define XFS_SCRUB_TYPE_STRINGS \
 	{ XFS_SCRUB_TYPE_PROBE,		"probe" }, \
@@ -75,7 +76,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_PQUOTA);
 	{ XFS_SCRUB_TYPE_RTSUM,		"rtsummary" }, \
 	{ XFS_SCRUB_TYPE_UQUOTA,	"usrquota" }, \
 	{ XFS_SCRUB_TYPE_GQUOTA,	"grpquota" }, \
-	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }
+	{ XFS_SCRUB_TYPE_PQUOTA,	"prjquota" }, \
+	{ XFS_SCRUB_TYPE_FSCOUNTERS,	"fscounters" }
 
 DECLARE_EVENT_CLASS(xchk_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_metadata *sm,
@@ -223,6 +225,7 @@ DEFINE_EVENT(xchk_block_error_class, name, \
 		 void *ret_ip), \
 	TP_ARGS(sc, daddr, ret_ip))
 
+DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_fs_error);
 DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_error);
 DEFINE_SCRUB_BLOCK_ERROR_EVENT(xchk_block_preen);
 
@@ -590,6 +593,64 @@ TRACE_EVENT(xchk_iallocbt_check_cluster,
 		  __entry->cluster_ino)
 )
 
+TRACE_EVENT(xchk_fscounters_calc,
+	TP_PROTO(struct xfs_mount *mp, uint64_t icount, uint64_t ifree,
+		 uint64_t fdblocks, uint64_t delalloc),
+	TP_ARGS(mp, icount, ifree, fdblocks, delalloc),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(int64_t, icount_sb)
+		__field(uint64_t, icount_calculated)
+		__field(int64_t, ifree_sb)
+		__field(uint64_t, ifree_calculated)
+		__field(int64_t, fdblocks_sb)
+		__field(uint64_t, fdblocks_calculated)
+		__field(uint64_t, delalloc)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->icount_sb = mp->m_sb.sb_icount;
+		__entry->icount_calculated = icount;
+		__entry->ifree_sb = mp->m_sb.sb_ifree;
+		__entry->ifree_calculated = ifree;
+		__entry->fdblocks_sb = mp->m_sb.sb_fdblocks;
+		__entry->fdblocks_calculated = fdblocks;
+		__entry->delalloc = delalloc;
+	),
+	TP_printk("dev %d:%d icount %lld:%llu ifree %lld::%llu fdblocks %lld::%llu delalloc %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->icount_sb,
+		  __entry->icount_calculated,
+		  __entry->ifree_sb,
+		  __entry->ifree_calculated,
+		  __entry->fdblocks_sb,
+		  __entry->fdblocks_calculated,
+		  __entry->delalloc)
+)
+
+TRACE_EVENT(xchk_fscounters_within_range,
+	TP_PROTO(struct xfs_mount *mp, uint64_t expected, int64_t curr_value,
+		 int64_t old_value),
+	TP_ARGS(mp, expected, curr_value, old_value),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(uint64_t, expected)
+		__field(int64_t, curr_value)
+		__field(int64_t, old_value)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->expected = expected;
+		__entry->curr_value = curr_value;
+		__entry->old_value = old_value;
+	),
+	TP_printk("dev %d:%d expected %llu curr_value %lld old_value %lld",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->expected,
+		  __entry->curr_value,
+		  __entry->old_value)
+)
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)

fs/xfs/xfs_aops.c

@@ -234,11 +234,10 @@ xfs_setfilesize_ioend(
  * IO write completion.
  */
 STATIC void
-xfs_end_io(
-	struct work_struct *work)
+xfs_end_ioend(
+	struct xfs_ioend	*ioend)
 {
-	struct xfs_ioend	*ioend =
-		container_of(work, struct xfs_ioend, io_work);
+	struct list_head	ioend_list;
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
 	xfs_off_t		offset = ioend->io_offset;
 	size_t			size = ioend->io_size;
@@ -275,7 +274,116 @@ xfs_end_io(
 done:
 	if (ioend->io_append_trans)
 		error = xfs_setfilesize_ioend(ioend, error);
+	list_replace_init(&ioend->io_list, &ioend_list);
 	xfs_destroy_ioend(ioend, error);
+
+	while (!list_empty(&ioend_list)) {
+		ioend = list_first_entry(&ioend_list, struct xfs_ioend,
+				io_list);
+		list_del_init(&ioend->io_list);
+		xfs_destroy_ioend(ioend, error);
+	}
+}
+
+/*
+ * We can merge two adjacent ioends if they have the same set of work to do.
+ */
+static bool
+xfs_ioend_can_merge(
+	struct xfs_ioend	*ioend,
+	int			ioend_error,
+	struct xfs_ioend	*next)
+{
+	int			next_error;
+
+	next_error = blk_status_to_errno(next->io_bio->bi_status);
+	if (ioend_error != next_error)
+		return false;
+	if ((ioend->io_fork == XFS_COW_FORK) ^ (next->io_fork == XFS_COW_FORK))
+		return false;
+	if ((ioend->io_state == XFS_EXT_UNWRITTEN) ^
+	    (next->io_state == XFS_EXT_UNWRITTEN))
+		return false;
+	if (ioend->io_offset + ioend->io_size != next->io_offset)
+		return false;
+	if (xfs_ioend_is_append(ioend) != xfs_ioend_is_append(next))
+		return false;
+	return true;
+}
+
+/* Try to merge adjacent completions. */
+STATIC void
+xfs_ioend_try_merge(
+	struct xfs_ioend	*ioend,
+	struct list_head	*more_ioends)
+{
+	struct xfs_ioend	*next_ioend;
+	int			ioend_error;
+	int			error;
+
+	if (list_empty(more_ioends))
+		return;
+
+	ioend_error = blk_status_to_errno(ioend->io_bio->bi_status);
+
+	while (!list_empty(more_ioends)) {
+		next_ioend = list_first_entry(more_ioends, struct xfs_ioend,
+				io_list);
+		if (!xfs_ioend_can_merge(ioend, ioend_error, next_ioend))
+			break;
+		list_move_tail(&next_ioend->io_list, &ioend->io_list);
+		ioend->io_size += next_ioend->io_size;
+		if (ioend->io_append_trans) {
+			error = xfs_setfilesize_ioend(next_ioend, 1);
+			ASSERT(error == 1);
+		}
+	}
+}
+
+/* list_sort compare function for ioends */
+static int
+xfs_ioend_compare(
+	void			*priv,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xfs_ioend	*ia;
+	struct xfs_ioend	*ib;
+
+	ia = container_of(a, struct xfs_ioend, io_list);
+	ib = container_of(b, struct xfs_ioend, io_list);
+	if (ia->io_offset < ib->io_offset)
+		return -1;
+	else if (ia->io_offset > ib->io_offset)
+		return 1;
+	return 0;
+}
+
+/* Finish all pending io completions. */
+void
+xfs_end_io(
+	struct work_struct	*work)
+{
+	struct xfs_inode	*ip;
+	struct xfs_ioend	*ioend;
+	struct list_head	completion_list;
+	unsigned long		flags;
+
+	ip = container_of(work, struct xfs_inode, i_ioend_work);
+
+	spin_lock_irqsave(&ip->i_ioend_lock, flags);
+	list_replace_init(&ip->i_ioend_list, &completion_list);
+	spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
+
+	list_sort(NULL, &completion_list, xfs_ioend_compare);
+
+	while (!list_empty(&completion_list)) {
+		ioend = list_first_entry(&completion_list, struct xfs_ioend,
+				io_list);
+		list_del_init(&ioend->io_list);
+		xfs_ioend_try_merge(ioend, &completion_list);
+		xfs_end_ioend(ioend);
+	}
 }
 
 STATIC void
@@ -283,14 +391,20 @@ xfs_end_bio(
 	struct bio		*bio)
 {
 	struct xfs_ioend	*ioend = bio->bi_private;
-	struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
+	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	unsigned long		flags;
 
 	if (ioend->io_fork == XFS_COW_FORK ||
-	    ioend->io_state == XFS_EXT_UNWRITTEN)
-		queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
-	else if (ioend->io_append_trans)
-		queue_work(mp->m_data_workqueue, &ioend->io_work);
-	else
+	    ioend->io_state == XFS_EXT_UNWRITTEN ||
+	    ioend->io_append_trans != NULL) {
+		spin_lock_irqsave(&ip->i_ioend_lock, flags);
+		if (list_empty(&ip->i_ioend_list))
+			WARN_ON_ONCE(!queue_work(mp->m_unwritten_workqueue,
+						 &ip->i_ioend_work));
+		list_add_tail(&ioend->io_list, &ip->i_ioend_list);
+		spin_unlock_irqrestore(&ip->i_ioend_lock, flags);
+	} else
 		xfs_destroy_ioend(ioend, blk_status_to_errno(bio->bi_status));
 }
 
@@ -594,7 +708,6 @@ xfs_alloc_ioend(
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
 	ioend->io_offset = offset;
-	INIT_WORK(&ioend->io_work, xfs_end_io);
 	ioend->io_append_trans = NULL;
 	ioend->io_bio = bio;
 	return ioend;

fs/xfs/xfs_aops.h

@@ -18,7 +18,6 @@ struct xfs_ioend {
 	struct inode		*io_inode;	/* file being written to */
 	size_t			io_size;	/* size of the extent */
 	xfs_off_t		io_offset;	/* offset in the file */
-	struct work_struct	io_work;	/* xfsdatad work queue */
 	struct xfs_trans	*io_append_trans;/* xact. for size update */
 	struct bio		*io_bio;	/* bio being built */
 	struct bio		io_inline_bio;	/* MUST BE LAST! */

fs/xfs/xfs_bmap_util.c

@@ -1193,6 +1193,8 @@ xfs_prepare_shift(
 	 * about to shift down every extent from offset to EOF.
 	 */
 	error = xfs_flush_unmap_range(ip, offset, XFS_ISIZE(ip));
+	if (error)
+		return error;
 
 	/*
 	 * Clean out anything hanging around in the cow fork now that

fs/xfs/xfs_buf_item.c

@@ -605,6 +605,8 @@ xfs_buf_item_unlock(
 #if defined(DEBUG) || defined(XFS_WARN)
 	bool			ordered = bip->bli_flags & XFS_BLI_ORDERED;
 	bool			dirty = bip->bli_flags & XFS_BLI_DIRTY;
+	bool			aborted = test_bit(XFS_LI_ABORTED,
+						   &lip->li_flags);
 #endif
 
 	trace_xfs_buf_item_unlock(bip);
@@ -633,7 +635,7 @@ xfs_buf_item_unlock(
 	released = xfs_buf_item_put(bip);
 	if (hold || (stale && !released))
 		return;
-	ASSERT(!stale || test_bit(XFS_LI_ABORTED, &lip->li_flags));
+	ASSERT(!stale || aborted);
 	xfs_buf_relse(bp);
 }

fs/xfs/xfs_discard.c

@@ -172,6 +172,8 @@ xfs_ioc_trim(
 	if (copy_from_user(&range, urange, sizeof(range)))
 		return -EFAULT;
 
+	range.minlen = max_t(u64, granularity, range.minlen);
+	minlen = BTOBB(range.minlen);
 	/*
 	 * Truncating down the len isn't actually quite correct, but using
 	 * BBTOB would mean we trivially get overflows for values
@@ -186,7 +188,6 @@ xfs_ioc_trim(
 	start = BTOBB(range.start);
 	end = start + BTOBBT(range.len) - 1;
-	minlen = BTOBB(max_t(u64, granularity, range.minlen));
 
 	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
 		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;

fs/xfs/xfs_dquot.c

@@ -277,7 +277,8 @@ xfs_dquot_set_prealloc_limits(struct xfs_dquot *dqp)
 
 /*
  * Ensure that the given in-core dquot has a buffer on disk backing it, and
- * return the buffer. This is called when the bmapi finds a hole.
+ * return the buffer locked and held. This is called when the bmapi finds a
+ * hole.
  */
 STATIC int
 xfs_dquot_disk_alloc(
@@ -355,13 +356,14 @@ xfs_dquot_disk_alloc(
 	 * If everything succeeds, the caller of this function is returned a
 	 * buffer that is locked and held to the transaction.  The caller
 	 * is responsible for unlocking any buffer passed back, either
-	 * manually or by committing the transaction.
+	 * manually or by committing the transaction.  On error, the buffer is
+	 * released and not passed back.
 	 */
 	xfs_trans_bhold(tp, bp);
 	error = xfs_defer_finish(tpp);
-	tp = *tpp;
 	if (error) {
-		xfs_buf_relse(bp);
+		xfs_trans_bhold_release(*tpp, bp);
+		xfs_trans_brelse(*tpp, bp);
 		return error;
 	}
 	*bpp = bp;
@@ -521,7 +523,6 @@ xfs_qm_dqread_alloc(
 	struct xfs_buf		**bpp)
 {
 	struct xfs_trans	*tp;
-	struct xfs_buf		*bp;
 	int			error;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_dqalloc,
@@ -529,7 +530,7 @@ xfs_qm_dqread_alloc(
 	if (error)
 		goto err;
 
-	error = xfs_dquot_disk_alloc(&tp, dqp, &bp);
+	error = xfs_dquot_disk_alloc(&tp, dqp, bpp);
 	if (error)
 		goto err_cancel;
 
@@ -539,10 +540,10 @@ xfs_qm_dqread_alloc(
 		 * Buffer was held to the transaction, so we have to unlock it
 		 * manually here because we're not passing it back.
 		 */
-		xfs_buf_relse(bp);
+		xfs_buf_relse(*bpp);
+		*bpp = NULL;
 		goto err;
 	}
 
-	*bpp = bp;
 	return 0;
 
 err_cancel:

fs/xfs/xfs_file.c

@@ -517,6 +517,9 @@ xfs_file_dio_aio_write(
 	}
 
 	if (iocb->ki_flags & IOCB_NOWAIT) {
+		/* unaligned dio always waits, bail */
+		if (unaligned_io)
+			return -EAGAIN;
 		if (!xfs_ilock_nowait(ip, iolock))
 			return -EAGAIN;
 	} else {
@@ -536,9 +539,6 @@ xfs_file_dio_aio_write(
 	 * xfs_file_aio_write_checks() for other reasons.
 	 */
 	if (unaligned_io) {
-		/* unaligned dio always waits, bail */
-		if (iocb->ki_flags & IOCB_NOWAIT)
-			return -EAGAIN;
 		inode_dio_wait(inode);
 	} else if (iolock == XFS_IOLOCK_EXCL) {
 		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);

fs/xfs/xfs_fsops.c

@@ -289,7 +289,7 @@ xfs_growfs_log(
  * exported through ioctl XFS_IOC_FSCOUNTS
  */
 
-int
+void
 xfs_fs_counts(
 	xfs_mount_t		*mp,
 	xfs_fsop_counts_t	*cnt)
@@ -302,7 +302,6 @@ xfs_fs_counts(
 	spin_lock(&mp->m_sb_lock);
 	cnt->freertx = mp->m_sb.sb_frextents;
 	spin_unlock(&mp->m_sb_lock);
-	return 0;
 }
 
 /*
/* /*

fs/xfs/xfs_fsops.h

@@ -8,7 +8,7 @@
 
 extern int xfs_growfs_data(xfs_mount_t *mp, xfs_growfs_data_t *in);
 extern int xfs_growfs_log(xfs_mount_t *mp, xfs_growfs_log_t *in);
-extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
+extern void xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
 extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
 				xfs_fsop_resblks_t *outval);
 extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);

fs/xfs/xfs_health.c (new file, 392 lines)
@@ -0,0 +1,392 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright (C) 2019 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_bit.h"
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_da_format.h"
#include "xfs_da_btree.h"
#include "xfs_inode.h"
#include "xfs_trace.h"
#include "xfs_health.h"
/*
* Warn about metadata corruption that we detected but haven't fixed, and
* make sure we're not sitting on anything that would get in the way of
* recovery.
*/
void
xfs_health_unmount(
struct xfs_mount *mp)
{
struct xfs_perag *pag;
xfs_agnumber_t agno;
unsigned int sick = 0;
unsigned int checked = 0;
bool warn = false;
if (XFS_FORCED_SHUTDOWN(mp))
return;
/* Measure AG corruption levels. */
for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
pag = xfs_perag_get(mp, agno);
xfs_ag_measure_sickness(pag, &sick, &checked);
if (sick) {
trace_xfs_ag_unfixed_corruption(mp, agno, sick);
warn = true;
}
xfs_perag_put(pag);
}
/* Measure realtime volume corruption levels. */
xfs_rt_measure_sickness(mp, &sick, &checked);
if (sick) {
trace_xfs_rt_unfixed_corruption(mp, sick);
warn = true;
}
/*
* Measure fs corruption and keep the sample around for the warning.
* See the note below for why we exempt FS_COUNTERS.
*/
xfs_fs_measure_sickness(mp, &sick, &checked);
if (sick & ~XFS_SICK_FS_COUNTERS) {
trace_xfs_fs_unfixed_corruption(mp, sick);
warn = true;
}
if (warn) {
xfs_warn(mp,
"Uncorrected metadata errors detected; please run xfs_repair.");
/*
* We discovered uncorrected metadata problems at some point
* during this filesystem mount and have advised the
* administrator to run repair once the unmount completes.
*
* However, we must be careful -- when FSCOUNTERS are flagged
* unhealthy, the unmount procedure omits writing the clean
* unmount record to the log so that the next mount will run
* recovery and recompute the summary counters. In other
* words, we leave a dirty log to get the counters fixed.
*
* Unfortunately, xfs_repair cannot recover dirty logs, so if
* there were filesystem problems, FSCOUNTERS was flagged, and
* the administrator takes our advice to run xfs_repair,
* they'll have to zap the log before repairing structures.
* We don't really want to encourage this, so we mark the
* FSCOUNTERS healthy so that a subsequent repair run won't see
* a dirty log.
*/
if (sick & XFS_SICK_FS_COUNTERS)
xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS);
}
}
/* Mark unhealthy per-fs metadata. */
void
xfs_fs_mark_sick(
	struct xfs_mount	*mp,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_FS_PRIMARY));
	trace_xfs_fs_mark_sick(mp, mask);

	spin_lock(&mp->m_sb_lock);
	mp->m_fs_sick |= mask;
	mp->m_fs_checked |= mask;
	spin_unlock(&mp->m_sb_lock);
}

/* Mark a per-fs metadata healed. */
void
xfs_fs_mark_healthy(
	struct xfs_mount	*mp,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_FS_PRIMARY));
	trace_xfs_fs_mark_healthy(mp, mask);

	spin_lock(&mp->m_sb_lock);
	mp->m_fs_sick &= ~mask;
	mp->m_fs_checked |= mask;
	spin_unlock(&mp->m_sb_lock);
}

/* Sample which per-fs metadata are unhealthy. */
void
xfs_fs_measure_sickness(
	struct xfs_mount	*mp,
	unsigned int		*sick,
	unsigned int		*checked)
{
	spin_lock(&mp->m_sb_lock);
	*sick = mp->m_fs_sick;
	*checked = mp->m_fs_checked;
	spin_unlock(&mp->m_sb_lock);
}

/* Mark unhealthy realtime metadata. */
void
xfs_rt_mark_sick(
	struct xfs_mount	*mp,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_RT_PRIMARY));
	trace_xfs_rt_mark_sick(mp, mask);

	spin_lock(&mp->m_sb_lock);
	mp->m_rt_sick |= mask;
	mp->m_rt_checked |= mask;
	spin_unlock(&mp->m_sb_lock);
}

/* Mark a realtime metadata healed. */
void
xfs_rt_mark_healthy(
	struct xfs_mount	*mp,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_RT_PRIMARY));
	trace_xfs_rt_mark_healthy(mp, mask);

	spin_lock(&mp->m_sb_lock);
	mp->m_rt_sick &= ~mask;
	mp->m_rt_checked |= mask;
	spin_unlock(&mp->m_sb_lock);
}

/* Sample which realtime metadata are unhealthy. */
void
xfs_rt_measure_sickness(
	struct xfs_mount	*mp,
	unsigned int		*sick,
	unsigned int		*checked)
{
	spin_lock(&mp->m_sb_lock);
	*sick = mp->m_rt_sick;
	*checked = mp->m_rt_checked;
	spin_unlock(&mp->m_sb_lock);
}

/* Mark unhealthy per-ag metadata. */
void
xfs_ag_mark_sick(
	struct xfs_perag	*pag,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_AG_PRIMARY));
	trace_xfs_ag_mark_sick(pag->pag_mount, pag->pag_agno, mask);

	spin_lock(&pag->pag_state_lock);
	pag->pag_sick |= mask;
	pag->pag_checked |= mask;
	spin_unlock(&pag->pag_state_lock);
}

/* Mark per-ag metadata ok. */
void
xfs_ag_mark_healthy(
	struct xfs_perag	*pag,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_AG_PRIMARY));
	trace_xfs_ag_mark_healthy(pag->pag_mount, pag->pag_agno, mask);

	spin_lock(&pag->pag_state_lock);
	pag->pag_sick &= ~mask;
	pag->pag_checked |= mask;
	spin_unlock(&pag->pag_state_lock);
}

/* Sample which per-ag metadata are unhealthy. */
void
xfs_ag_measure_sickness(
	struct xfs_perag	*pag,
	unsigned int		*sick,
	unsigned int		*checked)
{
	spin_lock(&pag->pag_state_lock);
	*sick = pag->pag_sick;
	*checked = pag->pag_checked;
	spin_unlock(&pag->pag_state_lock);
}

/* Mark the unhealthy parts of an inode. */
void
xfs_inode_mark_sick(
	struct xfs_inode	*ip,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_INO_PRIMARY));
	trace_xfs_inode_mark_sick(ip, mask);

	spin_lock(&ip->i_flags_lock);
	ip->i_sick |= mask;
	ip->i_checked |= mask;
	spin_unlock(&ip->i_flags_lock);
}

/* Mark parts of an inode healed. */
void
xfs_inode_mark_healthy(
	struct xfs_inode	*ip,
	unsigned int		mask)
{
	ASSERT(!(mask & ~XFS_SICK_INO_PRIMARY));
	trace_xfs_inode_mark_healthy(ip, mask);

	spin_lock(&ip->i_flags_lock);
	ip->i_sick &= ~mask;
	ip->i_checked |= mask;
	spin_unlock(&ip->i_flags_lock);
}

/* Sample which parts of an inode are unhealthy. */
void
xfs_inode_measure_sickness(
	struct xfs_inode	*ip,
	unsigned int		*sick,
	unsigned int		*checked)
{
	spin_lock(&ip->i_flags_lock);
	*sick = ip->i_sick;
	*checked = ip->i_checked;
	spin_unlock(&ip->i_flags_lock);
}
/* Mappings between internal sick masks and ioctl sick masks. */
struct ioctl_sick_map {
	unsigned int		sick_mask;
	unsigned int		ioctl_mask;
};

static const struct ioctl_sick_map fs_map[] = {
	{ XFS_SICK_FS_COUNTERS,	XFS_FSOP_GEOM_SICK_COUNTERS },
	{ XFS_SICK_FS_UQUOTA,	XFS_FSOP_GEOM_SICK_UQUOTA },
	{ XFS_SICK_FS_GQUOTA,	XFS_FSOP_GEOM_SICK_GQUOTA },
	{ XFS_SICK_FS_PQUOTA,	XFS_FSOP_GEOM_SICK_PQUOTA },
	{ 0, 0 },
};

static const struct ioctl_sick_map rt_map[] = {
	{ XFS_SICK_RT_BITMAP,	XFS_FSOP_GEOM_SICK_RT_BITMAP },
	{ XFS_SICK_RT_SUMMARY,	XFS_FSOP_GEOM_SICK_RT_SUMMARY },
	{ 0, 0 },
};

static inline void
xfgeo_health_tick(
	struct xfs_fsop_geom		*geo,
	unsigned int			sick,
	unsigned int			checked,
	const struct ioctl_sick_map	*m)
{
	if (checked & m->sick_mask)
		geo->checked |= m->ioctl_mask;
	if (sick & m->sick_mask)
		geo->sick |= m->ioctl_mask;
}

/* Fill out fs geometry health info. */
void
xfs_fsop_geom_health(
	struct xfs_mount	*mp,
	struct xfs_fsop_geom	*geo)
{
	const struct ioctl_sick_map	*m;
	unsigned int		sick;
	unsigned int		checked;

	geo->sick = 0;
	geo->checked = 0;

	xfs_fs_measure_sickness(mp, &sick, &checked);
	for (m = fs_map; m->sick_mask; m++)
		xfgeo_health_tick(geo, sick, checked, m);

	xfs_rt_measure_sickness(mp, &sick, &checked);
	for (m = rt_map; m->sick_mask; m++)
		xfgeo_health_tick(geo, sick, checked, m);
}

static const struct ioctl_sick_map ag_map[] = {
	{ XFS_SICK_AG_SB,	XFS_AG_GEOM_SICK_SB },
	{ XFS_SICK_AG_AGF,	XFS_AG_GEOM_SICK_AGF },
	{ XFS_SICK_AG_AGFL,	XFS_AG_GEOM_SICK_AGFL },
	{ XFS_SICK_AG_AGI,	XFS_AG_GEOM_SICK_AGI },
	{ XFS_SICK_AG_BNOBT,	XFS_AG_GEOM_SICK_BNOBT },
	{ XFS_SICK_AG_CNTBT,	XFS_AG_GEOM_SICK_CNTBT },
	{ XFS_SICK_AG_INOBT,	XFS_AG_GEOM_SICK_INOBT },
	{ XFS_SICK_AG_FINOBT,	XFS_AG_GEOM_SICK_FINOBT },
	{ XFS_SICK_AG_RMAPBT,	XFS_AG_GEOM_SICK_RMAPBT },
	{ XFS_SICK_AG_REFCNTBT,	XFS_AG_GEOM_SICK_REFCNTBT },
	{ 0, 0 },
};

/* Fill out ag geometry health info. */
void
xfs_ag_geom_health(
	struct xfs_perag	*pag,
	struct xfs_ag_geometry	*ageo)
{
	const struct ioctl_sick_map	*m;
	unsigned int		sick;
	unsigned int		checked;

	ageo->ag_sick = 0;
	ageo->ag_checked = 0;

	xfs_ag_measure_sickness(pag, &sick, &checked);
	for (m = ag_map; m->sick_mask; m++) {
		if (checked & m->sick_mask)
			ageo->ag_checked |= m->ioctl_mask;
		if (sick & m->sick_mask)
			ageo->ag_sick |= m->ioctl_mask;
	}
}

static const struct ioctl_sick_map ino_map[] = {
	{ XFS_SICK_INO_CORE,	XFS_BS_SICK_INODE },
	{ XFS_SICK_INO_BMBTD,	XFS_BS_SICK_BMBTD },
	{ XFS_SICK_INO_BMBTA,	XFS_BS_SICK_BMBTA },
	{ XFS_SICK_INO_BMBTC,	XFS_BS_SICK_BMBTC },
	{ XFS_SICK_INO_DIR,	XFS_BS_SICK_DIR },
	{ XFS_SICK_INO_XATTR,	XFS_BS_SICK_XATTR },
	{ XFS_SICK_INO_SYMLINK,	XFS_BS_SICK_SYMLINK },
	{ XFS_SICK_INO_PARENT,	XFS_BS_SICK_PARENT },
	{ 0, 0 },
};

/* Fill out bulkstat health info. */
void
xfs_bulkstat_health(
	struct xfs_inode	*ip,
	struct xfs_bstat	*bs)
{
	const struct ioctl_sick_map	*m;
	unsigned int		sick;
	unsigned int		checked;

	bs->bs_sick = 0;
	bs->bs_checked = 0;

	xfs_inode_measure_sickness(ip, &sick, &checked);
	for (m = ino_map; m->sick_mask; m++) {
		if (checked & m->sick_mask)
			bs->bs_checked |= m->ioctl_mask;
		if (sick & m->sick_mask)
			bs->bs_sick |= m->ioctl_mask;
	}
}

View File

@@ -70,6 +70,11 @@ xfs_inode_alloc(
 	ip->i_flags = 0;
 	ip->i_delayed_blks = 0;
 	memset(&ip->i_d, 0, sizeof(ip->i_d));
+	ip->i_sick = 0;
+	ip->i_checked = 0;
+	INIT_WORK(&ip->i_ioend_work, xfs_end_io);
+	INIT_LIST_HEAD(&ip->i_ioend_list);
+	spin_lock_init(&ip->i_ioend_lock);

 	return ip;
 }
@@ -446,6 +451,8 @@ xfs_iget_cache_hit(
 		ip->i_flags |= XFS_INEW;
 		xfs_inode_clear_reclaim_tag(pag, ip->i_ino);
 		inode->i_state = I_NEW;
+		ip->i_sick = 0;
+		ip->i_checked = 0;

 		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 		init_rwsem(&inode->i_rwsem);
@@ -1815,7 +1822,7 @@ xfs_inode_clear_cowblocks_tag(

 /* Disable post-EOF and CoW block auto-reclamation. */
 void
-xfs_icache_disable_reclaim(
+xfs_stop_block_reaping(
 	struct xfs_mount	*mp)
 {
 	cancel_delayed_work_sync(&mp->m_eofblocks_work);
@@ -1824,7 +1831,7 @@ xfs_icache_disable_reclaim(

 /* Enable post-EOF and CoW block auto-reclamation. */
 void
-xfs_icache_enable_reclaim(
+xfs_start_block_reaping(
 	struct xfs_mount	*mp)
 {
 	xfs_queue_eofblocks(mp);

View File

@@ -119,7 +119,7 @@ xfs_fs_eofblocks_from_user(
 int xfs_icache_inode_is_allocated(struct xfs_mount *mp, struct xfs_trans *tp,
 				  xfs_ino_t ino, bool *inuse);
-void xfs_icache_disable_reclaim(struct xfs_mount *mp);
-void xfs_icache_enable_reclaim(struct xfs_mount *mp);
+void xfs_stop_block_reaping(struct xfs_mount *mp);
+void xfs_start_block_reaping(struct xfs_mount *mp);

 #endif

View File

@@ -1116,7 +1116,7 @@ xfs_droplink(
 /*
  * Increment the link count on an inode & log the change.
  */
-static int
+static void
 xfs_bumplink(
 	xfs_trans_t *tp,
 	xfs_inode_t *ip)
@@ -1126,7 +1126,6 @@ xfs_bumplink(
 	ASSERT(ip->i_d.di_version > 1);
 	inc_nlink(VFS_I(ip));
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	return 0;
 }

 int
@@ -1235,9 +1234,7 @@ xfs_create(
 		if (error)
 			goto out_trans_cancel;

-		error = xfs_bumplink(tp, dp);
-		if (error)
-			goto out_trans_cancel;
+		xfs_bumplink(tp, dp);
 	}

 	/*
@@ -1454,9 +1451,7 @@ xfs_link(
 	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);

-	error = xfs_bumplink(tp, sip);
-	if (error)
-		goto error_return;
+	xfs_bumplink(tp, sip);

 	/*
 	 * If this is a synchronous mount, make sure that the
@@ -3097,9 +3092,7 @@ xfs_cross_rename(
 		error = xfs_droplink(tp, dp2);
 		if (error)
 			goto out_trans_abort;
-		error = xfs_bumplink(tp, dp1);
-		if (error)
-			goto out_trans_abort;
+		xfs_bumplink(tp, dp1);
 	}

 	/*
@@ -3123,9 +3116,7 @@ xfs_cross_rename(
 		error = xfs_droplink(tp, dp1);
 		if (error)
 			goto out_trans_abort;
-		error = xfs_bumplink(tp, dp2);
-		if (error)
-			goto out_trans_abort;
+		xfs_bumplink(tp, dp2);
 	}

 	/*
@@ -3322,9 +3313,7 @@ xfs_rename(
 				XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);

 		if (new_parent && src_is_directory) {
-			error = xfs_bumplink(tp, target_dp);
-			if (error)
-				goto out_trans_cancel;
+			xfs_bumplink(tp, target_dp);
 		}
 	} else {	/* target_ip != NULL */
 		/*
@@ -3443,9 +3432,7 @@ xfs_rename(
 	 */
 	if (wip) {
 		ASSERT(VFS_I(wip)->i_nlink == 0);
-		error = xfs_bumplink(tp, wip);
-		if (error)
-			goto out_trans_cancel;
+		xfs_bumplink(tp, wip);
 		error = xfs_iunlink_remove(tp, wip);
 		if (error)
 			goto out_trans_cancel;
@@ -3614,7 +3601,6 @@ xfs_iflush_cluster(
 	 * inode buffer and shut down the filesystem.
 	 */
 	rcu_read_unlock();
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);

 	/*
 	 * We'll always have an inode attached to the buffer for completion
@@ -3624,11 +3610,14 @@ xfs_iflush_cluster(
 	 * xfs_buf_submit().
 	 */
 	ASSERT(bp->b_iodone);
+	bp->b_flags |= XBF_ASYNC;
 	bp->b_flags &= ~XBF_DONE;
 	xfs_buf_stale(bp);
 	xfs_buf_ioerror(bp, -EIO);
 	xfs_buf_ioend(bp);

+	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+
 	/* abort the corrupt inode, as it was not attached to the buffer */
 	xfs_iflush_abort(cip, false);
 	kmem_free(cilist);

View File

@@ -45,10 +45,18 @@ typedef struct xfs_inode {
 	mrlock_t		i_lock;		/* inode lock */
 	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
 	atomic_t		i_pincount;	/* inode pin count */
+
+	/*
+	 * Bitsets of inode metadata that have been checked and/or are sick.
+	 * Callers must hold i_flags_lock before accessing this field.
+	 */
+	uint16_t		i_checked;
+	uint16_t		i_sick;
+
 	spinlock_t		i_flags_lock;	/* inode i_flags lock */
 	/* Miscellaneous state. */
 	unsigned long		i_flags;	/* see defined flags below */
-	unsigned int		i_delayed_blks;	/* count of delay alloc blks */
+	uint64_t		i_delayed_blks;	/* count of delay alloc blks */

 	struct xfs_icdinode	i_d;		/* most of ondisk inode */
@@ -57,6 +65,11 @@ typedef struct xfs_inode {
 	/* VFS inode */
 	struct inode		i_vnode;	/* embedded VFS inode */
+
+	/* pending io completions */
+	spinlock_t		i_ioend_lock;
+	struct work_struct	i_ioend_work;
+	struct list_head	i_ioend_list;
 } xfs_inode_t;

 /* Convert from vfs inode to xfs inode */
@@ -503,4 +516,6 @@ bool xfs_inode_verify_forks(struct xfs_inode *ip);
 int xfs_iunlink_init(struct xfs_perag *pag);
 void xfs_iunlink_destroy(struct xfs_perag *pag);

+void xfs_end_io(struct work_struct *work);
+
 #endif /* __XFS_INODE_H__ */

View File

@@ -33,6 +33,8 @@
 #include "xfs_fsmap.h"
 #include "scrub/xfs_scrub.h"
 #include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_health.h"

 #include <linux/capability.h>
 #include <linux/cred.h>
@@ -779,40 +781,46 @@ xfs_ioc_bulkstat(
 }

 STATIC int
-xfs_ioc_fsgeometry_v1(
-	xfs_mount_t		*mp,
-	void			__user *arg)
+xfs_ioc_fsgeometry(
+	struct xfs_mount	*mp,
+	void			__user *arg,
+	int			struct_version)
 {
-	xfs_fsop_geom_t		fsgeo;
-	int			error;
+	struct xfs_fsop_geom	fsgeo;
+	size_t			len;

-	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 3);
-	if (error)
-		return error;
+	xfs_fs_geometry(&mp->m_sb, &fsgeo, struct_version);

-	/*
-	 * Caller should have passed an argument of type
-	 * xfs_fsop_geom_v1_t.  This is a proper subset of the
-	 * xfs_fsop_geom_t that xfs_fs_geometry() fills in.
-	 */
-	if (copy_to_user(arg, &fsgeo, sizeof(xfs_fsop_geom_v1_t)))
+	if (struct_version <= 3)
+		len = sizeof(struct xfs_fsop_geom_v1);
+	else if (struct_version == 4)
+		len = sizeof(struct xfs_fsop_geom_v4);
+	else {
+		xfs_fsop_geom_health(mp, &fsgeo);
+		len = sizeof(fsgeo);
+	}
+
+	if (copy_to_user(arg, &fsgeo, len))
 		return -EFAULT;
 	return 0;
 }

 STATIC int
-xfs_ioc_fsgeometry(
-	xfs_mount_t		*mp,
+xfs_ioc_ag_geometry(
+	struct xfs_mount	*mp,
 	void			__user *arg)
 {
-	xfs_fsop_geom_t		fsgeo;
+	struct xfs_ag_geometry	ageo;
 	int			error;

-	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 4);
+	if (copy_from_user(&ageo, arg, sizeof(ageo)))
+		return -EFAULT;
+
+	error = xfs_ag_get_geometry(mp, ageo.ag_number, &ageo);
 	if (error)
 		return error;

-	if (copy_to_user(arg, &fsgeo, sizeof(fsgeo)))
+	if (copy_to_user(arg, &ageo, sizeof(ageo)))
 		return -EFAULT;
 	return 0;
 }
@@ -1142,7 +1150,7 @@ xfs_ioctl_setattr_get_trans(
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
 	if (error)
-		return ERR_PTR(error);
+		goto out_unlock;

 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
@@ -1937,10 +1945,14 @@ xfs_file_ioctl(
 		return xfs_ioc_bulkstat(mp, cmd, arg);

 	case XFS_IOC_FSGEOMETRY_V1:
-		return xfs_ioc_fsgeometry_v1(mp, arg);
+		return xfs_ioc_fsgeometry(mp, arg, 3);
+	case XFS_IOC_FSGEOMETRY_V4:
+		return xfs_ioc_fsgeometry(mp, arg, 4);
 	case XFS_IOC_FSGEOMETRY:
-		return xfs_ioc_fsgeometry(mp, arg);
+		return xfs_ioc_fsgeometry(mp, arg, 5);
+
+	case XFS_IOC_AG_GEOMETRY:
+		return xfs_ioc_ag_geometry(mp, arg);

 	case XFS_IOC_GETVERSION:
 		return put_user(inode->i_generation, (int __user *)arg);
@@ -2031,9 +2043,7 @@ xfs_file_ioctl(
 	case XFS_IOC_FSCOUNTS: {
 		xfs_fsop_counts_t out;

-		error = xfs_fs_counts(mp, &out);
-		if (error)
-			return error;
+		xfs_fs_counts(mp, &out);

 		if (copy_to_user(arg, &out, sizeof(out)))
 			return -EFAULT;

View File

@@ -52,12 +52,9 @@ xfs_compat_ioc_fsgeometry_v1(
 	struct xfs_mount	  *mp,
 	compat_xfs_fsop_geom_v1_t __user *arg32)
 {
-	xfs_fsop_geom_t		  fsgeo;
-	int			  error;
+	struct xfs_fsop_geom	  fsgeo;

-	error = xfs_fs_geometry(&mp->m_sb, &fsgeo, 3);
-	if (error)
-		return error;
+	xfs_fs_geometry(&mp->m_sb, &fsgeo, 3);
 	/* The 32-bit variant simply has some padding at the end */
 	if (copy_to_user(arg32, &fsgeo, sizeof(struct compat_xfs_fsop_geom_v1)))
 		return -EFAULT;
@@ -561,7 +558,9 @@ xfs_file_compat_ioctl(
 	switch (cmd) {
 	/* No size or alignment issues on any arch */
 	case XFS_IOC_DIOINFO:
+	case XFS_IOC_FSGEOMETRY_V4:
 	case XFS_IOC_FSGEOMETRY:
+	case XFS_IOC_AG_GEOMETRY:
 	case XFS_IOC_FSGETXATTR:
 	case XFS_IOC_FSSETXATTR:
 	case XFS_IOC_FSGETXATTRA:

View File

@@ -18,6 +18,7 @@
 #include "xfs_error.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_health.h"

 /*
  * Return stat information for one inode.
@@ -84,6 +85,7 @@ xfs_bulkstat_one_int(
 	buf->bs_extsize = dic->di_extsize << mp->m_sb.sb_blocklog;
 	buf->bs_extents = dic->di_nextents;
 	memset(buf->bs_pad, 0, sizeof(buf->bs_pad));
+	xfs_bulkstat_health(ip, buf);
 	buf->bs_dmevmask = dic->di_dmevmask;
 	buf->bs_dmstate = dic->di_dmstate;
 	buf->bs_aextents = dic->di_anextents;

View File

@@ -23,6 +23,7 @@
 #include "xfs_cksum.h"
 #include "xfs_sysfs.h"
 #include "xfs_sb.h"
+#include "xfs_health.h"

 kmem_zone_t	*xfs_log_ticket_zone;
@@ -861,7 +862,7 @@ xfs_log_write_unmount_record(
 	 * recalculated during log recovery at next mount.  Refer to
 	 * xlog_check_unmount_rec for more details.
 	 */
-	if (XFS_TEST_ERROR((mp->m_flags & XFS_MOUNT_BAD_SUMMARY), mp,
+	if (XFS_TEST_ERROR(xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS), mp,
 			XFS_ERRTAG_FORCE_SUMMARY_RECALC)) {
 		xfs_alert(mp, "%s: will fix summary counters at next mount",
 				__func__);

View File

@@ -582,6 +582,19 @@ xlog_cil_committed(
 	struct xfs_cil_ctx	*ctx = args;
 	struct xfs_mount	*mp = ctx->cil->xc_log->l_mp;

+	/*
+	 * If the I/O failed, we're aborting the commit and already shutdown.
+	 * Wake any commit waiters before aborting the log items so we don't
+	 * block async log pushers on callbacks. Async log pushers explicitly do
+	 * not wait on log force completion because they may be holding locks
+	 * required to unpin items.
+	 */
+	if (abort) {
+		spin_lock(&ctx->cil->xc_push_lock);
+		wake_up_all(&ctx->cil->xc_commit_wait);
+		spin_unlock(&ctx->cil->xc_push_lock);
+	}
+
 	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
 					ctx->start_lsn, abort);
@@ -589,15 +602,7 @@ xlog_cil_committed(
 	xfs_extent_busy_clear(mp, &ctx->busy_extents,
 			     (mp->m_flags & XFS_MOUNT_DISCARD) && !abort);

-	/*
-	 * If we are aborting the commit, wake up anyone waiting on the
-	 * committing list.  If we don't, then a shutdown we can leave processes
-	 * waiting in xlog_cil_force_lsn() waiting on a sequence commit that
-	 * will never happen because we aborted it.
-	 */
 	spin_lock(&ctx->cil->xc_push_lock);
-	if (abort)
-		wake_up_all(&ctx->cil->xc_commit_wait);
 	list_del(&ctx->committing);
 	spin_unlock(&ctx->cil->xc_push_lock);

View File

@@ -5167,7 +5167,7 @@ xlog_recover_process_iunlinks(
 	}
 }

-STATIC int
+STATIC void
 xlog_unpack_data(
 	struct xlog_rec_header	*rhead,
 	char			*dp,
@@ -5190,8 +5190,6 @@ xlog_unpack_data(
 		dp += BBSIZE;
 	}
-
-	return 0;
 }

 /*
@@ -5206,11 +5204,9 @@ xlog_recover_process(
 	int			pass,
 	struct list_head	*buffer_list)
 {
-	int			error;
 	__le32			old_crc = rhead->h_crc;
 	__le32			crc;

 	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));

 	/*
@@ -5249,9 +5245,7 @@ xlog_recover_process(
 		return -EFSCORRUPTED;
 	}

-	error = xlog_unpack_data(rhead, dp, log);
-	if (error)
-		return error;
+	xlog_unpack_data(rhead, dp, log);

 	return xlog_recover_process_data(log, rhash, rhead, dp, pass,
 					 buffer_list);

View File

@@ -34,6 +34,7 @@
 #include "xfs_refcount_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_extent_busy.h"
+#include "xfs_health.h"

 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -231,6 +232,7 @@ xfs_initialize_perag(
 		error = xfs_iunlink_init(pag);
 		if (error)
 			goto out_hash_destroy;
+		spin_lock_init(&pag->pag_state_lock);
 	}

 	index = xfs_set_inode_alloc(mp, agcount);
@@ -644,7 +646,7 @@ xfs_check_summary_counts(
 	    (mp->m_sb.sb_fdblocks > mp->m_sb.sb_dblocks ||
 	     !xfs_verify_icount(mp, mp->m_sb.sb_icount) ||
 	     mp->m_sb.sb_ifree > mp->m_sb.sb_icount))
-		mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
+		xfs_fs_mark_sick(mp, XFS_SICK_FS_COUNTERS);

 	/*
 	 * We can safely re-initialise incore superblock counters from the
@@ -659,7 +661,7 @@ xfs_check_summary_counts(
 	 */
 	if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
 	     XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
-	    !(mp->m_flags & XFS_MOUNT_BAD_SUMMARY))
+	    !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
 		return 0;

 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
@@ -1068,6 +1070,7 @@ xfs_mountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
  out_log_dealloc:
 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
@@ -1104,7 +1107,7 @@ xfs_unmountfs(
 	uint64_t		resblks;
 	int			error;

-	xfs_icache_disable_reclaim(mp);
+	xfs_stop_block_reaping(mp);
 	xfs_fs_unreserve_ag_blocks(mp);
 	xfs_qm_unmount_quotas(mp);
 	xfs_rtunmount_inodes(mp);
@@ -1150,6 +1153,7 @@ xfs_unmountfs(
 	 */
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);

 	xfs_qm_unmount(mp);
@@ -1445,7 +1449,26 @@ xfs_force_summary_recalc(
 	if (!xfs_sb_version_haslazysbcount(&mp->m_sb))
 		return;

-	spin_lock(&mp->m_sb_lock);
-	mp->m_flags |= XFS_MOUNT_BAD_SUMMARY;
-	spin_unlock(&mp->m_sb_lock);
+	xfs_fs_mark_sick(mp, XFS_SICK_FS_COUNTERS);
+}
+
+/*
+ * Update the in-core delayed block counter.
+ *
+ * We prefer to update the counter without having to take a spinlock for every
+ * counter update (i.e. batching).  Each change to delayed allocation
+ * reservations can easily exceed the default percpu counter batching, so we
+ * use a larger batch factor here.
+ *
+ * Note that we don't currently have any callers requiring fast summation
+ * (e.g. percpu_counter_read) so we can use a big batch value here.
+ */
+#define XFS_DELALLOC_BATCH	(4096)
+void
+xfs_mod_delalloc(
+	struct xfs_mount	*mp,
+	int64_t			delta)
+{
+	percpu_counter_add_batch(&mp->m_delalloc_blks, delta,
+			XFS_DELALLOC_BATCH);
 }

View File

@@ -60,6 +60,20 @@ struct xfs_error_cfg {
 typedef struct xfs_mount {
 	struct super_block	*m_super;
 	xfs_tid_t		m_tid;		/* next unused tid for fs */
+
+	/*
+	 * Bitsets of per-fs metadata that have been checked and/or are sick.
+	 * Callers must hold m_sb_lock to access these two fields.
+	 */
+	uint8_t			m_fs_checked;
+	uint8_t			m_fs_sick;
+	/*
+	 * Bitsets of rt metadata that have been checked and/or are sick.
+	 * Callers must hold m_sb_lock to access this field.
+	 */
+	uint8_t			m_rt_checked;
+	uint8_t			m_rt_sick;
+
 	struct xfs_ail		*m_ail;		/* fs active log item list */
 	struct xfs_sb		m_sb;		/* copy of fs superblock */
@@ -67,6 +81,12 @@ typedef struct xfs_mount {
 	struct percpu_counter	m_icount;	/* allocated inodes counter */
 	struct percpu_counter	m_ifree;	/* free inodes counter */
 	struct percpu_counter	m_fdblocks;	/* free block counter */
+	/*
+	 * Count of data device blocks reserved for delayed allocations,
+	 * including indlen blocks.  Does not include allocated CoW staging
+	 * extents or anything related to the rt device.
+	 */
+	struct percpu_counter	m_delalloc_blks;

 	struct xfs_buf		*m_sb_bp;	/* buffer for superblock */
 	char			*m_fsname;	/* filesystem name */
@@ -175,7 +195,6 @@ typedef struct xfs_mount {
 	struct xstats		m_stats;	/* per-fs stats */

 	struct workqueue_struct *m_buf_workqueue;
-	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
 	struct workqueue_struct	*m_cil_workqueue;
 	struct workqueue_struct	*m_reclaim_workqueue;
@@ -214,7 +233,6 @@ typedef struct xfs_mount {
 						   must be synchronous except
 						   for space allocations */
 #define XFS_MOUNT_UNMOUNTING	(1ULL << 1)	/* filesystem is unmounting */
-#define XFS_MOUNT_BAD_SUMMARY	(1ULL << 2)	/* summary counters are bad */
 #define XFS_MOUNT_WAS_CLEAN	(1ULL << 3)
 #define XFS_MOUNT_FS_SHUTDOWN	(1ULL << 4)	/* atomic stop of all filesystem
 						   operations, typically for
@@ -369,6 +387,15 @@ typedef struct xfs_perag {
 	xfs_agino_t	pagl_pagino;
 	xfs_agino_t	pagl_leftrec;
 	xfs_agino_t	pagl_rightrec;
+
+	/*
+	 * Bitsets of per-ag metadata that have been checked and/or are sick.
+	 * Callers should hold pag_state_lock before accessing this field.
+	 */
+	uint16_t	pag_checked;
+	uint16_t	pag_sick;
+	spinlock_t	pag_state_lock;
+
 	spinlock_t	pagb_lock;	/* lock for pagb_tree */
 	struct rb_root	pagb_tree;	/* ordered tree of busy extents */
 	unsigned int	pagb_gen;	/* generation count for pagb_tree */
@@ -454,5 +481,6 @@ int xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 void xfs_force_summary_recalc(struct xfs_mount *mp);
+void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta);

 #endif	/* __XFS_MOUNT_H__ */

View File

@@ -1812,7 +1812,8 @@ xfs_qm_vop_chown_reserve(
 	uint			flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	uint			delblks, blkflags, prjflags = 0;
+	uint64_t		delblks;
+	unsigned int		blkflags, prjflags = 0;
 	struct xfs_dquot	*udq_unres = NULL;
 	struct xfs_dquot	*gdq_unres = NULL;
 	struct xfs_dquot	*pdq_unres = NULL;

View File

@@ -113,12 +113,8 @@ xfs_quota_inode(xfs_mount_t *mp, uint dq_flags)
 	return NULL;
 }

-extern void xfs_trans_mod_dquot(struct xfs_trans *,
-		struct xfs_dquot *, uint, long);
-extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
-		struct xfs_mount *, struct xfs_dquot *,
-		struct xfs_dquot *, struct xfs_dquot *,
-		long, long, uint);
+extern void xfs_trans_mod_dquot(struct xfs_trans *tp, struct xfs_dquot *dqp,
+		uint field, int64_t delta);
 extern void xfs_trans_dqjoin(struct xfs_trans *, struct xfs_dquot *);
 extern void xfs_trans_log_dquot(struct xfs_trans *, struct xfs_dquot *);

View File

@@ -56,32 +56,35 @@ xfs_quota_chkd_flag(
  * The structure kept inside the xfs_trans_t keep track of dquot changes
  * within a transaction and apply them later.
  */
-typedef struct xfs_dqtrx {
+struct xfs_dqtrx {
 	struct xfs_dquot *qt_dquot;	  /* the dquot this refers to */
-	ulong		qt_blk_res;	  /* blks reserved on a dquot */
-	ulong		qt_ino_res;	  /* inode reserved on a dquot */
-	ulong		qt_ino_res_used;  /* inodes used from the reservation */
-	long		qt_bcount_delta;  /* dquot blk count changes */
-	long		qt_delbcnt_delta; /* delayed dquot blk count changes */
-	long		qt_icount_delta;  /* dquot inode count changes */
-	ulong		qt_rtblk_res;	  /* # blks reserved on a dquot */
-	ulong		qt_rtblk_res_used;/* # blks used from reservation */
-	long		qt_rtbcount_delta;/* dquot realtime blk changes */
-	long		qt_delrtb_delta;  /* delayed RT blk count changes */
-} xfs_dqtrx_t;
+
+	uint64_t	qt_blk_res;	  /* blks reserved on a dquot */
+	int64_t		qt_bcount_delta;  /* dquot blk count changes */
+	int64_t		qt_delbcnt_delta; /* delayed dquot blk count changes */
+
+	uint64_t	qt_rtblk_res;	  /* # blks reserved on a dquot */
+	uint64_t	qt_rtblk_res_used;/* # blks used from reservation */
+	int64_t		qt_rtbcount_delta;/* dquot realtime blk changes */
+	int64_t		qt_delrtb_delta;  /* delayed RT blk count changes */
+
+	uint64_t	qt_ino_res;	  /* inode reserved on a dquot */
+	uint64_t	qt_ino_res_used;  /* inodes used from the reservation */
+	int64_t		qt_icount_delta;  /* dquot inode count changes */
+};
 
 #ifdef CONFIG_XFS_QUOTA
 extern void xfs_trans_dup_dqinfo(struct xfs_trans *, struct xfs_trans *);
 extern void xfs_trans_free_dqinfo(struct xfs_trans *);
 extern void xfs_trans_mod_dquot_byino(struct xfs_trans *, struct xfs_inode *,
-		uint, long);
+		uint, int64_t);
 extern void xfs_trans_apply_dquot_deltas(struct xfs_trans *);
 extern void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *);
 extern int xfs_trans_reserve_quota_nblks(struct xfs_trans *,
-		struct xfs_inode *, long, long, uint);
+		struct xfs_inode *, int64_t, long, uint);
 extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *,
 		struct xfs_mount *, struct xfs_dquot *,
-		struct xfs_dquot *, struct xfs_dquot *, long, long, uint);
+		struct xfs_dquot *, struct xfs_dquot *, int64_t, long, uint);
 extern int xfs_qm_vop_dqalloc(struct xfs_inode *, xfs_dqid_t, xfs_dqid_t,
 		prid_t, uint, struct xfs_dquot **, struct xfs_dquot **,
@@ -121,14 +124,14 @@ xfs_qm_vop_dqalloc(struct xfs_inode *ip, xfs_dqid_t uid, xfs_dqid_t gid,
 #define xfs_trans_apply_dquot_deltas(tp)
 #define xfs_trans_unreserve_and_mod_dquots(tp)
 static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp,
-		struct xfs_inode *ip, long nblks, long ninos, uint flags)
+		struct xfs_inode *ip, int64_t nblks, long ninos, uint flags)
 {
 	return 0;
 }
 static inline int xfs_trans_reserve_quota_bydquots(struct xfs_trans *tp,
 		struct xfs_mount *mp, struct xfs_dquot *udqp,
 		struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
-		long nblks, long nions, uint flags)
+		int64_t nblks, long nions, uint flags)
 {
 	return 0;
 }


@@ -66,7 +66,7 @@ static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
 enum {
 	Opt_logbufs, Opt_logbsize, Opt_logdev, Opt_rtdev, Opt_biosize,
 	Opt_wsync, Opt_noalign, Opt_swalloc, Opt_sunit, Opt_swidth, Opt_nouuid,
-	Opt_mtpt, Opt_grpid, Opt_nogrpid, Opt_bsdgroups, Opt_sysvgroups,
+	Opt_grpid, Opt_nogrpid, Opt_bsdgroups, Opt_sysvgroups,
 	Opt_allocsize, Opt_norecovery, Opt_inode64, Opt_inode32, Opt_ikeep,
 	Opt_noikeep, Opt_largeio, Opt_nolargeio, Opt_attr2, Opt_noattr2,
 	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
@@ -87,7 +87,6 @@ static const match_table_t tokens = {
 	{Opt_sunit,	"sunit=%u"},	/* data volume stripe unit */
 	{Opt_swidth,	"swidth=%u"},	/* data volume stripe width */
 	{Opt_nouuid,	"nouuid"},	/* ignore filesystem UUID */
-	{Opt_mtpt,	"mtpt"},	/* filesystem mount point */
 	{Opt_grpid,	"grpid"},	/* group-ID from parent directory */
 	{Opt_nogrpid,	"nogrpid"},	/* group-ID from current process */
 	{Opt_bsdgroups,	"bsdgroups"},	/* group-ID from parent directory */
@@ -236,9 +235,6 @@ xfs_parseargs(
 		if (!mp->m_logname)
 			return -ENOMEM;
 		break;
-	case Opt_mtpt:
-		xfs_warn(mp, "%s option not allowed on this system", p);
-		return -EINVAL;
 	case Opt_rtdev:
 		kfree(mp->m_rtname);
 		mp->m_rtname = match_strdup(args);
@@ -448,7 +444,7 @@ struct proc_xfs_info {
 	char		*str;
 };
 
-STATIC int
+STATIC void
 xfs_showargs(
 	struct xfs_mount	*mp,
 	struct seq_file		*m)
@@ -527,9 +523,8 @@ xfs_showargs(
 	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
 		seq_puts(m, ",noquota");
-
-	return 0;
 }
 
 static uint64_t
 xfs_max_file_offset(
 	unsigned int		blockshift)
@@ -838,15 +833,10 @@ xfs_init_mount_workqueues(
 	if (!mp->m_buf_workqueue)
 		goto out;
 
-	mp->m_data_workqueue = alloc_workqueue("xfs-data/%s",
-			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
-	if (!mp->m_data_workqueue)
-		goto out_destroy_buf;
-
 	mp->m_unwritten_workqueue = alloc_workqueue("xfs-conv/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
 	if (!mp->m_unwritten_workqueue)
-		goto out_destroy_data_iodone_queue;
+		goto out_destroy_buf;
 
 	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
 			WQ_MEM_RECLAIM|WQ_FREEZABLE, 0, mp->m_fsname);
@@ -886,8 +876,6 @@ xfs_init_mount_workqueues(
 	destroy_workqueue(mp->m_cil_workqueue);
 out_destroy_unwritten:
 	destroy_workqueue(mp->m_unwritten_workqueue);
-out_destroy_data_iodone_queue:
-	destroy_workqueue(mp->m_data_workqueue);
 out_destroy_buf:
 	destroy_workqueue(mp->m_buf_workqueue);
 out:
@@ -903,7 +891,6 @@ xfs_destroy_mount_workqueues(
 	destroy_workqueue(mp->m_log_workqueue);
 	destroy_workqueue(mp->m_reclaim_workqueue);
 	destroy_workqueue(mp->m_cil_workqueue);
-	destroy_workqueue(mp->m_data_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
 	destroy_workqueue(mp->m_buf_workqueue);
 }
@@ -1376,7 +1363,7 @@ xfs_fs_remount(
 			xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 			return error;
 		}
-		xfs_icache_enable_reclaim(mp);
+		xfs_start_block_reaping(mp);
 
 		/* Create the per-AG metadata reservation pool .*/
 		error = xfs_fs_reserve_ag_blocks(mp);
@@ -1390,7 +1377,7 @@ xfs_fs_remount(
 		 * Cancel background eofb scanning so it cannot race with the
 		 * final log force+buftarg wait and deadlock the remount.
 		 */
-		xfs_icache_disable_reclaim(mp);
+		xfs_stop_block_reaping(mp);
 
 		/* Get rid of any leftover CoW reservations... */
 		error = xfs_icache_free_cowblocks(mp, NULL);
@@ -1434,7 +1421,7 @@ xfs_fs_freeze(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
-	xfs_icache_disable_reclaim(mp);
+	xfs_stop_block_reaping(mp);
 	xfs_save_resvblks(mp);
 	xfs_quiesce_attr(mp);
 	return xfs_sync_sb(mp, true);
@@ -1448,7 +1435,7 @@ xfs_fs_unfreeze(
 	xfs_restore_resvblks(mp);
 	xfs_log_work_queue(mp);
-	xfs_icache_enable_reclaim(mp);
+	xfs_start_block_reaping(mp);
 	return 0;
 }
@@ -1457,7 +1444,8 @@ xfs_fs_show_options(
 	struct seq_file		*m,
 	struct dentry		*root)
 {
-	return xfs_showargs(XFS_M(root->d_sb), m);
+	xfs_showargs(XFS_M(root->d_sb), m);
+	return 0;
 }
 
 /*
@@ -1546,8 +1534,14 @@ xfs_init_percpu_counters(
 	if (error)
 		goto free_ifree;
 
+	error = percpu_counter_init(&mp->m_delalloc_blks, 0, GFP_KERNEL);
+	if (error)
+		goto free_fdblocks;
+
 	return 0;
 
+free_fdblocks:
+	percpu_counter_destroy(&mp->m_fdblocks);
 free_ifree:
 	percpu_counter_destroy(&mp->m_ifree);
 free_icount:
@@ -1571,6 +1565,9 @@ xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_icount);
 	percpu_counter_destroy(&mp->m_ifree);
 	percpu_counter_destroy(&mp->m_fdblocks);
+	ASSERT(XFS_FORCED_SHUTDOWN(mp) ||
+	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
+	percpu_counter_destroy(&mp->m_delalloc_blks);
 }
 
 static struct xfs_mount *


@@ -3440,6 +3440,82 @@ DEFINE_AGINODE_EVENT(xfs_iunlink);
 DEFINE_AGINODE_EVENT(xfs_iunlink_remove);
 DEFINE_AG_EVENT(xfs_iunlink_map_prev_fallback);
 
+DECLARE_EVENT_CLASS(xfs_fs_corrupt_class,
+	TP_PROTO(struct xfs_mount *mp, unsigned int flags),
+	TP_ARGS(mp, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->flags)
+);
+#define DEFINE_FS_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, unsigned int flags), \
+	TP_ARGS(mp, flags))
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
+DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
+
+DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
+	TP_ARGS(mp, agno, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d agno %u flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno, __entry->flags)
+);
+#define DEFINE_AG_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 unsigned int flags), \
+	TP_ARGS(mp, agno, flags))
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
+DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
+
+DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
+	TP_ARGS(ip, flags),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, flags)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->flags = flags;
+	),
+	TP_printk("dev %d:%d ino 0x%llx flags 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino, __entry->flags)
+);
+#define DEFINE_INODE_CORRUPT_EVENT(name)	\
+DEFINE_EVENT(xfs_inode_corrupt_class, name,	\
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags), \
+	TP_ARGS(ip, flags))
+DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_sick);
+DEFINE_INODE_CORRUPT_EVENT(xfs_inode_mark_healthy);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH


@@ -74,13 +74,13 @@ xfs_trans_log_dquot(
  */
 void
 xfs_trans_dup_dqinfo(
-	xfs_trans_t	*otp,
-	xfs_trans_t	*ntp)
+	struct xfs_trans	*otp,
+	struct xfs_trans	*ntp)
 {
-	xfs_dqtrx_t	*oq, *nq;
-	int		i, j;
-	xfs_dqtrx_t	*oqa, *nqa;
-	ulong		blk_res_used;
+	struct xfs_dqtrx	*oq, *nq;
+	int			i, j;
+	struct xfs_dqtrx	*oqa, *nqa;
+	uint64_t		blk_res_used;
 
 	if (!otp->t_dqinfo)
 		return;
@@ -137,7 +137,7 @@ xfs_trans_mod_dquot_byino(
 	xfs_trans_t	*tp,
 	xfs_inode_t	*ip,
 	uint		field,
-	long		delta)
+	int64_t		delta)
 {
 	xfs_mount_t	*mp = tp->t_mountp;
@@ -191,12 +191,12 @@ xfs_trans_get_dqtrx(
  */
 void
 xfs_trans_mod_dquot(
-	xfs_trans_t	*tp,
-	xfs_dquot_t	*dqp,
-	uint		field,
-	long		delta)
+	struct xfs_trans	*tp,
+	struct xfs_dquot	*dqp,
+	uint			field,
+	int64_t			delta)
 {
-	xfs_dqtrx_t	*qtrx;
+	struct xfs_dqtrx	*qtrx;
 
 	ASSERT(tp);
 	ASSERT(XFS_IS_QUOTA_RUNNING(tp->t_mountp));
@@ -219,14 +219,14 @@ xfs_trans_mod_dquot(
 	 * regular disk blk reservation
 	 */
 	case XFS_TRANS_DQ_RES_BLKS:
-		qtrx->qt_blk_res += (ulong)delta;
+		qtrx->qt_blk_res += delta;
 		break;
 
 	/*
 	 * inode reservation
 	 */
 	case XFS_TRANS_DQ_RES_INOS:
-		qtrx->qt_ino_res += (ulong)delta;
+		qtrx->qt_ino_res += delta;
 		break;
 
 	/*
@@ -245,7 +245,7 @@ xfs_trans_mod_dquot(
 	 */
 	case XFS_TRANS_DQ_ICOUNT:
 		if (qtrx->qt_ino_res && delta > 0) {
-			qtrx->qt_ino_res_used += (ulong)delta;
+			qtrx->qt_ino_res_used += delta;
 			ASSERT(qtrx->qt_ino_res >= qtrx->qt_ino_res_used);
 		}
 		qtrx->qt_icount_delta += delta;
@@ -255,7 +255,7 @@ xfs_trans_mod_dquot(
 	 * rtblk reservation
 	 */
 	case XFS_TRANS_DQ_RES_RTBLKS:
-		qtrx->qt_rtblk_res += (ulong)delta;
+		qtrx->qt_rtblk_res += delta;
 		break;
 
 	/*
@@ -263,7 +263,7 @@ xfs_trans_mod_dquot(
 	 */
 	case XFS_TRANS_DQ_RTBCOUNT:
 		if (qtrx->qt_rtblk_res && delta > 0) {
-			qtrx->qt_rtblk_res_used += (ulong)delta;
+			qtrx->qt_rtblk_res_used += delta;
 			ASSERT(qtrx->qt_rtblk_res >= qtrx->qt_rtblk_res_used);
 		}
 		qtrx->qt_rtbcount_delta += delta;
@@ -288,8 +288,8 @@ xfs_trans_mod_dquot(
  */
 STATIC void
 xfs_trans_dqlockedjoin(
-	xfs_trans_t	*tp,
-	xfs_dqtrx_t	*q)
+	struct xfs_trans	*tp,
+	struct xfs_dqtrx	*q)
 {
 	ASSERT(q[0].qt_dquot != NULL);
 	if (q[1].qt_dquot == NULL) {
@@ -320,8 +320,8 @@ xfs_trans_apply_dquot_deltas(
 	struct xfs_dquot	*dqp;
 	struct xfs_dqtrx	*qtrx, *qa;
 	struct xfs_disk_dquot	*d;
-	long			totalbdelta;
-	long			totalrtbdelta;
+	int64_t			totalbdelta;
+	int64_t			totalrtbdelta;
 
 	if (!(tp->t_flags & XFS_TRANS_DQ_DIRTY))
 		return;
@@ -413,7 +413,7 @@ xfs_trans_apply_dquot_deltas(
 			 * reservation that a transaction structure knows of.
 			 */
 			if (qtrx->qt_blk_res != 0) {
-				ulong		blk_res_used = 0;
+				uint64_t	blk_res_used = 0;
 
 				if (qtrx->qt_bcount_delta > 0)
 					blk_res_used = qtrx->qt_bcount_delta;
@@ -501,7 +501,7 @@ xfs_trans_unreserve_and_mod_dquots(
 {
 	int			i, j;
 	xfs_dquot_t		*dqp;
-	xfs_dqtrx_t		*qtrx, *qa;
+	struct xfs_dqtrx	*qtrx, *qa;
 	bool			locked;
 
 	if (!tp->t_dqinfo || !(tp->t_flags & XFS_TRANS_DQ_DIRTY))
@@ -585,7 +585,7 @@ xfs_trans_dqresv(
 	xfs_trans_t	*tp,
 	xfs_mount_t	*mp,
 	xfs_dquot_t	*dqp,
-	long		nblks,
+	int64_t		nblks,
 	long		ninos,
 	uint		flags)
 {
@@ -745,7 +745,7 @@ xfs_trans_reserve_quota_bydquots(
 	struct xfs_dquot	*udqp,
 	struct xfs_dquot	*gdqp,
 	struct xfs_dquot	*pdqp,
-	long			nblks,
+	int64_t			nblks,
 	long			ninos,
 	uint			flags)
 {
@@ -804,7 +804,7 @@ int
 xfs_trans_reserve_quota_nblks(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
-	long			nblks,
+	int64_t			nblks,
 	long			ninos,
 	uint			flags)
 {