Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext3 removal, quota & udf fixes from Jan Kara:
 "The biggest change in the pull is the removal of ext3 filesystem
  driver (~28k lines removed).  Ext4 driver is a full featured
  replacement these days and both RH and SUSE use it for several years
  without issues.  Also there are some workarounds in VM & block layer
  mainly for ext3 which we could eventually get rid of.

  Other larger change is addition of proper error handling for
  dquot_initialize().  The rest is small fixes and cleanups"

[ I wasn't convinced about the ext3 removal and worried about things
  falling through the cracks for legacy users, but ext4 maintainers
  piped up and were all unanimously in favor of removal, and maintaining
  all legacy ext3 support inside ext4.   - Linus ]

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Don't modify filesystem for read-only mounts
  quota: remove an unneeded condition
  ext4: memory leak on error in ext4_symlink()
  mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
  ext4: Improve ext4 Kconfig test
  block: Remove forced page bouncing under IO
  fs: Remove ext3 filesystem driver
  doc: Update doc about journalling layer
  jfs: Handle error from dquot_initialize()
  reiserfs: Handle error from dquot_initialize()
  ocfs2: Handle error from dquot_initialize()
  ext4: Handle error from dquot_initialize()
  ext2: Handle error from dquot_initalize()
  quota: Propagate error from ->acquire_dquot()
This commit is contained in:
Linus Torvalds 2015-09-03 12:28:30 -07:00
commit e31fb9e005
69 changed files with 505 additions and 28389 deletions

View File

@ -146,36 +146,30 @@
The journalling layer is easy to use. You need to
first of all create a journal_t data structure. There are
two calls to do this dependent on how you decide to allocate the physical
media on which the journal resides. The journal_init_inode() call
is for journals stored in filesystem inodes, or the journal_init_dev()
call can be use for journal stored on a raw device (in a continuous range
media on which the journal resides. The jbd2_journal_init_inode() call
is for journals stored in filesystem inodes, or the jbd2_journal_init_dev()
call can be used for journal stored on a raw device (in a continuous range
of blocks). A journal_t is a typedef for a struct pointer, so when
you are finally finished make sure you call journal_destroy() on it
you are finally finished make sure you call jbd2_journal_destroy() on it
to free up any used kernel memory.
</para>
<para>
Once you have got your journal_t object you need to 'mount' or load the journal
file, unless of course you haven't initialised it yet - in which case you
need to call journal_create().
file. The journalling layer expects the space for the journal was already
allocated and initialized properly by the userspace tools. When loading the
journal you must call jbd2_journal_load() to process journal contents. If the
client file system detects the journal contents does not need to be processed
(or even need not have valid contents), it may call jbd2_journal_wipe() to
clear the journal contents before calling jbd2_journal_load().
</para>
<para>
Most of the time however your journal file will already have been created, but
before you load it you must call journal_wipe() to empty the journal file.
Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the
job of the client file system to detect this and skip the call to journal_wipe().
</para>
<para>
In either case the next call should be to journal_load() which prepares the
journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery()
for you if it detects any outstanding transactions in the journal and similarly
journal_load() will call journal_recover() if necessary.
I would advise reading fs/ext3/super.c for examples on this stage.
[RGG: Why is the journal_wipe() call necessary - doesn't this needlessly
complicate the API. Or isn't a good idea for the journal layer to hide
dirty mounts from the client fs]
Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if
it detects any outstanding transactions in the journal and similarly
jbd2_journal_load() will call jbd2_journal_recover() if necessary. I would
advise reading ext4_load_journal() in fs/ext4/super.c for examples on this
stage.
</para>
<para>
@ -189,41 +183,41 @@ You still need to actually journal your filesystem changes, this
is done by wrapping them into transactions. Additionally you
also need to wrap the modification of each of the buffers
with calls to the journal layer, so it knows what the modifications
you are actually making are. To do this use journal_start() which
you are actually making are. To do this use jbd2_journal_start() which
returns a transaction handle.
</para>
<para>
journal_start()
and its counterpart journal_stop(), which indicates the end of a transaction
are nestable calls, so you can reenter a transaction if necessary,
but remember you must call journal_stop() the same number of times as
journal_start() before the transaction is completed (or more accurately
leaves the update phase). Ext3/VFS makes use of this feature to simplify
quota support.
jbd2_journal_start()
and its counterpart jbd2_journal_stop(), which indicates the end of a
transaction are nestable calls, so you can reenter a transaction if necessary,
but remember you must call jbd2_journal_stop() the same number of times as
jbd2_journal_start() before the transaction is completed (or more accurately
leaves the update phase). Ext4/VFS makes use of this feature to simplify
handling of inode dirtying, quota support, etc.
</para>
<para>
Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call journal_get_{create,write,undo}_access() as appropriate,
need to call jbd2_journal_get_{create,write,undo}_access() as appropriate,
this allows the journalling layer to copy the unmodified data if it
needs to. After all the buffer may be part of a previously uncommitted
transaction.
At this point you are at last ready to modify a buffer, and once
you are have done so you need to call journal_dirty_{meta,}data().
you are have done so you need to call jbd2_journal_dirty_{meta,}data().
Or if you've asked for access to a buffer you now know is now longer
required to be pushed back on the device you can call journal_forget()
required to be pushed back on the device you can call jbd2_journal_forget()
in much the same way as you might have used bforget() in the past.
</para>
<para>
A journal_flush() may be called at any time to commit and checkpoint
A jbd2_journal_flush() may be called at any time to commit and checkpoint
all your transactions.
</para>
<para>
Then at umount time , in your put_super() you can then call journal_destroy()
Then at umount time , in your put_super() you can then call jbd2_journal_destroy()
to clean up your in-core journal object.
</para>
@ -231,82 +225,74 @@ to clean up your in-core journal object.
Unfortunately there a couple of ways the journal layer can cause a deadlock.
The first thing to note is that each task can only have
a single outstanding transaction at any one time, remember nothing
commits until the outermost journal_stop(). This means
commits until the outermost jbd2_journal_stop(). This means
you must complete the transaction at the end of each file/inode/address
etc. operation you perform, so that the journalling system isn't re-entered
on another journal. Since transactions can't be nested/batched
across differing journals, and another filesystem other than
yours (say ext3) may be modified in a later syscall.
yours (say ext4) may be modified in a later syscall.
</para>
<para>
The second case to bear in mind is that journal_start() can
The second case to bear in mind is that jbd2_journal_start() can
block if there isn't enough space in the journal for your transaction
(based on the passed nblocks param) - when it blocks it merely(!) needs to
wait for transactions to complete and be committed from other tasks,
so essentially we are waiting for journal_stop(). So to avoid
deadlocks you must treat journal_start/stop() as if they
so essentially we are waiting for jbd2_journal_stop(). So to avoid
deadlocks you must treat jbd2_journal_start/stop() as if they
were semaphores and include them in your semaphore ordering rules to prevent
deadlocks. Note that journal_extend() has similar blocking behaviour to
journal_start() so you can deadlock here just as easily as on journal_start().
deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to
jbd2_journal_start() so you can deadlock here just as easily as on
jbd2_journal_start().
</para>
<para>
Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this transaction.
I advise having a look at at least ext3_jbd.h to see the basis on which
ext3 uses to make these decisions.
I advise having a look at at least ext4_jbd.h to see the basis on which
ext4 uses to make these decisions.
</para>
<para>
Another wriggle to watch out for is your on-disk block allocation strategy.
why? Because, if you undo a delete, you need to ensure you haven't reused any
of the freed blocks in a later transaction. One simple way of doing this
is make sure any blocks you allocate only have checkpointed transactions
listed against them. Ext3 does this in ext3_test_allocatable().
Why? Because, if you do a delete, you need to ensure you haven't reused any
of the freed blocks until the transaction freeing these blocks commits. If you
reused these blocks and crash happens, there is no way to restore the contents
of the reallocated blocks at the end of the last fully committed transaction.
One simple way of doing this is to mark blocks as free in internal in-memory
block allocation structures only after the transaction freeing them commits.
Ext4 uses journal commit callback for this purpose.
</para>
<para>
Lock is also providing through journal_{un,}lock_updates(),
ext3 uses this when it wants a window with a clean and stable fs for a moment.
eg.
With journal commit callbacks you can ask the journalling layer to call a
callback function when the transaction is finally committed to disk, so that
you can do some of your own management. You ask the journalling layer for
calling the callback by simply setting journal->j_commit_callback function
pointer and that function is called after each transaction commit. You can also
use transaction->t_private_list for attaching entries to a transaction that
need processing when the transaction commits.
</para>
<para>
JBD2 also provides a way to block all transaction updates via
jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a
clean and stable fs for a moment. E.g.
</para>
<programlisting>
journal_lock_updates() //stop new stuff happening..
journal_flush() // checkpoint everything.
jbd2_journal_lock_updates() //stop new stuff happening..
jbd2_journal_flush() // checkpoint everything.
..do stuff on stable fs
journal_unlock_updates() // carry on with filesystem use.
jbd2_journal_unlock_updates() // carry on with filesystem use.
</programlisting>
<para>
The opportunities for abuse and DOS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing these
calls.
</para>
<para>
A new feature of jbd since 2.5.25 is commit callbacks with the new
journal_callback_set() function you can now ask the journalling layer
to call you back when the transaction is finally committed to disk, so that
you can do some of your own management. The key to this is the journal_callback
struct, this maintains the internal callback information but you can
extend it like this:-
</para>
<programlisting>
struct myfs_callback_s {
//Data structure element required by jbd..
struct journal_callback for_jbd;
// Stuff for myfs allocated together.
myfs_inode* i_commited;
}
</programlisting>
<para>
this would be useful if you needed to know when data was committed to a
particular inode.
</para>
</sect2>
@ -319,36 +305,6 @@ being each mount, each modification (transaction) and each changed buffer
to tell the journalling layer about them.
</para>
<para>
Here is a some pseudo code to give you an idea of how it works, as
an example.
</para>
<programlisting>
journal_t* my_jnrl = journal_create();
journal_init_{dev,inode}(jnrl,...)
if (clean) journal_wipe();
journal_load();
foreach(transaction) { /*transactions must be
completed before
a syscall returns to
userspace*/
handle_t * xct=journal_start(my_jnrl);
foreach(bh) {
journal_get_{create,write,undo}_access(xact,bh);
if ( myfs_modify(bh) ) { /* returns true
if makes changes */
journal_dirty_{meta,}data(xact,bh);
} else {
journal_forget(bh);
}
}
journal_stop(xct);
}
journal_destroy(my_jrnl);
</programlisting>
</sect2>
</sect1>
@ -357,13 +313,13 @@ an example.
<title>Data Types</title>
<para>
The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD layer you can
of the structures used. As a client of the JBD2 layer you can
just rely on the using the pointer as a magic cookie of some sort.
Obviously the hiding is not enforced as this is 'C'.
</para>
<sect2 id="structures"><title>Structures</title>
!Iinclude/linux/jbd.h
!Iinclude/linux/jbd2.h
</sect2>
</sect1>
@ -375,11 +331,11 @@ an example.
manage transactions
</para>
<sect2 id="journal_level"><title>Journal Level</title>
!Efs/jbd/journal.c
!Ifs/jbd/recovery.c
!Efs/jbd2/journal.c
!Ifs/jbd2/recovery.c
</sect2>
<sect2 id="transaction_level"><title>Transasction Level</title>
!Efs/jbd/transaction.c
!Efs/jbd2/transaction.c
</sect2>
</sect1>
<sect1 id="see_also">

View File

@ -360,8 +360,8 @@ and are copied into the filesystem. If a transaction is incomplete at
the time of the crash, then there is no guarantee of consistency for
the blocks in that transaction so they are discarded (which means any
filesystem changes they represent are also lost).
Check Documentation/filesystems/ext3.txt if you want to read more about
ext3 and journaling.
Check Documentation/filesystems/ext4.txt if you want to read more about
ext4 and journaling.
References
==========

View File

@ -6,210 +6,7 @@ Ext3 was originally released in September 1999. Written by Stephen Tweedie
for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
Ext3 is the ext2 filesystem enhanced with journalling capabilities.
Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
filesystem is a subset of ext4 filesystem so use ext4 driver for accessing
ext3 filesystems.
Options
=======
When mounting an ext3 filesystem, the following option are accepted:
(*) == default
ro Mount filesystem read only. Note that ext3 will replay
the journal (and thus write to the partition) even when
mounted "read only". Mount options "ro,noload" can be
used to prevent writes to the filesystem.
journal=update Update the ext3 file system's journal to the current
format.
journal=inum When a journal already exists, this option is ignored.
Otherwise, it specifies the number of the inode which
will represent the ext3 file system's journal file.
journal_path=path
journal_dev=devnum When the external journal device's major/minor numbers
have changed, these options allow the user to specify
the new journal location. The journal device is
identified through either its new major/minor numbers
encoded in devnum, or via a path to the device.
norecovery Don't load the journal on mounting. Note that this forces
noload mount of inconsistent filesystem, which can lead to
various problems.
data=journal All data are committed into the journal prior to being
written into the main file system.
data=ordered (*) All data are forced directly out to the main file
system prior to its metadata being committed to the
journal.
data=writeback Data ordering is not preserved, data may be written
into the main file system after its metadata has been
committed to the journal.
commit=nrsec (*) Ext3 can be told to sync all its data and metadata
every 'nrsec' seconds. The default value is 5 seconds.
This means that if you lose your power, you will lose
as much as the latest 5 seconds of work (your
filesystem will not be damaged though, thanks to the
journaling). This default value (or any low value)
will hurt performance, but it's good for data-safety.
Setting it to 0 will have the same effect as leaving
it at the default (5 seconds).
Setting it to very large values will improve
performance.
barrier=<0|1(*)> This enables/disables the use of write barriers in
barrier (*) the jbd code. barrier=0 disables, barrier=1 enables.
nobarrier This also requires an IO stack which can support
barriers, and if jbd gets an error on a barrier
write, it will disable again with a warning.
Write barriers enforce proper on-disk ordering
of journal commits, making volatile disk write caches
safe to use, at some performance penalty. If
your disks are battery-backed in one way or another,
disabling barriers may safely improve performance.
The mount options "barrier" and "nobarrier" can
also be used to enable or disable barriers, for
consistency with other ext3 mount options.
user_xattr Enables Extended User Attributes. Additionally, you
need to have extended attribute support enabled in the
kernel configuration (CONFIG_EXT3_FS_XATTR). See the
attr(5) manual page and http://acl.bestbits.at/ to
learn more about extended attributes.
nouser_xattr Disables Extended User Attributes.
acl Enables POSIX Access Control Lists support.
Additionally, you need to have ACL support enabled in
the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
See the acl(5) manual page and http://acl.bestbits.at/
for more information.
noacl This option disables POSIX Access Control List
support.
reservation
noreservation
bsddf (*) Make 'df' act like BSD.
minixdf Make 'df' act like Minix.
check=none Don't do extra checking of bitmaps on mount.
nocheck
debug Extra debugging information is sent to syslog.
errors=remount-ro Remount the filesystem read-only on an error.
errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.
(These mount options override the errors behavior
specified in the superblock, which can be
configured using tune2fs.)
data_err=ignore(*) Just print an error message if an error occurs
in a file data buffer in ordered mode.
data_err=abort Abort the journal if an error occurs in a file
data buffer in ordered mode.
grpid Give objects the same group ID as their creator.
bsdgroups
nogrpid (*) New objects have the group ID of their creator.
sysvgroups
resgid=n The group ID which may use the reserved blocks.
resuid=n The user ID which may use the reserved blocks.
sb=n Use alternate superblock at this location.
quota These options are ignored by the filesystem. They
noquota are used only by quota tools to recognize volumes
grpquota where quota should be turned on. See documentation
usrquota in the quota-tools package for more details
(http://sourceforge.net/projects/linuxquota).
jqfmt=<quota type> These options tell filesystem details about quota
usrjquota=<file> so that quota information can be properly updated
grpjquota=<file> during journal replay. They replace the above
quota options. See documentation in the quota-tools
package for more details
(http://sourceforge.net/projects/linuxquota).
Specification
=============
Ext3 shares all disk implementation with the ext2 filesystem, and adds
transactions capabilities to ext2. Journaling is done by the Journaling Block
Device layer.
Journaling Block Device layer
-----------------------------
The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed
to add journaling capabilities to a block device. The ext3 filesystem code
will inform the JBD of modifications it is performing (called a transaction).
The journal supports the transactions start and stop, and in case of a crash,
the journal can replay the transactions to quickly put the partition back into
a consistent state.
Handles represent a single atomic update to a filesystem. JBD can handle an
external journal on a block device.
Data Mode
---------
There are 3 different data modes:
* writeback mode
In data=writeback mode, ext3 does not journal data at all. This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode - metadata journaling. A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash. This mode will
typically provide the best ext3 performance.
* ordered mode
In data=ordered mode, ext3 only officially journals metadata, but it logically
groups metadata and data blocks into a single unit called a transaction. When
it's time to write the new metadata out to disk, the associated data blocks
are written first. In general, this mode performs slightly slower than
writeback but significantly faster than journal mode.
* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time where it
outperforms all other modes.
Compatibility
-------------
Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`.
Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as
Ext2.
External Tools
==============
See manual pages to learn more.
tune2fs: create a ext3 journal on a ext2 partition with the -j flag.
mke2fs: create a ext3 partition with the -j flag.
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer
References
==========
kernel source: <file:fs/ext3/>
<file:fs/jbd/>
programs: http://e2fsprogs.sourceforge.net/
http://ext2resize.sourceforge.net
useful links: http://www.ibm.com/developerworks/library/l-fs7/index.html
http://www.ibm.com/developerworks/library/l-fs8/index.html

View File

@ -769,7 +769,7 @@ struct address_space_operations {
to stall to allow flushers a chance to complete some IO. Ordinarily
it can use PageDirty and PageWriteback but some filesystems have
more complex state (unstable pages in NFS prevent reclaim) or
do not set those flags due to locking problems (jbd). This callback
do not set those flags due to locking problems. This callback
allows a filesystem to indicate to the VM if a page should be
treated as dirty or writeback for the purposes of stalling.

View File

@ -4078,15 +4078,6 @@ F: Documentation/filesystems/ext2.txt
F: fs/ext2/
F: include/linux/ext2*
EXT3 FILE SYSTEM
M: Jan Kara <jack@suse.com>
M: Andrew Morton <akpm@linux-foundation.org>
M: Andreas Dilger <adilger.kernel@dilger.ca>
L: linux-ext4@vger.kernel.org
S: Maintained
F: Documentation/filesystems/ext3.txt
F: fs/ext3/
EXT4 FILE SYSTEM
M: "Theodore Ts'o" <tytso@mit.edu>
M: Andreas Dilger <adilger.kernel@dilger.ca>
@ -5787,16 +5778,9 @@ S: Maintained
F: fs/jffs2/
F: include/uapi/linux/jffs2.h
JOURNALLING LAYER FOR BLOCK DEVICES (JBD)
M: Andrew Morton <akpm@linux-foundation.org>
M: Jan Kara <jack@suse.com>
L: linux-ext4@vger.kernel.org
S: Maintained
F: fs/jbd/
F: include/linux/jbd.h
JOURNALLING LAYER FOR BLOCK DEVICES (JBD2)
M: "Theodore Ts'o" <tytso@mit.edu>
M: Jan Kara <jack@suse.com>
L: linux-ext4@vger.kernel.org
S: Maintained
F: fs/jbd2/

View File

@ -177,26 +177,8 @@ static void bounce_end_io_read_isa(struct bio *bio)
__bounce_end_io_read(bio, isa_page_pool);
}
#ifdef CONFIG_NEED_BOUNCE_POOL
static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
{
if (bio_data_dir(bio) != WRITE)
return 0;
if (!bdi_cap_stable_pages_required(&q->backing_dev_info))
return 0;
return bio_flagged(bio, BIO_SNAP_STABLE);
}
#else
static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
{
return 0;
}
#endif /* CONFIG_NEED_BOUNCE_POOL */
static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
mempool_t *pool, int force)
mempool_t *pool)
{
struct bio *bio;
int rw = bio_data_dir(*bio_orig);
@ -204,8 +186,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
struct bvec_iter iter;
unsigned i;
if (force)
goto bounce;
bio_for_each_segment(from, *bio_orig, iter)
if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
goto bounce;
@ -217,7 +197,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
if (page_to_pfn(page) <= queue_bounce_pfn(q) && !force)
if (page_to_pfn(page) <= queue_bounce_pfn(q))
continue;
to->bv_page = mempool_alloc(pool, q->bounce_gfp);
@ -255,7 +235,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
{
int must_bounce;
mempool_t *pool;
/*
@ -264,15 +243,13 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
if (!bio_has_data(*bio_orig))
return;
must_bounce = must_snapshot_stable_pages(q, *bio_orig);
/*
* for non-isa bounce case, just check if the bounce pfn is equal
* to or bigger than the highest pfn in the system -- in that case,
* don't waste time iterating over bio segments
*/
if (!(q->bounce_gfp & GFP_DMA)) {
if (queue_bounce_pfn(q) >= blk_max_pfn && !must_bounce)
if (queue_bounce_pfn(q) >= blk_max_pfn)
return;
pool = page_pool;
} else {
@ -283,7 +260,7 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
/*
* slow path
*/
__blk_queue_bounce(q, bio_orig, pool, must_bounce);
__blk_queue_bounce(q, bio_orig, pool);
}
EXPORT_SYMBOL(blk_queue_bounce);

View File

@ -11,18 +11,15 @@ config DCACHE_WORD_ACCESS
if BLOCK
source "fs/ext2/Kconfig"
source "fs/ext3/Kconfig"
source "fs/ext4/Kconfig"
source "fs/jbd/Kconfig"
source "fs/jbd2/Kconfig"
config FS_MBCACHE
# Meta block cache for Extended Attributes (ext2/ext3/ext4)
tristate
default y if EXT2_FS=y && EXT2_FS_XATTR
default y if EXT3_FS=y && EXT3_FS_XATTR
default y if EXT4_FS=y
default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
default m if EXT2_FS_XATTR || EXT4_FS
source "fs/reiserfs/Kconfig"
source "fs/jfs/Kconfig"

View File

@ -62,12 +62,10 @@ obj-$(CONFIG_DLM) += dlm/
# Do not add any filesystems before this line
obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT3_FS) += ext3/ # Before ext2 so root fs can be ext3
obj-$(CONFIG_EXT2_FS) += ext2/
# We place ext4 after ext2 so plain ext2 root fs's are mounted using ext2
# unless explicitly requested by rootfstype
obj-$(CONFIG_EXT4_FS) += ext4/
obj-$(CONFIG_JBD) += jbd/
obj-$(CONFIG_JBD2) += jbd2/
obj-$(CONFIG_CRAMFS) += cramfs/
obj-$(CONFIG_SQUASHFS) += squashfs/

View File

@ -577,7 +577,10 @@ struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
goto fail;
}
dquot_initialize(inode);
err = dquot_initialize(inode);
if (err)
goto fail_drop;
err = dquot_alloc_inode(inode);
if (err)
goto fail_drop;

View File

@ -1552,8 +1552,11 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
if (error)
return error;
if (is_quota_modification(inode, iattr))
dquot_initialize(inode);
if (is_quota_modification(inode, iattr)) {
error = dquot_initialize(inode);
if (error)
return error;
}
if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)) ||
(iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))) {
error = dquot_transfer(inode, iattr);

View File

@ -96,8 +96,11 @@ struct dentry *ext2_get_parent(struct dentry *child)
static int ext2_create (struct inode * dir, struct dentry * dentry, umode_t mode, bool excl)
{
struct inode *inode;
int err;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
inode = ext2_new_inode(dir, mode, &dentry->d_name);
if (IS_ERR(inode))
@ -143,7 +146,9 @@ static int ext2_mknod (struct inode * dir, struct dentry *dentry, umode_t mode,
if (!new_valid_dev(rdev))
return -EINVAL;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
inode = ext2_new_inode (dir, mode, &dentry->d_name);
err = PTR_ERR(inode);
@ -169,7 +174,9 @@ static int ext2_symlink (struct inode * dir, struct dentry * dentry,
if (l > sb->s_blocksize)
goto out;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
goto out;
inode = ext2_new_inode (dir, S_IFLNK | S_IRWXUGO, &dentry->d_name);
err = PTR_ERR(inode);
@ -212,7 +219,9 @@ static int ext2_link (struct dentry * old_dentry, struct inode * dir,
struct inode *inode = d_inode(old_dentry);
int err;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
inode->i_ctime = CURRENT_TIME_SEC;
inode_inc_link_count(inode);
@ -233,7 +242,9 @@ static int ext2_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
struct inode * inode;
int err;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
inode_inc_link_count(dir);
@ -279,13 +290,17 @@ static int ext2_unlink(struct inode * dir, struct dentry *dentry)
struct inode * inode = d_inode(dentry);
struct ext2_dir_entry_2 * de;
struct page * page;
int err = -ENOENT;
int err;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
goto out;
de = ext2_find_entry (dir, &dentry->d_name, &page);
if (!de)
if (!de) {
err = -ENOENT;
goto out;
}
err = ext2_delete_entry (de, page);
if (err)
@ -323,14 +338,21 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
struct ext2_dir_entry_2 * dir_de = NULL;
struct page * old_page;
struct ext2_dir_entry_2 * old_de;
int err = -ENOENT;
int err;
dquot_initialize(old_dir);
dquot_initialize(new_dir);
err = dquot_initialize(old_dir);
if (err)
goto out;
err = dquot_initialize(new_dir);
if (err)
goto out;
old_de = ext2_find_entry (old_dir, &old_dentry->d_name, &old_page);
if (!old_de)
if (!old_de) {
err = -ENOENT;
goto out;
}
if (S_ISDIR(old_inode->i_mode)) {
err = -EIO;

View File

@ -1,89 +0,0 @@
config EXT3_FS
tristate "Ext3 journalling file system support"
select JBD
help
This is the journalling version of the Second extended file system
(often called ext3), the de facto standard Linux file system
(method to organize files on a storage device) for hard disks.
The journalling code included in this driver means you do not have
to run e2fsck (file system checker) on your file systems after a
crash. The journal keeps track of any changes that were being made
at the time the system crashed, and can ensure that your file system
is consistent without the need for a lengthy check.
Other than adding the journal to the file system, the on-disk format
of ext3 is identical to ext2. It is possible to freely switch
between using the ext3 driver and the ext2 driver, as long as the
file system has been cleanly unmounted, or e2fsck is run on the file
system.
To add a journal on an existing ext2 file system or change the
behavior of ext3 file systems, you can use the tune2fs utility ("man
tune2fs"). To modify attributes of files and directories on ext3
file systems, use chattr ("man chattr"). You need to be using
e2fsprogs version 1.20 or later in order to create ext3 journals
(available at <http://sourceforge.net/projects/e2fsprogs/>).
To compile this file system support as a module, choose M here: the
module will be called ext3.
config EXT3_DEFAULTS_TO_ORDERED
bool "Default to 'data=ordered' in ext3"
depends on EXT3_FS
default y
help
The journal mode options for ext3 have different tradeoffs
between when data is guaranteed to be on disk and
performance. The use of "data=writeback" can cause
unwritten data to appear in files after an system crash or
power failure, which can be a security issue. However,
"data=ordered" mode can also result in major performance
problems, including seconds-long delays before an fsync()
call returns. For details, see:
http://ext4.wiki.kernel.org/index.php/Ext3_data_mode_tradeoffs
If you have been historically happy with ext3's performance,
data=ordered mode will be a safe choice and you should
answer 'y' here. If you understand the reliability and data
privacy issues of data=writeback and are willing to make
that trade off, answer 'n'.
config EXT3_FS_XATTR
bool "Ext3 extended attributes"
depends on EXT3_FS
default y
help
Extended attributes are name:value pairs associated with inodes by
the kernel or by users (see the attr(5) manual page, or visit
<http://acl.bestbits.at/> for details).
If unsure, say N.
You need this for POSIX ACL support on ext3.
config EXT3_FS_POSIX_ACL
bool "Ext3 POSIX Access Control Lists"
depends on EXT3_FS_XATTR
select FS_POSIX_ACL
help
Posix Access Control Lists (ACLs) support permissions for users and
groups beyond the owner/group/world scheme.
To learn more about Access Control Lists, visit the Posix ACLs for
Linux website <http://acl.bestbits.at/>.
If you don't know what Access Control Lists are, say N
config EXT3_FS_SECURITY
bool "Ext3 Security Labels"
depends on EXT3_FS_XATTR
help
Security labels support alternative access control models
implemented by security modules like SELinux. This option
enables an extended attribute handler for file security
labels in the ext3 filesystem.
If you are not using a security module that requires using
extended attributes for file security labels, say N.

View File

@ -1,12 +0,0 @@
#
# Makefile for the linux ext3-filesystem routines.
#
obj-$(CONFIG_EXT3_FS) += ext3.o
ext3-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
ioctl.o namei.o super.o symlink.o hash.o resize.o ext3_jbd.o
ext3-$(CONFIG_EXT3_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
ext3-$(CONFIG_EXT3_FS_POSIX_ACL) += acl.o
ext3-$(CONFIG_EXT3_FS_SECURITY) += xattr_security.o

View File

@ -1,281 +0,0 @@
/*
* linux/fs/ext3/acl.c
*
* Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
*/
#include "ext3.h"
#include "xattr.h"
#include "acl.h"
/*
* Convert from filesystem to in-memory representation.
*/
static struct posix_acl *
ext3_acl_from_disk(const void *value, size_t size)
{
const char *end = (char *)value + size;
int n, count;
struct posix_acl *acl;
if (!value)
return NULL;
if (size < sizeof(ext3_acl_header))
return ERR_PTR(-EINVAL);
if (((ext3_acl_header *)value)->a_version !=
cpu_to_le32(EXT3_ACL_VERSION))
return ERR_PTR(-EINVAL);
value = (char *)value + sizeof(ext3_acl_header);
count = ext3_acl_count(size);
if (count < 0)
return ERR_PTR(-EINVAL);
if (count == 0)
return NULL;
acl = posix_acl_alloc(count, GFP_NOFS);
if (!acl)
return ERR_PTR(-ENOMEM);
for (n=0; n < count; n++) {
ext3_acl_entry *entry =
(ext3_acl_entry *)value;
if ((char *)value + sizeof(ext3_acl_entry_short) > end)
goto fail;
acl->a_entries[n].e_tag = le16_to_cpu(entry->e_tag);
acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
switch(acl->a_entries[n].e_tag) {
case ACL_USER_OBJ:
case ACL_GROUP_OBJ:
case ACL_MASK:
case ACL_OTHER:
value = (char *)value +
sizeof(ext3_acl_entry_short);
break;
case ACL_USER:
value = (char *)value + sizeof(ext3_acl_entry);
if ((char *)value > end)
goto fail;
acl->a_entries[n].e_uid =
make_kuid(&init_user_ns,
le32_to_cpu(entry->e_id));
break;
case ACL_GROUP:
value = (char *)value + sizeof(ext3_acl_entry);
if ((char *)value > end)
goto fail;
acl->a_entries[n].e_gid =
make_kgid(&init_user_ns,
le32_to_cpu(entry->e_id));
break;
default:
goto fail;
}
}
if (value != end)
goto fail;
return acl;
fail:
posix_acl_release(acl);
return ERR_PTR(-EINVAL);
}
/*
* Convert from in-memory to filesystem representation.
*/
static void *
ext3_acl_to_disk(const struct posix_acl *acl, size_t *size)
{
ext3_acl_header *ext_acl;
char *e;
size_t n;
*size = ext3_acl_size(acl->a_count);
ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
sizeof(ext3_acl_entry), GFP_NOFS);
if (!ext_acl)
return ERR_PTR(-ENOMEM);
ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
e = (char *)ext_acl + sizeof(ext3_acl_header);
for (n=0; n < acl->a_count; n++) {
const struct posix_acl_entry *acl_e = &acl->a_entries[n];
ext3_acl_entry *entry = (ext3_acl_entry *)e;
entry->e_tag = cpu_to_le16(acl_e->e_tag);
entry->e_perm = cpu_to_le16(acl_e->e_perm);
switch(acl_e->e_tag) {
case ACL_USER:
entry->e_id = cpu_to_le32(
from_kuid(&init_user_ns, acl_e->e_uid));
e += sizeof(ext3_acl_entry);
break;
case ACL_GROUP:
entry->e_id = cpu_to_le32(
from_kgid(&init_user_ns, acl_e->e_gid));
e += sizeof(ext3_acl_entry);
break;
case ACL_USER_OBJ:
case ACL_GROUP_OBJ:
case ACL_MASK:
case ACL_OTHER:
e += sizeof(ext3_acl_entry_short);
break;
default:
goto fail;
}
}
return (char *)ext_acl;
fail:
kfree(ext_acl);
return ERR_PTR(-EINVAL);
}
/*
* Inode operation get_posix_acl().
*
* inode->i_mutex: don't care
*/
struct posix_acl *
ext3_get_acl(struct inode *inode, int type)
{
int name_index;
char *value = NULL;
struct posix_acl *acl;
int retval;
switch (type) {
case ACL_TYPE_ACCESS:
name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
break;
case ACL_TYPE_DEFAULT:
name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
break;
default:
BUG();
}
retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
if (retval > 0) {
value = kmalloc(retval, GFP_NOFS);
if (!value)
return ERR_PTR(-ENOMEM);
retval = ext3_xattr_get(inode, name_index, "", value, retval);
}
if (retval > 0)
acl = ext3_acl_from_disk(value, retval);
else if (retval == -ENODATA || retval == -ENOSYS)
acl = NULL;
else
acl = ERR_PTR(retval);
kfree(value);
if (!IS_ERR(acl))
set_cached_acl(inode, type, acl);
return acl;
}
/*
* Set the access or default ACL of an inode.
*
* inode->i_mutex: down unless called from ext3_new_inode
*/
static int
__ext3_set_acl(handle_t *handle, struct inode *inode, int type,
struct posix_acl *acl)
{
int name_index;
void *value = NULL;
size_t size = 0;
int error;
switch(type) {
case ACL_TYPE_ACCESS:
name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
if (acl) {
error = posix_acl_equiv_mode(acl, &inode->i_mode);
if (error < 0)
return error;
else {
inode->i_ctime = CURRENT_TIME_SEC;
ext3_mark_inode_dirty(handle, inode);
if (error == 0)
acl = NULL;
}
}
break;
case ACL_TYPE_DEFAULT:
name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
if (!S_ISDIR(inode->i_mode))
return acl ? -EACCES : 0;
break;
default:
return -EINVAL;
}
if (acl) {
value = ext3_acl_to_disk(acl, &size);
if (IS_ERR(value))
return (int)PTR_ERR(value);
}
error = ext3_xattr_set_handle(handle, inode, name_index, "",
value, size, 0);
kfree(value);
if (!error)
set_cached_acl(inode, type, acl);
return error;
}
int
ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
handle_t *handle;
int error, retries = 0;
retry:
handle = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS(inode->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);
error = __ext3_set_acl(handle, inode, type, acl);
ext3_journal_stop(handle);
if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
goto retry;
return error;
}
/*
* Initialize the ACLs of a new inode. Called from ext3_new_inode.
*
* dir->i_mutex: down
* inode->i_mutex: up (access to inode is still exclusive)
*/
int
ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
{
struct posix_acl *default_acl, *acl;
int error;
error = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
if (error)
return error;
if (default_acl) {
error = __ext3_set_acl(handle, inode, ACL_TYPE_DEFAULT,
default_acl);
posix_acl_release(default_acl);
}
if (acl) {
if (!error)
error = __ext3_set_acl(handle, inode, ACL_TYPE_ACCESS,
acl);
posix_acl_release(acl);
}
return error;
}

View File

@ -1,72 +0,0 @@
/*
File: fs/ext3/acl.h
(C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/
#include <linux/posix_acl_xattr.h>
#define EXT3_ACL_VERSION 0x0001
typedef struct {
__le16 e_tag;
__le16 e_perm;
__le32 e_id;
} ext3_acl_entry;
typedef struct {
__le16 e_tag;
__le16 e_perm;
} ext3_acl_entry_short;
typedef struct {
__le32 a_version;
} ext3_acl_header;
static inline size_t ext3_acl_size(int count)
{
if (count <= 4) {
return sizeof(ext3_acl_header) +
count * sizeof(ext3_acl_entry_short);
} else {
return sizeof(ext3_acl_header) +
4 * sizeof(ext3_acl_entry_short) +
(count - 4) * sizeof(ext3_acl_entry);
}
}
static inline int ext3_acl_count(size_t size)
{
ssize_t s;
size -= sizeof(ext3_acl_header);
s = size - 4 * sizeof(ext3_acl_entry_short);
if (s < 0) {
if (size % sizeof(ext3_acl_entry_short))
return -1;
return size / sizeof(ext3_acl_entry_short);
} else {
if (s % sizeof(ext3_acl_entry))
return -1;
return s / sizeof(ext3_acl_entry) + 4;
}
}
#ifdef CONFIG_EXT3_FS_POSIX_ACL
/* acl.c */
extern struct posix_acl *ext3_get_acl(struct inode *inode, int type);
extern int ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type);
extern int ext3_init_acl (handle_t *, struct inode *, struct inode *);
#else /* CONFIG_EXT3_FS_POSIX_ACL */
#include <linux/sched.h>
#define ext3_get_acl NULL
#define ext3_set_acl NULL
static inline int
ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
{
return 0;
}
#endif /* CONFIG_EXT3_FS_POSIX_ACL */

File diff suppressed because it is too large Load Diff

View File

@ -1,20 +0,0 @@
/*
* linux/fs/ext3/bitmap.c
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*/
#include "ext3.h"
#ifdef EXT3FS_DEBUG
unsigned long ext3_count_free (struct buffer_head * map, unsigned int numchars)
{
return numchars * BITS_PER_BYTE - memweight(map->b_data, numchars);
}
#endif /* EXT3FS_DEBUG */

View File

@ -1,537 +0,0 @@
/*
* linux/fs/ext3/dir.c
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*
* from
*
* linux/fs/minix/dir.c
*
* Copyright (C) 1991, 1992 Linus Torvalds
*
* ext3 directory handling functions
*
* Big-endian to little-endian byte-swapping/bitmaps by
* David S. Miller (davem@caip.rutgers.edu), 1995
*
* Hash Tree Directory indexing (c) 2001 Daniel Phillips
*
*/
#include <linux/compat.h>
#include "ext3.h"
static unsigned char ext3_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};
static int ext3_dx_readdir(struct file *, struct dir_context *);
static unsigned char get_dtype(struct super_block *sb, int filetype)
{
if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_FILETYPE) ||
(filetype >= EXT3_FT_MAX))
return DT_UNKNOWN;
return (ext3_filetype_table[filetype]);
}
/**
* Check if the given dir-inode refers to an htree-indexed directory
* (or a directory which could potentially get converted to use htree
* indexing).
*
* Return 1 if it is a dx dir, 0 if not
*/
static int is_dx_dir(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
EXT3_FEATURE_COMPAT_DIR_INDEX) &&
((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
((inode->i_size >> sb->s_blocksize_bits) == 1)))
return 1;
return 0;
}
int ext3_check_dir_entry (const char * function, struct inode * dir,
struct ext3_dir_entry_2 * de,
struct buffer_head * bh,
unsigned long offset)
{
const char * error_msg = NULL;
const int rlen = ext3_rec_len_from_disk(de->rec_len);
if (unlikely(rlen < EXT3_DIR_REC_LEN(1)))
error_msg = "rec_len is smaller than minimal";
else if (unlikely(rlen % 4 != 0))
error_msg = "rec_len % 4 != 0";
else if (unlikely(rlen < EXT3_DIR_REC_LEN(de->name_len)))
error_msg = "rec_len is too small for name_len";
else if (unlikely((((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)))
error_msg = "directory entry across blocks";
else if (unlikely(le32_to_cpu(de->inode) >
le32_to_cpu(EXT3_SB(dir->i_sb)->s_es->s_inodes_count)))
error_msg = "inode out of bounds";
if (unlikely(error_msg != NULL))
ext3_error (dir->i_sb, function,
"bad entry in directory #%lu: %s - "
"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
dir->i_ino, error_msg, offset,
(unsigned long) le32_to_cpu(de->inode),
rlen, de->name_len);
return error_msg == NULL ? 1 : 0;
}
static int ext3_readdir(struct file *file, struct dir_context *ctx)
{
unsigned long offset;
int i;
struct ext3_dir_entry_2 *de;
int err;
struct inode *inode = file_inode(file);
struct super_block *sb = inode->i_sb;
int dir_has_error = 0;
if (is_dx_dir(inode)) {
err = ext3_dx_readdir(file, ctx);
if (err != ERR_BAD_DX_DIR)
return err;
/*
* We don't set the inode dirty flag since it's not
* critical that it get flushed back to the disk.
*/
EXT3_I(inode)->i_flags &= ~EXT3_INDEX_FL;
}
offset = ctx->pos & (sb->s_blocksize - 1);
while (ctx->pos < inode->i_size) {
unsigned long blk = ctx->pos >> EXT3_BLOCK_SIZE_BITS(sb);
struct buffer_head map_bh;
struct buffer_head *bh = NULL;
map_bh.b_state = 0;
err = ext3_get_blocks_handle(NULL, inode, blk, 1, &map_bh, 0);
if (err > 0) {
pgoff_t index = map_bh.b_blocknr >>
(PAGE_CACHE_SHIFT - inode->i_blkbits);
if (!ra_has_index(&file->f_ra, index))
page_cache_sync_readahead(
sb->s_bdev->bd_inode->i_mapping,
&file->f_ra, file,
index, 1);
file->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
bh = ext3_bread(NULL, inode, blk, 0, &err);
}
/*
* We ignore I/O errors on directories so users have a chance
* of recovering data when there's a bad sector
*/
if (!bh) {
if (!dir_has_error) {
ext3_error(sb, __func__, "directory #%lu "
"contains a hole at offset %lld",
inode->i_ino, ctx->pos);
dir_has_error = 1;
}
/* corrupt size? Maybe no more blocks to read */
if (ctx->pos > inode->i_blocks << 9)
break;
ctx->pos += sb->s_blocksize - offset;
continue;
}
/* If the dir block has changed since the last call to
* readdir(2), then we might be pointing to an invalid
* dirent right now. Scan from the start of the block
* to make sure. */
if (offset && file->f_version != inode->i_version) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ext3_dir_entry_2 *)
(bh->b_data + i);
/* It's too expensive to do a full
* dirent test each time round this
* loop, but we do have to test at
* least that it is non-zero. A
* failure will be detected in the
* dirent test below. */
if (ext3_rec_len_from_disk(de->rec_len) <
EXT3_DIR_REC_LEN(1))
break;
i += ext3_rec_len_from_disk(de->rec_len);
}
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
file->f_version = inode->i_version;
}
while (ctx->pos < inode->i_size
&& offset < sb->s_blocksize) {
de = (struct ext3_dir_entry_2 *) (bh->b_data + offset);
if (!ext3_check_dir_entry ("ext3_readdir", inode, de,
bh, offset)) {
/* On error, skip the to the
next block. */
ctx->pos = (ctx->pos |
(sb->s_blocksize - 1)) + 1;
break;
}
offset += ext3_rec_len_from_disk(de->rec_len);
if (le32_to_cpu(de->inode)) {
if (!dir_emit(ctx, de->name, de->name_len,
le32_to_cpu(de->inode),
get_dtype(sb, de->file_type))) {
brelse(bh);
return 0;
}
}
ctx->pos += ext3_rec_len_from_disk(de->rec_len);
}
offset = 0;
brelse (bh);
if (ctx->pos < inode->i_size)
if (!dir_relax(inode))
return 0;
}
return 0;
}
static inline int is_32bit_api(void)
{
#ifdef CONFIG_COMPAT
return is_compat_task();
#else
return (BITS_PER_LONG == 32);
#endif
}
/*
* These functions convert from the major/minor hash to an f_pos
* value for dx directories
*
* Upper layer (for example NFS) should specify FMODE_32BITHASH or
* FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
* directly on both 32-bit and 64-bit nodes, under such case, neither
* FMODE_32BITHASH nor FMODE_64BITHASH is specified.
*/
static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
{
if ((filp->f_mode & FMODE_32BITHASH) ||
(!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
return major >> 1;
else
return ((__u64)(major >> 1) << 32) | (__u64)minor;
}
static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
{
if ((filp->f_mode & FMODE_32BITHASH) ||
(!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
return (pos << 1) & 0xffffffff;
else
return ((pos >> 32) << 1) & 0xffffffff;
}
static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
{
if ((filp->f_mode & FMODE_32BITHASH) ||
(!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
return 0;
else
return pos & 0xffffffff;
}
/*
* Return 32- or 64-bit end-of-file for dx directories
*/
static inline loff_t ext3_get_htree_eof(struct file *filp)
{
if ((filp->f_mode & FMODE_32BITHASH) ||
(!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
return EXT3_HTREE_EOF_32BIT;
else
return EXT3_HTREE_EOF_64BIT;
}
/*
* ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
* non-htree and htree directories, where the "offset" is in terms
* of the filename hash value instead of the byte offset.
*
* Because we may return a 64-bit hash that is well beyond s_maxbytes,
* we need to pass the max hash as the maximum allowable offset in
* the htree directory case.
*
* NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
* will be invalid once the directory was converted into a dx directory
*/
static loff_t ext3_dir_llseek(struct file *file, loff_t offset, int whence)
{
struct inode *inode = file->f_mapping->host;
int dx_dir = is_dx_dir(inode);
loff_t htree_max = ext3_get_htree_eof(file);
if (likely(dx_dir))
return generic_file_llseek_size(file, offset, whence,
htree_max, htree_max);
else
return generic_file_llseek(file, offset, whence);
}
/*
* This structure holds the nodes of the red-black tree used to store
* the directory entry in hash order.
*/
struct fname {
__u32 hash;
__u32 minor_hash;
struct rb_node rb_hash;
struct fname *next;
__u32 inode;
__u8 name_len;
__u8 file_type;
char name[0];
};
/*
* This functoin implements a non-recursive way of freeing all of the
* nodes in the red-black tree.
*/
static void free_rb_tree_fname(struct rb_root *root)
{
struct fname *fname, *next;
rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
do {
struct fname *old = fname;
fname = fname->next;
kfree(old);
} while (fname);
*root = RB_ROOT;
}
static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
loff_t pos)
{
struct dir_private_info *p;
p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
if (!p)
return NULL;
p->curr_hash = pos2maj_hash(filp, pos);
p->curr_minor_hash = pos2min_hash(filp, pos);
return p;
}
void ext3_htree_free_dir_info(struct dir_private_info *p)
{
free_rb_tree_fname(&p->root);
kfree(p);
}
/*
* Given a directory entry, enter it into the fname rb tree.
*/
int ext3_htree_store_dirent(struct file *dir_file, __u32 hash,
__u32 minor_hash,
struct ext3_dir_entry_2 *dirent)
{
struct rb_node **p, *parent = NULL;
struct fname * fname, *new_fn;
struct dir_private_info *info;
int len;
info = (struct dir_private_info *) dir_file->private_data;
p = &info->root.rb_node;
/* Create and allocate the fname structure */
len = sizeof(struct fname) + dirent->name_len + 1;
new_fn = kzalloc(len, GFP_KERNEL);
if (!new_fn)
return -ENOMEM;
new_fn->hash = hash;
new_fn->minor_hash = minor_hash;
new_fn->inode = le32_to_cpu(dirent->inode);
new_fn->name_len = dirent->name_len;
new_fn->file_type = dirent->file_type;
memcpy(new_fn->name, dirent->name, dirent->name_len);
new_fn->name[dirent->name_len] = 0;
while (*p) {
parent = *p;
fname = rb_entry(parent, struct fname, rb_hash);
/*
* If the hash and minor hash match up, then we put
* them on a linked list. This rarely happens...
*/
if ((new_fn->hash == fname->hash) &&
(new_fn->minor_hash == fname->minor_hash)) {
new_fn->next = fname->next;
fname->next = new_fn;
return 0;
}
if (new_fn->hash < fname->hash)
p = &(*p)->rb_left;
else if (new_fn->hash > fname->hash)
p = &(*p)->rb_right;
else if (new_fn->minor_hash < fname->minor_hash)
p = &(*p)->rb_left;
else /* if (new_fn->minor_hash > fname->minor_hash) */
p = &(*p)->rb_right;
}
rb_link_node(&new_fn->rb_hash, parent, p);
rb_insert_color(&new_fn->rb_hash, &info->root);
return 0;
}
/*
* This is a helper function for ext3_dx_readdir. It calls filldir
* for all entres on the fname linked list. (Normally there is only
* one entry on the linked list, unless there are 62 bit hash collisions.)
*/
static bool call_filldir(struct file *file, struct dir_context *ctx,
struct fname *fname)
{
struct dir_private_info *info = file->private_data;
struct inode *inode = file_inode(file);
struct super_block *sb = inode->i_sb;
if (!fname) {
printk("call_filldir: called with null fname?!?\n");
return true;
}
ctx->pos = hash2pos(file, fname->hash, fname->minor_hash);
while (fname) {
if (!dir_emit(ctx, fname->name, fname->name_len,
fname->inode,
get_dtype(sb, fname->file_type))) {
info->extra_fname = fname;
return false;
}
fname = fname->next;
}
return true;
}
static int ext3_dx_readdir(struct file *file, struct dir_context *ctx)
{
struct dir_private_info *info = file->private_data;
struct inode *inode = file_inode(file);
struct fname *fname;
int ret;
if (!info) {
info = ext3_htree_create_dir_info(file, ctx->pos);
if (!info)
return -ENOMEM;
file->private_data = info;
}
if (ctx->pos == ext3_get_htree_eof(file))
return 0; /* EOF */
/* Some one has messed with f_pos; reset the world */
if (info->last_pos != ctx->pos) {
free_rb_tree_fname(&info->root);
info->curr_node = NULL;
info->extra_fname = NULL;
info->curr_hash = pos2maj_hash(file, ctx->pos);
info->curr_minor_hash = pos2min_hash(file, ctx->pos);
}
/*
* If there are any leftover names on the hash collision
* chain, return them first.
*/
if (info->extra_fname) {
if (!call_filldir(file, ctx, info->extra_fname))
goto finished;
info->extra_fname = NULL;
goto next_node;
} else if (!info->curr_node)
info->curr_node = rb_first(&info->root);
while (1) {
/*
* Fill the rbtree if we have no more entries,
* or the inode has changed since we last read in the
* cached entries.
*/
if ((!info->curr_node) ||
(file->f_version != inode->i_version)) {
info->curr_node = NULL;
free_rb_tree_fname(&info->root);
file->f_version = inode->i_version;
ret = ext3_htree_fill_tree(file, info->curr_hash,
info->curr_minor_hash,
&info->next_hash);
if (ret < 0)
return ret;
if (ret == 0) {
ctx->pos = ext3_get_htree_eof(file);
break;
}
info->curr_node = rb_first(&info->root);
}
fname = rb_entry(info->curr_node, struct fname, rb_hash);
info->curr_hash = fname->hash;
info->curr_minor_hash = fname->minor_hash;
if (!call_filldir(file, ctx, fname))
break;
next_node:
info->curr_node = rb_next(info->curr_node);
if (info->curr_node) {
fname = rb_entry(info->curr_node, struct fname,
rb_hash);
info->curr_hash = fname->hash;
info->curr_minor_hash = fname->minor_hash;
} else {
if (info->next_hash == ~0) {
ctx->pos = ext3_get_htree_eof(file);
break;
}
info->curr_hash = info->next_hash;
info->curr_minor_hash = 0;
}
}
finished:
info->last_pos = ctx->pos;
return 0;
}
static int ext3_release_dir (struct inode * inode, struct file * filp)
{
if (filp->private_data)
ext3_htree_free_dir_info(filp->private_data);
return 0;
}
const struct file_operations ext3_dir_operations = {
.llseek = ext3_dir_llseek,
.read = generic_read_dir,
.iterate = ext3_readdir,
.unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext3_compat_ioctl,
#endif
.fsync = ext3_sync_file,
.release = ext3_release_dir,
};

File diff suppressed because it is too large Load Diff

View File

@ -1,59 +0,0 @@
/*
* Interface between ext3 and JBD
*/
#include "ext3.h"
int __ext3_journal_get_undo_access(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = journal_get_undo_access(handle, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}
int __ext3_journal_get_write_access(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = journal_get_write_access(handle, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}
int __ext3_journal_forget(const char *where, handle_t *handle,
struct buffer_head *bh)
{
int err = journal_forget(handle, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}
int __ext3_journal_revoke(const char *where, handle_t *handle,
unsigned long blocknr, struct buffer_head *bh)
{
int err = journal_revoke(handle, blocknr, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}
int __ext3_journal_get_create_access(const char *where,
handle_t *handle, struct buffer_head *bh)
{
int err = journal_get_create_access(handle, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}
int __ext3_journal_dirty_metadata(const char *where,
handle_t *handle, struct buffer_head *bh)
{
int err = journal_dirty_metadata(handle, bh);
if (err)
ext3_journal_abort_handle(where, __func__, bh, handle,err);
return err;
}

View File

@ -1,79 +0,0 @@
/*
* linux/fs/ext3/file.c
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*
* from
*
* linux/fs/minix/file.c
*
* Copyright (C) 1991, 1992 Linus Torvalds
*
* ext3 fs regular file handling primitives
*
* 64-bit file support on 64-bit platforms by Jakub Jelinek
* (jj@sunsite.ms.mff.cuni.cz)
*/
#include <linux/quotaops.h>
#include "ext3.h"
#include "xattr.h"
#include "acl.h"
/*
* Called when an inode is released. Note that this is different
* from ext3_file_open: open gets called at every open, but release
* gets called only when /all/ the files are closed.
*/
static int ext3_release_file (struct inode * inode, struct file * filp)
{
if (ext3_test_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE)) {
filemap_flush(inode->i_mapping);
ext3_clear_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE);
}
/* if we are the last writer on the inode, drop the block reservation */
if ((filp->f_mode & FMODE_WRITE) &&
(atomic_read(&inode->i_writecount) == 1))
{
mutex_lock(&EXT3_I(inode)->truncate_mutex);
ext3_discard_reservation(inode);
mutex_unlock(&EXT3_I(inode)->truncate_mutex);
}
if (is_dx(inode) && filp->private_data)
ext3_htree_free_dir_info(filp->private_data);
return 0;
}
const struct file_operations ext3_file_operations = {
.llseek = generic_file_llseek,
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.unlocked_ioctl = ext3_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext3_compat_ioctl,
#endif
.mmap = generic_file_mmap,
.open = dquot_file_open,
.release = ext3_release_file,
.fsync = ext3_sync_file,
.splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write,
};
const struct inode_operations ext3_file_inode_operations = {
.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = ext3_listxattr,
.removexattr = generic_removexattr,
#endif
.get_acl = ext3_get_acl,
.set_acl = ext3_set_acl,
.fiemap = ext3_fiemap,
};

View File

@ -1,109 +0,0 @@
/*
* linux/fs/ext3/fsync.c
*
* Copyright (C) 1993 Stephen Tweedie (sct@redhat.com)
* from
* Copyright (C) 1992 Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
* from
* linux/fs/minix/truncate.c Copyright (C) 1991, 1992 Linus Torvalds
*
* ext3fs fsync primitive
*
* Big-endian to little-endian byte-swapping/bitmaps by
* David S. Miller (davem@caip.rutgers.edu), 1995
*
* Removed unnecessary code duplication for little endian machines
* and excessive __inline__s.
* Andi Kleen, 1997
*
* Major simplications and cleanup - we only need to do the metadata, because
* we can depend on generic_block_fdatasync() to sync the data blocks.
*/
#include <linux/blkdev.h>
#include <linux/writeback.h>
#include "ext3.h"
/*
* akpm: A new design for ext3_sync_file().
*
* This is only called from sys_fsync(), sys_fdatasync() and sys_msync().
* There cannot be a transaction open by this task.
* Another task could have dirtied this inode. Its data can be in any
* state in the journalling system.
*
* What we do is just kick off a commit and wait on it. This will snapshot the
* inode to disk.
*/
int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file->f_mapping->host;
struct ext3_inode_info *ei = EXT3_I(inode);
journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
int ret, needs_barrier = 0;
tid_t commit_tid;
trace_ext3_sync_file_enter(file, datasync);
if (inode->i_sb->s_flags & MS_RDONLY) {
/* Make sure that we read updated state */
smp_rmb();
if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
return -EROFS;
return 0;
}
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret)
goto out;
J_ASSERT(ext3_journal_current_handle() == NULL);
/*
* data=writeback,ordered:
* The caller's filemap_fdatawrite()/wait will sync the data.
* Metadata is in the journal, we wait for a proper transaction
* to commit here.
*
* data=journal:
* filemap_fdatawrite won't do anything (the buffers are clean).
* ext3_force_commit will write the file data into the journal and
* will wait on that.
* filemap_fdatawait() will encounter a ton of newly-dirtied pages
* (they were dirtied by commit). But that's OK - the blocks are
* safe in-journal, which is all fsync() needs to ensure.
*/
if (ext3_should_journal_data(inode)) {
ret = ext3_force_commit(inode->i_sb);
goto out;
}
if (datasync)
commit_tid = atomic_read(&ei->i_datasync_tid);
else
commit_tid = atomic_read(&ei->i_sync_tid);
if (test_opt(inode->i_sb, BARRIER) &&
!journal_trans_will_send_data_barrier(journal, commit_tid))
needs_barrier = 1;
log_start_commit(journal, commit_tid);
ret = log_wait_commit(journal, commit_tid);
/*
* In case we didn't commit a transaction, we have to flush
* disk caches manually so that data really is on persistent
* storage
*/
if (needs_barrier) {
int err;
err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
if (!ret)
ret = err;
}
out:
trace_ext3_sync_file_exit(inode, ret);
return ret;
}

View File

@ -1,206 +0,0 @@
/*
* linux/fs/ext3/hash.c
*
* Copyright (C) 2002 by Theodore Ts'o
*
* This file is released under the GPL v2.
*
* This file may be redistributed under the terms of the GNU Public
* License.
*/
#include "ext3.h"
#include <linux/cryptohash.h>
#define DELTA 0x9E3779B9
static void TEA_transform(__u32 buf[4], __u32 const in[])
{
__u32 sum = 0;
__u32 b0 = buf[0], b1 = buf[1];
__u32 a = in[0], b = in[1], c = in[2], d = in[3];
int n = 16;
do {
sum += DELTA;
b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
} while(--n);
buf[0] += b0;
buf[1] += b1;
}
/* The old legacy hash */
static __u32 dx_hack_hash_unsigned(const char *name, int len)
{
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const unsigned char *ucp = (const unsigned char *) name;
while (len--) {
hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return hash0 << 1;
}
static __u32 dx_hack_hash_signed(const char *name, int len)
{
__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
const signed char *scp = (const signed char *) name;
while (len--) {
hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
if (hash & 0x80000000)
hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return hash0 << 1;
}
static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const signed char *scp = (const signed char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
val = pad;
if (len > num*4)
len = num * 4;
for (i = 0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = ((int) scp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
num--;
}
}
if (--num >= 0)
*buf++ = val;
while (--num >= 0)
*buf++ = pad;
}
static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
{
__u32 pad, val;
int i;
const unsigned char *ucp = (const unsigned char *) msg;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
val = pad;
if (len > num*4)
len = num * 4;
for (i=0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = ((int) ucp[i]) + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
num--;
}
}
if (--num >= 0)
*buf++ = val;
while (--num >= 0)
*buf++ = pad;
}
/*
* Returns the hash of a filename. If len is 0 and name is NULL, then
* this function can be used to test whether or not a hash version is
* supported.
*
* The seed is an 4 longword (32 bits) "secret" which can be used to
* uniquify a hash. If the seed is all zero's, then some default seed
* may be used.
*
* A particular hash version specifies whether or not the seed is
* represented, and whether or not the returned hash is 32 bits or 64
* bits. 32 bit hashes will return 0 for the minor hash.
*/
int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
{
__u32 hash;
__u32 minor_hash = 0;
const char *p;
int i;
__u32 in[8], buf[4];
void (*str2hashbuf)(const char *, int, __u32 *, int) =
str2hashbuf_signed;
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
buf[1] = 0xefcdab89;
buf[2] = 0x98badcfe;
buf[3] = 0x10325476;
/* Check to see if the seed is all zero's */
if (hinfo->seed) {
for (i=0; i < 4; i++) {
if (hinfo->seed[i])
break;
}
if (i < 4)
memcpy(buf, hinfo->seed, sizeof(buf));
}
switch (hinfo->hash_version) {
case DX_HASH_LEGACY_UNSIGNED:
hash = dx_hack_hash_unsigned(name, len);
break;
case DX_HASH_LEGACY:
hash = dx_hack_hash_signed(name, len);
break;
case DX_HASH_HALF_MD4_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_HALF_MD4:
p = name;
while (len > 0) {
(*str2hashbuf)(p, len, in, 8);
half_md4_transform(buf, in);
len -= 32;
p += 32;
}
minor_hash = buf[2];
hash = buf[1];
break;
case DX_HASH_TEA_UNSIGNED:
str2hashbuf = str2hashbuf_unsigned;
case DX_HASH_TEA:
p = name;
while (len > 0) {
(*str2hashbuf)(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
}
hash = buf[0];
minor_hash = buf[1];
break;
default:
hinfo->hash = 0;
return -1;
}
hash = hash & ~1;
if (hash == (EXT3_HTREE_EOF_32BIT << 1))
hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
hinfo->hash = hash;
hinfo->minor_hash = minor_hash;
return 0;
}

View File

@ -1,706 +0,0 @@
/*
* linux/fs/ext3/ialloc.c
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*
* BSD ufs-inspired inode and directory allocation by
* Stephen Tweedie (sct@redhat.com), 1993
* Big-endian to little-endian byte-swapping/bitmaps by
* David S. Miller (davem@caip.rutgers.edu), 1995
*/
#include <linux/quotaops.h>
#include <linux/random.h>
#include "ext3.h"
#include "xattr.h"
#include "acl.h"
/*
* ialloc.c contains the inodes allocation and deallocation routines
*/
/*
* The free inodes are managed by bitmaps. A file system contains several
* blocks groups. Each group contains 1 bitmap block for blocks, 1 bitmap
* block for inodes, N blocks for the inode table and data blocks.
*
* The file system contains group descriptors which are located after the
* super block. Each descriptor contains the number of the bitmap block and
* the free blocks count in the block.
*/
/*
* Read the inode allocation bitmap for a given block_group, reading
* into the specified slot in the superblock's bitmap cache.
*
* Return buffer_head of bitmap on success or NULL.
*/
static struct buffer_head *
read_inode_bitmap(struct super_block * sb, unsigned long block_group)
{
struct ext3_group_desc *desc;
struct buffer_head *bh = NULL;
desc = ext3_get_group_desc(sb, block_group, NULL);
if (!desc)
goto error_out;
bh = sb_bread(sb, le32_to_cpu(desc->bg_inode_bitmap));
if (!bh)
ext3_error(sb, "read_inode_bitmap",
"Cannot read inode bitmap - "
"block_group = %lu, inode_bitmap = %u",
block_group, le32_to_cpu(desc->bg_inode_bitmap));
error_out:
return bh;
}
/*
* NOTE! When we get the inode, we're the only people
* that have access to it, and as such there are no
* race conditions we have to worry about. The inode
* is not on the hash-lists, and it cannot be reached
* through the filesystem because the directory entry
* has been deleted earlier.
*
* HOWEVER: we must make sure that we get no aliases,
* which means that we have to call "clear_inode()"
* _before_ we mark the inode not in use in the inode
* bitmaps. Otherwise a newly created file might use
* the same inode number (not actually the same pointer
* though), and then we'd have two inodes sharing the
* same inode number and space on the harddisk.
*/
void ext3_free_inode (handle_t *handle, struct inode * inode)
{
struct super_block * sb = inode->i_sb;
int is_directory;
unsigned long ino;
struct buffer_head *bitmap_bh = NULL;
struct buffer_head *bh2;
unsigned long block_group;
unsigned long bit;
struct ext3_group_desc * gdp;
struct ext3_super_block * es;
struct ext3_sb_info *sbi;
int fatal = 0, err;
if (atomic_read(&inode->i_count) > 1) {
printk ("ext3_free_inode: inode has count=%d\n",
atomic_read(&inode->i_count));
return;
}
if (inode->i_nlink) {
printk ("ext3_free_inode: inode has nlink=%d\n",
inode->i_nlink);
return;
}
if (!sb) {
printk("ext3_free_inode: inode on nonexistent device\n");
return;
}
sbi = EXT3_SB(sb);
ino = inode->i_ino;
ext3_debug ("freeing inode %lu\n", ino);
trace_ext3_free_inode(inode);
is_directory = S_ISDIR(inode->i_mode);
es = EXT3_SB(sb)->s_es;
if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
ext3_error (sb, "ext3_free_inode",
"reserved or nonexistent inode %lu", ino);
goto error_return;
}
block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
bitmap_bh = read_inode_bitmap(sb, block_group);
if (!bitmap_bh)
goto error_return;
BUFFER_TRACE(bitmap_bh, "get_write_access");
fatal = ext3_journal_get_write_access(handle, bitmap_bh);
if (fatal)
goto error_return;
/* Ok, now we can actually update the inode bitmaps.. */
if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
bit, bitmap_bh->b_data))
ext3_error (sb, "ext3_free_inode",
"bit already cleared for inode %lu", ino);
else {
gdp = ext3_get_group_desc (sb, block_group, &bh2);
BUFFER_TRACE(bh2, "get_write_access");
fatal = ext3_journal_get_write_access(handle, bh2);
if (fatal) goto error_return;
if (gdp) {
spin_lock(sb_bgl_lock(sbi, block_group));
le16_add_cpu(&gdp->bg_free_inodes_count, 1);
if (is_directory)
le16_add_cpu(&gdp->bg_used_dirs_count, -1);
spin_unlock(sb_bgl_lock(sbi, block_group));
percpu_counter_inc(&sbi->s_freeinodes_counter);
if (is_directory)
percpu_counter_dec(&sbi->s_dirs_counter);
}
BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
err = ext3_journal_dirty_metadata(handle, bh2);
if (!fatal) fatal = err;
}
BUFFER_TRACE(bitmap_bh, "call ext3_journal_dirty_metadata");
err = ext3_journal_dirty_metadata(handle, bitmap_bh);
if (!fatal)
fatal = err;
error_return:
brelse(bitmap_bh);
ext3_std_error(sb, fatal);
}
/*
* Orlov's allocator for directories.
*
* We always try to spread first-level directories.
*
* If there are blockgroups with both free inodes and free blocks counts
* not worse than average we return one with smallest directory count.
* Otherwise we simply return a random group.
*
* For the rest rules look so:
*
* It's OK to put directory into a group unless
* it has too many directories already (max_dirs) or
* it has too few free inodes left (min_inodes) or
* it has too few free blocks left (min_blocks).
* Parent's group is preferred, if it doesn't satisfy these
* conditions we search cyclically through the rest. If none
* of the groups look good we just look for a group with more
* free inodes than average (starting at parent's group).
*
* Debt is incremented each time we allocate a directory and decremented
* when we allocate an inode, within 0--255.
*/
static int find_group_orlov(struct super_block *sb, struct inode *parent)
{
int parent_group = EXT3_I(parent)->i_block_group;
struct ext3_sb_info *sbi = EXT3_SB(sb);
int ngroups = sbi->s_groups_count;
int inodes_per_group = EXT3_INODES_PER_GROUP(sb);
unsigned int freei, avefreei;
ext3_fsblk_t freeb, avefreeb;
unsigned int ndirs;
int max_dirs, min_inodes;
ext3_grpblk_t min_blocks;
int group = -1, i;
struct ext3_group_desc *desc;
freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
avefreei = freei / ngroups;
freeb = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
avefreeb = freeb / ngroups;
ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
if ((parent == d_inode(sb->s_root)) ||
(EXT3_I(parent)->i_flags & EXT3_TOPDIR_FL)) {
int best_ndir = inodes_per_group;
int best_group = -1;
group = prandom_u32();
parent_group = (unsigned)group % ngroups;
for (i = 0; i < ngroups; i++) {
group = (parent_group + i) % ngroups;
desc = ext3_get_group_desc (sb, group, NULL);
if (!desc || !desc->bg_free_inodes_count)
continue;
if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
continue;
if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei)
continue;
if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
continue;
best_group = group;
best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
}
if (best_group >= 0)
return best_group;
goto fallback;
}
max_dirs = ndirs / ngroups + inodes_per_group / 16;
min_inodes = avefreei - inodes_per_group / 4;
min_blocks = avefreeb - EXT3_BLOCKS_PER_GROUP(sb) / 4;
for (i = 0; i < ngroups; i++) {
group = (parent_group + i) % ngroups;
desc = ext3_get_group_desc (sb, group, NULL);
if (!desc || !desc->bg_free_inodes_count)
continue;
if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
continue;
if (le16_to_cpu(desc->bg_free_inodes_count) < min_inodes)
continue;
if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)
continue;
return group;
}
fallback:
for (i = 0; i < ngroups; i++) {
group = (parent_group + i) % ngroups;
desc = ext3_get_group_desc (sb, group, NULL);
if (!desc || !desc->bg_free_inodes_count)
continue;
if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
return group;
}
if (avefreei) {
/*
* The free-inodes counter is approximate, and for really small
* filesystems the above test can fail to find any blockgroups
*/
avefreei = 0;
goto fallback;
}
return -1;
}
static int find_group_other(struct super_block *sb, struct inode *parent)
{
int parent_group = EXT3_I(parent)->i_block_group;
int ngroups = EXT3_SB(sb)->s_groups_count;
struct ext3_group_desc *desc;
int group, i;
/*
* Try to place the inode in its parent directory
*/
group = parent_group;
desc = ext3_get_group_desc (sb, group, NULL);
if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
le16_to_cpu(desc->bg_free_blocks_count))
return group;
/*
* We're going to place this inode in a different blockgroup from its
* parent. We want to cause files in a common directory to all land in
* the same blockgroup. But we want files which are in a different
* directory which shares a blockgroup with our parent to land in a
* different blockgroup.
*
* So add our directory's i_ino into the starting point for the hash.
*/
group = (group + parent->i_ino) % ngroups;
/*
* Use a quadratic hash to find a group with a free inode and some free
* blocks.
*/
for (i = 1; i < ngroups; i <<= 1) {
group += i;
if (group >= ngroups)
group -= ngroups;
desc = ext3_get_group_desc (sb, group, NULL);
if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
le16_to_cpu(desc->bg_free_blocks_count))
return group;
}
/*
* That failed: try linear search for a free inode, even if that group
* has no free blocks.
*/
group = parent_group;
for (i = 0; i < ngroups; i++) {
if (++group >= ngroups)
group = 0;
desc = ext3_get_group_desc (sb, group, NULL);
if (desc && le16_to_cpu(desc->bg_free_inodes_count))
return group;
}
return -1;
}
/*
* There are two policies for allocating an inode. If the new inode is
* a directory, then a forward search is made for a block group with both
* free space and a low directory-to-inode ratio; if that fails, then of
* the groups with above-average free space, that group with the fewest
* directories already is chosen.
*
* For other inodes, search forward from the parent directory's block
* group to find a free inode.
*/
struct inode *ext3_new_inode(handle_t *handle, struct inode * dir,
const struct qstr *qstr, umode_t mode)
{
struct super_block *sb;
struct buffer_head *bitmap_bh = NULL;
struct buffer_head *bh2;
int group;
unsigned long ino = 0;
struct inode * inode;
struct ext3_group_desc * gdp = NULL;
struct ext3_super_block * es;
struct ext3_inode_info *ei;
struct ext3_sb_info *sbi;
int err = 0;
struct inode *ret;
int i;
/* Cannot create files in a deleted directory */
if (!dir || !dir->i_nlink)
return ERR_PTR(-EPERM);
sb = dir->i_sb;
trace_ext3_request_inode(dir, mode);
inode = new_inode(sb);
if (!inode)
return ERR_PTR(-ENOMEM);
ei = EXT3_I(inode);
sbi = EXT3_SB(sb);
es = sbi->s_es;
if (S_ISDIR(mode))
group = find_group_orlov(sb, dir);
else
group = find_group_other(sb, dir);
err = -ENOSPC;
if (group == -1)
goto out;
for (i = 0; i < sbi->s_groups_count; i++) {
err = -EIO;
gdp = ext3_get_group_desc(sb, group, &bh2);
if (!gdp)
goto fail;
brelse(bitmap_bh);
bitmap_bh = read_inode_bitmap(sb, group);
if (!bitmap_bh)
goto fail;
ino = 0;
repeat_in_this_group:
ino = ext3_find_next_zero_bit((unsigned long *)
bitmap_bh->b_data, EXT3_INODES_PER_GROUP(sb), ino);
if (ino < EXT3_INODES_PER_GROUP(sb)) {
BUFFER_TRACE(bitmap_bh, "get_write_access");
err = ext3_journal_get_write_access(handle, bitmap_bh);
if (err)
goto fail;
if (!ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
ino, bitmap_bh->b_data)) {
/* we won it */
BUFFER_TRACE(bitmap_bh,
"call ext3_journal_dirty_metadata");
err = ext3_journal_dirty_metadata(handle,
bitmap_bh);
if (err)
goto fail;
goto got;
}
/* we lost it */
journal_release_buffer(handle, bitmap_bh);
if (++ino < EXT3_INODES_PER_GROUP(sb))
goto repeat_in_this_group;
}
/*
* This case is possible in concurrent environment. It is very
* rare. We cannot repeat the find_group_xxx() call because
* that will simply return the same blockgroup, because the
* group descriptor metadata has not yet been updated.
* So we just go onto the next blockgroup.
*/
if (++group == sbi->s_groups_count)
group = 0;
}
err = -ENOSPC;
goto out;
got:
ino += group * EXT3_INODES_PER_GROUP(sb) + 1;
if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
ext3_error (sb, "ext3_new_inode",
"reserved inode or inode > inodes count - "
"block_group = %d, inode=%lu", group, ino);
err = -EIO;
goto fail;
}
BUFFER_TRACE(bh2, "get_write_access");
err = ext3_journal_get_write_access(handle, bh2);
if (err) goto fail;
spin_lock(sb_bgl_lock(sbi, group));
le16_add_cpu(&gdp->bg_free_inodes_count, -1);
if (S_ISDIR(mode)) {
le16_add_cpu(&gdp->bg_used_dirs_count, 1);
}
spin_unlock(sb_bgl_lock(sbi, group));
BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
err = ext3_journal_dirty_metadata(handle, bh2);
if (err) goto fail;
percpu_counter_dec(&sbi->s_freeinodes_counter);
if (S_ISDIR(mode))
percpu_counter_inc(&sbi->s_dirs_counter);
if (test_opt(sb, GRPID)) {
inode->i_mode = mode;
inode->i_uid = current_fsuid();
inode->i_gid = dir->i_gid;
} else
inode_init_owner(inode, dir, mode);
inode->i_ino = ino;
/* This is the optimal IO size (for stat), not the fs block size */
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
memset(ei->i_data, 0, sizeof(ei->i_data));
ei->i_dir_start_lookup = 0;
ei->i_disksize = 0;
ei->i_flags =
ext3_mask_flags(mode, EXT3_I(dir)->i_flags & EXT3_FL_INHERITED);
#ifdef EXT3_FRAGMENTS
ei->i_faddr = 0;
ei->i_frag_no = 0;
ei->i_frag_size = 0;
#endif
ei->i_file_acl = 0;
ei->i_dir_acl = 0;
ei->i_dtime = 0;
ei->i_block_alloc_info = NULL;
ei->i_block_group = group;
ext3_set_inode_flags(inode);
if (IS_DIRSYNC(inode))
handle->h_sync = 1;
if (insert_inode_locked(inode) < 0) {
/*
* Likely a bitmap corruption causing inode to be allocated
* twice.
*/
err = -EIO;
goto fail;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;
spin_unlock(&sbi->s_next_gen_lock);
ei->i_state_flags = 0;
ext3_set_inode_state(inode, EXT3_STATE_NEW);
/* See comment in ext3_iget for explanation */
if (ino >= EXT3_FIRST_INO(sb) + 1 &&
EXT3_INODE_SIZE(sb) > EXT3_GOOD_OLD_INODE_SIZE) {
ei->i_extra_isize =
sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE;
} else {
ei->i_extra_isize = 0;
}
ret = inode;
dquot_initialize(inode);
err = dquot_alloc_inode(inode);
if (err)
goto fail_drop;
err = ext3_init_acl(handle, inode, dir);
if (err)
goto fail_free_drop;
err = ext3_init_security(handle, inode, dir, qstr);
if (err)
goto fail_free_drop;
err = ext3_mark_inode_dirty(handle, inode);
if (err) {
ext3_std_error(sb, err);
goto fail_free_drop;
}
ext3_debug("allocating inode %lu\n", inode->i_ino);
trace_ext3_allocate_inode(inode, dir, mode);
goto really_out;
fail:
ext3_std_error(sb, err);
out:
iput(inode);
ret = ERR_PTR(err);
really_out:
brelse(bitmap_bh);
return ret;
fail_free_drop:
dquot_free_inode(inode);
fail_drop:
dquot_drop(inode);
inode->i_flags |= S_NOQUOTA;
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
brelse(bitmap_bh);
return ERR_PTR(err);
}
/* Verify that we are loading a valid orphan from disk */
struct inode *ext3_orphan_get(struct super_block *sb, unsigned long ino)
{
unsigned long max_ino = le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count);
unsigned long block_group;
int bit;
struct buffer_head *bitmap_bh;
struct inode *inode = NULL;
long err = -EIO;
/* Error cases - e2fsck has already cleaned up for us */
if (ino > max_ino) {
ext3_warning(sb, __func__,
"bad orphan ino %lu! e2fsck was run?", ino);
goto error;
}
block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
bitmap_bh = read_inode_bitmap(sb, block_group);
if (!bitmap_bh) {
ext3_warning(sb, __func__,
"inode bitmap error for orphan %lu", ino);
goto error;
}
/* Having the inode bit set should be a 100% indicator that this
* is a valid orphan (no e2fsck run on fs). Orphans also include
* inodes that were being truncated, so we can't check i_nlink==0.
*/
if (!ext3_test_bit(bit, bitmap_bh->b_data))
goto bad_orphan;
inode = ext3_iget(sb, ino);
if (IS_ERR(inode))
goto iget_failed;
/*
* If the orphans has i_nlinks > 0 then it should be able to be
* truncated, otherwise it won't be removed from the orphan list
* during processing and an infinite loop will result.
*/
if (inode->i_nlink && !ext3_can_truncate(inode))
goto bad_orphan;
if (NEXT_ORPHAN(inode) > max_ino)
goto bad_orphan;
brelse(bitmap_bh);
return inode;
iget_failed:
err = PTR_ERR(inode);
inode = NULL;
bad_orphan:
ext3_warning(sb, __func__,
"bad orphan inode %lu! e2fsck was run?", ino);
printk(KERN_NOTICE "ext3_test_bit(bit=%d, block=%llu) = %d\n",
bit, (unsigned long long)bitmap_bh->b_blocknr,
ext3_test_bit(bit, bitmap_bh->b_data));
printk(KERN_NOTICE "inode=%p\n", inode);
if (inode) {
printk(KERN_NOTICE "is_bad_inode(inode)=%d\n",
is_bad_inode(inode));
printk(KERN_NOTICE "NEXT_ORPHAN(inode)=%u\n",
NEXT_ORPHAN(inode));
printk(KERN_NOTICE "max_ino=%lu\n", max_ino);
printk(KERN_NOTICE "i_nlink=%u\n", inode->i_nlink);
/* Avoid freeing blocks if we got a bad deleted inode */
if (inode->i_nlink == 0)
inode->i_blocks = 0;
iput(inode);
}
brelse(bitmap_bh);
error:
return ERR_PTR(err);
}
unsigned long ext3_count_free_inodes (struct super_block * sb)
{
unsigned long desc_count;
struct ext3_group_desc *gdp;
int i;
#ifdef EXT3FS_DEBUG
struct ext3_super_block *es;
unsigned long bitmap_count, x;
struct buffer_head *bitmap_bh = NULL;
es = EXT3_SB(sb)->s_es;
desc_count = 0;
bitmap_count = 0;
gdp = NULL;
for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
gdp = ext3_get_group_desc (sb, i, NULL);
if (!gdp)
continue;
desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
brelse(bitmap_bh);
bitmap_bh = read_inode_bitmap(sb, i);
if (!bitmap_bh)
continue;
x = ext3_count_free(bitmap_bh, EXT3_INODES_PER_GROUP(sb) / 8);
printk("group %d: stored = %d, counted = %lu\n",
i, le16_to_cpu(gdp->bg_free_inodes_count), x);
bitmap_count += x;
}
brelse(bitmap_bh);
printk("ext3_count_free_inodes: stored = %u, computed = %lu, %lu\n",
le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count);
return desc_count;
#else
desc_count = 0;
for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
gdp = ext3_get_group_desc (sb, i, NULL);
if (!gdp)
continue;
desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
cond_resched();
}
return desc_count;
#endif
}
/* Called at mount-time, super-block is locked */
unsigned long ext3_count_dirs (struct super_block * sb)
{
unsigned long count = 0;
int i;
for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
struct ext3_group_desc *gdp = ext3_get_group_desc (sb, i, NULL);
if (!gdp)
continue;
count += le16_to_cpu(gdp->bg_used_dirs_count);
}
return count;
}

File diff suppressed because it is too large Load Diff

View File

@ -1,327 +0,0 @@
/*
* linux/fs/ext3/ioctl.c
*
* Copyright (C) 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*/
#include <linux/mount.h>
#include <linux/compat.h>
#include <asm/uaccess.h>
#include "ext3.h"
long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct ext3_inode_info *ei = EXT3_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
ext3_debug ("cmd = %u, arg = %lu\n", cmd, arg);
switch (cmd) {
case EXT3_IOC_GETFLAGS:
ext3_get_inode_flags(ei);
flags = ei->i_flags & EXT3_FL_USER_VISIBLE;
return put_user(flags, (int __user *) arg);
case EXT3_IOC_SETFLAGS: {
handle_t *handle = NULL;
int err;
struct ext3_iloc iloc;
unsigned int oldflags;
unsigned int jflag;
if (!inode_owner_or_capable(inode))
return -EACCES;
if (get_user(flags, (int __user *) arg))
return -EFAULT;
err = mnt_want_write_file(filp);
if (err)
return err;
flags = ext3_mask_flags(inode->i_mode, flags);
mutex_lock(&inode->i_mutex);
/* Is it quota file? Do not allow user to mess with it */
err = -EPERM;
if (IS_NOQUOTA(inode))
goto flags_out;
oldflags = ei->i_flags;
/* The JOURNAL_DATA flag is modifiable only by root */
jflag = flags & EXT3_JOURNAL_DATA_FL;
/*
* The IMMUTABLE and APPEND_ONLY flags can only be changed by
* the relevant capability.
*
* This test looks nicer. Thanks to Pauline Middelink
*/
if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE))
goto flags_out;
}
/*
* The JOURNAL_DATA flag can only be changed by
* the relevant capability.
*/
if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
if (!capable(CAP_SYS_RESOURCE))
goto flags_out;
}
handle = ext3_journal_start(inode, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto flags_out;
}
if (IS_SYNC(inode))
handle->h_sync = 1;
err = ext3_reserve_inode_write(handle, inode, &iloc);
if (err)
goto flags_err;
flags = flags & EXT3_FL_USER_MODIFIABLE;
flags |= oldflags & ~EXT3_FL_USER_MODIFIABLE;
ei->i_flags = flags;
ext3_set_inode_flags(inode);
inode->i_ctime = CURRENT_TIME_SEC;
err = ext3_mark_iloc_dirty(handle, inode, &iloc);
flags_err:
ext3_journal_stop(handle);
if (err)
goto flags_out;
if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL))
err = ext3_change_inode_journal_flag(inode, jflag);
flags_out:
mutex_unlock(&inode->i_mutex);
mnt_drop_write_file(filp);
return err;
}
case EXT3_IOC_GETVERSION:
case EXT3_IOC_GETVERSION_OLD:
return put_user(inode->i_generation, (int __user *) arg);
case EXT3_IOC_SETVERSION:
case EXT3_IOC_SETVERSION_OLD: {
handle_t *handle;
struct ext3_iloc iloc;
__u32 generation;
int err;
if (!inode_owner_or_capable(inode))
return -EPERM;
err = mnt_want_write_file(filp);
if (err)
return err;
if (get_user(generation, (int __user *) arg)) {
err = -EFAULT;
goto setversion_out;
}
mutex_lock(&inode->i_mutex);
handle = ext3_journal_start(inode, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto unlock_out;
}
err = ext3_reserve_inode_write(handle, inode, &iloc);
if (err == 0) {
inode->i_ctime = CURRENT_TIME_SEC;
inode->i_generation = generation;
err = ext3_mark_iloc_dirty(handle, inode, &iloc);
}
ext3_journal_stop(handle);
unlock_out:
mutex_unlock(&inode->i_mutex);
setversion_out:
mnt_drop_write_file(filp);
return err;
}
case EXT3_IOC_GETRSVSZ:
if (test_opt(inode->i_sb, RESERVATION)
&& S_ISREG(inode->i_mode)
&& ei->i_block_alloc_info) {
rsv_window_size = ei->i_block_alloc_info->rsv_window_node.rsv_goal_size;
return put_user(rsv_window_size, (int __user *)arg);
}
return -ENOTTY;
case EXT3_IOC_SETRSVSZ: {
int err;
if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
return -ENOTTY;
err = mnt_want_write_file(filp);
if (err)
return err;
if (!inode_owner_or_capable(inode)) {
err = -EACCES;
goto setrsvsz_out;
}
if (get_user(rsv_window_size, (int __user *)arg)) {
err = -EFAULT;
goto setrsvsz_out;
}
if (rsv_window_size > EXT3_MAX_RESERVE_BLOCKS)
rsv_window_size = EXT3_MAX_RESERVE_BLOCKS;
/*
* need to allocate reservation structure for this inode
* before set the window size
*/
mutex_lock(&ei->truncate_mutex);
if (!ei->i_block_alloc_info)
ext3_init_block_alloc_info(inode);
if (ei->i_block_alloc_info){
struct ext3_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
rsv->rsv_goal_size = rsv_window_size;
}
mutex_unlock(&ei->truncate_mutex);
setrsvsz_out:
mnt_drop_write_file(filp);
return err;
}
case EXT3_IOC_GROUP_EXTEND: {
ext3_fsblk_t n_blocks_count;
struct super_block *sb = inode->i_sb;
int err, err2;
if (!capable(CAP_SYS_RESOURCE))
return -EPERM;
err = mnt_want_write_file(filp);
if (err)
return err;
if (get_user(n_blocks_count, (__u32 __user *)arg)) {
err = -EFAULT;
goto group_extend_out;
}
err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
journal_lock_updates(EXT3_SB(sb)->s_journal);
err2 = journal_flush(EXT3_SB(sb)->s_journal);
journal_unlock_updates(EXT3_SB(sb)->s_journal);
if (err == 0)
err = err2;
group_extend_out:
mnt_drop_write_file(filp);
return err;
}
case EXT3_IOC_GROUP_ADD: {
struct ext3_new_group_data input;
struct super_block *sb = inode->i_sb;
int err, err2;
if (!capable(CAP_SYS_RESOURCE))
return -EPERM;
err = mnt_want_write_file(filp);
if (err)
return err;
if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
sizeof(input))) {
err = -EFAULT;
goto group_add_out;
}
err = ext3_group_add(sb, &input);
journal_lock_updates(EXT3_SB(sb)->s_journal);
err2 = journal_flush(EXT3_SB(sb)->s_journal);
journal_unlock_updates(EXT3_SB(sb)->s_journal);
if (err == 0)
err = err2;
group_add_out:
mnt_drop_write_file(filp);
return err;
}
case FITRIM: {
struct super_block *sb = inode->i_sb;
struct fstrim_range range;
int ret = 0;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (copy_from_user(&range, (struct fstrim_range __user *)arg,
sizeof(range)))
return -EFAULT;
ret = ext3_trim_fs(sb, &range);
if (ret < 0)
return ret;
if (copy_to_user((struct fstrim_range __user *)arg, &range,
sizeof(range)))
return -EFAULT;
return 0;
}
default:
return -ENOTTY;
}
}
#ifdef CONFIG_COMPAT
long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
/* These are just misnamed, they actually get/put from/to user an int */
switch (cmd) {
case EXT3_IOC32_GETFLAGS:
cmd = EXT3_IOC_GETFLAGS;
break;
case EXT3_IOC32_SETFLAGS:
cmd = EXT3_IOC_SETFLAGS;
break;
case EXT3_IOC32_GETVERSION:
cmd = EXT3_IOC_GETVERSION;
break;
case EXT3_IOC32_SETVERSION:
cmd = EXT3_IOC_SETVERSION;
break;
case EXT3_IOC32_GROUP_EXTEND:
cmd = EXT3_IOC_GROUP_EXTEND;
break;
case EXT3_IOC32_GETVERSION_OLD:
cmd = EXT3_IOC_GETVERSION_OLD;
break;
case EXT3_IOC32_SETVERSION_OLD:
cmd = EXT3_IOC_SETVERSION_OLD;
break;
#ifdef CONFIG_JBD_DEBUG
case EXT3_IOC32_WAIT_FOR_READONLY:
cmd = EXT3_IOC_WAIT_FOR_READONLY;
break;
#endif
case EXT3_IOC32_GETRSVSZ:
cmd = EXT3_IOC_GETRSVSZ;
break;
case EXT3_IOC32_SETRSVSZ:
cmd = EXT3_IOC_SETRSVSZ;
break;
case EXT3_IOC_GROUP_ADD:
break;
default:
return -ENOIOCTLCMD;
}
return ext3_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,27 +0,0 @@
/* linux/fs/ext3/namei.h
*
* Copyright (C) 2005 Simtec Electronics
* Ben Dooks <ben@simtec.co.uk>
*
*/
extern struct dentry *ext3_get_parent(struct dentry *child);
static inline struct buffer_head *ext3_dir_bread(handle_t *handle,
struct inode *inode,
int block, int create,
int *err)
{
struct buffer_head *bh;
bh = ext3_bread(handle, inode, block, create, err);
if (!bh && !(*err)) {
*err = -EIO;
ext3_error(inode->i_sb, __func__,
"Directory hole detected on inode %lu\n",
inode->i_ino);
return NULL;
}
return bh;
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,46 +0,0 @@
/*
* linux/fs/ext3/symlink.c
*
* Only fast symlinks left here - the rest is done by generic code. AV, 1999
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card (card@masi.ibp.fr)
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie (Paris VI)
*
* from
*
* linux/fs/minix/symlink.c
*
* Copyright (C) 1991, 1992 Linus Torvalds
*
* ext3 symlink handling code
*/
#include "ext3.h"
#include "xattr.h"
const struct inode_operations ext3_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
.put_link = page_put_link,
.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = ext3_listxattr,
.removexattr = generic_removexattr,
#endif
};
const struct inode_operations ext3_fast_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = simple_follow_link,
.setattr = ext3_setattr,
#ifdef CONFIG_EXT3_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = ext3_listxattr,
.removexattr = generic_removexattr,
#endif
};

File diff suppressed because it is too large Load Diff

View File

@ -1,136 +0,0 @@
/*
File: fs/ext3/xattr.h
On-disk format of extended attributes for the ext3 filesystem.
(C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/
#include <linux/xattr.h>
/* Magic value in attribute blocks */
#define EXT3_XATTR_MAGIC 0xEA020000
/* Maximum number of references to one attribute block */
#define EXT3_XATTR_REFCOUNT_MAX 1024
/* Name indexes */
#define EXT3_XATTR_INDEX_USER 1
#define EXT3_XATTR_INDEX_POSIX_ACL_ACCESS 2
#define EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT 3
#define EXT3_XATTR_INDEX_TRUSTED 4
#define EXT3_XATTR_INDEX_LUSTRE 5
#define EXT3_XATTR_INDEX_SECURITY 6
struct ext3_xattr_header {
__le32 h_magic; /* magic number for identification */
__le32 h_refcount; /* reference count */
__le32 h_blocks; /* number of disk blocks used */
__le32 h_hash; /* hash value of all attributes */
__u32 h_reserved[4]; /* zero right now */
};
struct ext3_xattr_ibody_header {
__le32 h_magic; /* magic number for identification */
};
struct ext3_xattr_entry {
__u8 e_name_len; /* length of name */
__u8 e_name_index; /* attribute name index */
__le16 e_value_offs; /* offset in disk block of value */
__le32 e_value_block; /* disk block attribute is stored on (n/i) */
__le32 e_value_size; /* size of attribute value */
__le32 e_hash; /* hash value of name and value */
char e_name[0]; /* attribute name */
};
#define EXT3_XATTR_PAD_BITS 2
#define EXT3_XATTR_PAD (1<<EXT3_XATTR_PAD_BITS)
#define EXT3_XATTR_ROUND (EXT3_XATTR_PAD-1)
#define EXT3_XATTR_LEN(name_len) \
(((name_len) + EXT3_XATTR_ROUND + \
sizeof(struct ext3_xattr_entry)) & ~EXT3_XATTR_ROUND)
#define EXT3_XATTR_NEXT(entry) \
( (struct ext3_xattr_entry *)( \
(char *)(entry) + EXT3_XATTR_LEN((entry)->e_name_len)) )
#define EXT3_XATTR_SIZE(size) \
(((size) + EXT3_XATTR_ROUND) & ~EXT3_XATTR_ROUND)
# ifdef CONFIG_EXT3_FS_XATTR
extern const struct xattr_handler ext3_xattr_user_handler;
extern const struct xattr_handler ext3_xattr_trusted_handler;
extern const struct xattr_handler ext3_xattr_security_handler;
extern ssize_t ext3_listxattr(struct dentry *, char *, size_t);
extern int ext3_xattr_get(struct inode *, int, const char *, void *, size_t);
extern int ext3_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
extern int ext3_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
extern void ext3_xattr_delete_inode(handle_t *, struct inode *);
extern void ext3_xattr_put_super(struct super_block *);
extern int init_ext3_xattr(void);
extern void exit_ext3_xattr(void);
extern const struct xattr_handler *ext3_xattr_handlers[];
# else /* CONFIG_EXT3_FS_XATTR */
static inline int
ext3_xattr_get(struct inode *inode, int name_index, const char *name,
void *buffer, size_t size, int flags)
{
return -EOPNOTSUPP;
}
static inline int
ext3_xattr_set(struct inode *inode, int name_index, const char *name,
const void *value, size_t size, int flags)
{
return -EOPNOTSUPP;
}
static inline int
ext3_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
const char *name, const void *value, size_t size, int flags)
{
return -EOPNOTSUPP;
}
static inline void
ext3_xattr_delete_inode(handle_t *handle, struct inode *inode)
{
}
static inline void
ext3_xattr_put_super(struct super_block *sb)
{
}
static inline int
init_ext3_xattr(void)
{
return 0;
}
static inline void
exit_ext3_xattr(void)
{
}
#define ext3_xattr_handlers NULL
# endif /* CONFIG_EXT3_FS_XATTR */
#ifdef CONFIG_EXT3_FS_SECURITY
extern int ext3_init_security(handle_t *handle, struct inode *inode,
struct inode *dir, const struct qstr *qstr);
#else
static inline int ext3_init_security(handle_t *handle, struct inode *inode,
struct inode *dir, const struct qstr *qstr)
{
return 0;
}
#endif

View File

@ -1,78 +0,0 @@
/*
* linux/fs/ext3/xattr_security.c
* Handler for storing security labels as extended attributes.
*/
#include <linux/security.h>
#include "ext3.h"
#include "xattr.h"
static size_t
ext3_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len, int type)
{
const size_t prefix_len = XATTR_SECURITY_PREFIX_LEN;
const size_t total_len = prefix_len + name_len + 1;
if (list && total_len <= list_size) {
memcpy(list, XATTR_SECURITY_PREFIX, prefix_len);
memcpy(list+prefix_len, name, name_len);
list[prefix_len + name_len] = '\0';
}
return total_len;
}
static int
ext3_xattr_security_get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
name, buffer, size);
}
static int
ext3_xattr_security_set(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
name, value, size, flags);
}
static int ext3_initxattrs(struct inode *inode,
const struct xattr *xattr_array,
void *fs_info)
{
const struct xattr *xattr;
handle_t *handle = fs_info;
int err = 0;
for (xattr = xattr_array; xattr->name != NULL; xattr++) {
err = ext3_xattr_set_handle(handle, inode,
EXT3_XATTR_INDEX_SECURITY,
xattr->name, xattr->value,
xattr->value_len, 0);
if (err < 0)
break;
}
return err;
}
int
ext3_init_security(handle_t *handle, struct inode *inode, struct inode *dir,
const struct qstr *qstr)
{
return security_inode_init_security(inode, dir, qstr,
&ext3_initxattrs, handle);
}
const struct xattr_handler ext3_xattr_security_handler = {
.prefix = XATTR_SECURITY_PREFIX,
.list = ext3_xattr_security_list,
.get = ext3_xattr_security_get,
.set = ext3_xattr_security_set,
};

View File

@ -1,54 +0,0 @@
/*
* linux/fs/ext3/xattr_trusted.c
* Handler for trusted extended attributes.
*
* Copyright (C) 2003 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/
#include "ext3.h"
#include "xattr.h"
static size_t
ext3_xattr_trusted_list(struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len, int type)
{
const size_t prefix_len = XATTR_TRUSTED_PREFIX_LEN;
const size_t total_len = prefix_len + name_len + 1;
if (!capable(CAP_SYS_ADMIN))
return 0;
if (list && total_len <= list_size) {
memcpy(list, XATTR_TRUSTED_PREFIX, prefix_len);
memcpy(list+prefix_len, name, name_len);
list[prefix_len + name_len] = '\0';
}
return total_len;
}
static int
ext3_xattr_trusted_get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED,
name, buffer, size);
}
static int
ext3_xattr_trusted_set(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED, name,
value, size, flags);
}
const struct xattr_handler ext3_xattr_trusted_handler = {
.prefix = XATTR_TRUSTED_PREFIX,
.list = ext3_xattr_trusted_list,
.get = ext3_xattr_trusted_get,
.set = ext3_xattr_trusted_set,
};

View File

@ -1,58 +0,0 @@
/*
* linux/fs/ext3/xattr_user.c
* Handler for extended user attributes.
*
* Copyright (C) 2001 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
*/
#include "ext3.h"
#include "xattr.h"
static size_t
ext3_xattr_user_list(struct dentry *dentry, char *list, size_t list_size,
const char *name, size_t name_len, int type)
{
const size_t prefix_len = XATTR_USER_PREFIX_LEN;
const size_t total_len = prefix_len + name_len + 1;
if (!test_opt(dentry->d_sb, XATTR_USER))
return 0;
if (list && total_len <= list_size) {
memcpy(list, XATTR_USER_PREFIX, prefix_len);
memcpy(list+prefix_len, name, name_len);
list[prefix_len + name_len] = '\0';
}
return total_len;
}
static int
ext3_xattr_user_get(struct dentry *dentry, const char *name, void *buffer,
size_t size, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
if (!test_opt(dentry->d_sb, XATTR_USER))
return -EOPNOTSUPP;
return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_USER,
name, buffer, size);
}
static int
ext3_xattr_user_set(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
if (strcmp(name, "") == 0)
return -EINVAL;
if (!test_opt(dentry->d_sb, XATTR_USER))
return -EOPNOTSUPP;
return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_USER,
name, value, size, flags);
}
const struct xattr_handler ext3_xattr_user_handler = {
.prefix = XATTR_USER_PREFIX,
.list = ext3_xattr_user_list,
.get = ext3_xattr_user_get,
.set = ext3_xattr_user_set,
};

View File

@ -1,5 +1,38 @@
# Ext3 configs are here for backward compatibility with old configs which may
# have EXT3_FS set but not EXT4_FS set and thus would result in non-bootable
# kernels after the removal of ext3 driver.
config EXT3_FS
tristate "The Extended 3 (ext3) filesystem"
# These must match EXT4_FS selects...
select EXT4_FS
select JBD2
select CRC16
select CRYPTO
select CRYPTO_CRC32C
help
This config option is here only for backward compatibility. ext3
filesystem is now handled by the ext4 driver.
config EXT3_FS_POSIX_ACL
bool "Ext3 POSIX Access Control Lists"
depends on EXT3_FS
select EXT4_FS_POSIX_ACL
select FS_POSIX_ACL
help
This config option is here only for backward compatibility. ext3
filesystem is now handled by the ext4 driver.
config EXT3_FS_SECURITY
bool "Ext3 Security Labels"
depends on EXT3_FS
select EXT4_FS_SECURITY
help
This config option is here only for backward compatibility. ext3
filesystem is now handled by the ext4 driver.
config EXT4_FS
tristate "The Extended 4 (ext4) filesystem"
# Please update EXT3_FS selects when changing these
select JBD2
select CRC16
select CRYPTO
@ -16,26 +49,27 @@ config EXT4_FS
up fsck time. For more information, please see the web pages at
http://ext4.wiki.kernel.org.
The ext4 filesystem will support mounting an ext3
filesystem; while there will be some performance gains from
the delayed allocation and inode table readahead, the best
performance gains will require enabling ext4 features in the
filesystem, or formatting a new filesystem as an ext4
filesystem initially.
The ext4 filesystem supports mounting an ext3 filesystem; while there
are some performance gains from the delayed allocation and inode
table readahead, the best performance gains require enabling ext4
features in the filesystem using tune2fs, or formatting a new
filesystem as an ext4 filesystem initially. Without explicit enabling
of ext4 features, the on disk filesystem format stays fully backward
compatible.
To compile this file system support as a module, choose M here. The
module will be called ext4.
If unsure, say N.
config EXT4_USE_FOR_EXT23
config EXT4_USE_FOR_EXT2
bool "Use ext4 for ext2/ext3 file systems"
depends on EXT4_FS
depends on EXT3_FS=n || EXT2_FS=n
depends on EXT2_FS=n
default y
help
Allow the ext4 file system driver code to be used for ext2 or
ext3 file system mounts. This allows users to reduce their
Allow the ext4 file system driver code to be used for ext2
file system mounts. This allows users to reduce their
compiled kernel size by using one file system driver for
ext2, ext3, and ext4 file systems.

View File

@ -721,7 +721,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
struct ext4_group_desc *gdp = NULL;
struct ext4_inode_info *ei;
struct ext4_sb_info *sbi;
int ret2, err = 0;
int ret2, err;
struct inode *ret;
ext4_group_t i;
ext4_group_t flex_group;
@ -769,7 +769,9 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
inode->i_gid = dir->i_gid;
} else
inode_init_owner(inode, dir, mode);
dquot_initialize(inode);
err = dquot_initialize(inode);
if (err)
goto out;
if (!goal)
goal = sbi->s_inode_goal;

View File

@ -4661,8 +4661,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
if (error)
return error;
if (is_quota_modification(inode, attr))
dquot_initialize(inode);
if (is_quota_modification(inode, attr)) {
error = dquot_initialize(inode);
if (error)
return error;
}
if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
(ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
handle_t *handle;

View File

@ -2436,7 +2436,9 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
struct inode *inode;
int err, credits, retries = 0;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2470,7 +2472,9 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
if (!new_valid_dev(rdev))
return -EINVAL;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2499,7 +2503,9 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
struct inode *inode;
int err, retries = 0;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
retry:
inode = ext4_new_inode_start_handle(dir, mode,
@ -2612,7 +2618,9 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
if (EXT4_DIR_LINK_MAX(dir))
return -EMLINK;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2910,8 +2918,12 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
/* Initialize quotas before so that eventual writes go in
* separate transaction */
dquot_initialize(dir);
dquot_initialize(d_inode(dentry));
retval = dquot_initialize(dir);
if (retval)
return retval;
retval = dquot_initialize(d_inode(dentry));
if (retval)
return retval;
retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
@ -2980,8 +2992,12 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
trace_ext4_unlink_enter(dir, dentry);
/* Initialize quotas before so that eventual writes go
* in separate transaction */
dquot_initialize(dir);
dquot_initialize(d_inode(dentry));
retval = dquot_initialize(dir);
if (retval)
return retval;
retval = dquot_initialize(d_inode(dentry));
if (retval)
return retval;
retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
@ -3066,7 +3082,9 @@ static int ext4_symlink(struct inode *dir,
goto err_free_sd;
}
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
goto err_free_sd;
if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
/*
@ -3197,7 +3215,9 @@ static int ext4_link(struct dentry *old_dentry,
if (ext4_encrypted_inode(dir) &&
!ext4_is_child_context_consistent_with_parent(dir, inode))
return -EPERM;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err)
return err;
retry:
handle = ext4_journal_start(dir, EXT4_HT_DIR,
@ -3476,13 +3496,20 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
int credits;
u8 old_file_type;
dquot_initialize(old.dir);
dquot_initialize(new.dir);
retval = dquot_initialize(old.dir);
if (retval)
return retval;
retval = dquot_initialize(new.dir);
if (retval)
return retval;
/* Initialize quotas before so that eventual writes go
* in separate transaction */
if (new.inode)
dquot_initialize(new.inode);
if (new.inode) {
retval = dquot_initialize(new.inode);
if (retval)
return retval;
}
old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
if (IS_ERR(old.bh))
@ -3678,8 +3705,12 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
new.inode)))
return -EPERM;
dquot_initialize(old.dir);
dquot_initialize(new.dir);
retval = dquot_initialize(old.dir);
if (retval)
return retval;
retval = dquot_initialize(new.dir);
if (retval)
return retval;
old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
&old.de, &old.inlined);

View File

@ -84,7 +84,7 @@ static void ext4_unregister_li_request(struct super_block *sb);
static void ext4_clear_request_list(void);
static int ext4_reserve_clusters(struct ext4_sb_info *, ext4_fsblk_t);
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
static struct file_system_type ext2_fs_type = {
.owner = THIS_MODULE,
.name = "ext2",
@ -100,7 +100,6 @@ MODULE_ALIAS("ext2");
#endif
#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
static struct file_system_type ext3_fs_type = {
.owner = THIS_MODULE,
.name = "ext3",
@ -111,9 +110,6 @@ static struct file_system_type ext3_fs_type = {
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
#define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
#else
#define IS_EXT3_SB(sb) (0)
#endif
static int ext4_verify_csum_type(struct super_block *sb,
struct ext4_super_block *es)
@ -5500,7 +5496,7 @@ static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
return mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);
}
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
static inline void register_as_ext2(void)
{
int err = register_filesystem(&ext2_fs_type);
@ -5530,7 +5526,6 @@ static inline void unregister_as_ext2(void) { }
static inline int ext2_feature_set_ok(struct super_block *sb) { return 0; }
#endif
#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
static inline void register_as_ext3(void)
{
int err = register_filesystem(&ext3_fs_type);
@ -5556,11 +5551,6 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
return 0;
return 1;
}
#else
static inline void register_as_ext3(void) { }
static inline void unregister_as_ext3(void) { }
static inline int ext3_feature_set_ok(struct super_block *sb) { return 0; }
#endif
static struct file_system_type ext4_fs_type = {
.owner = THIS_MODULE,

View File

@ -1,30 +0,0 @@
config JBD
tristate
help
This is a generic journalling layer for block devices. It is
currently used by the ext3 file system, but it could also be
used to add journal support to other file systems or block
devices such as RAID or LVM.
If you are using the ext3 file system, you need to say Y here.
If you are not using ext3 then you will probably want to say N.
To compile this device as a module, choose M here: the module will be
called jbd. If you are compiling ext3 into the kernel, you
cannot compile this code as a module.
config JBD_DEBUG
bool "JBD (ext3) debugging support"
depends on JBD && DEBUG_FS
help
If you are using the ext3 journaled file system (or potentially any
other file system/device using JBD), this option allows you to
enable debugging output while the system is running, in order to
help track down any problems you are having. By default the
debugging output will be turned off.
If you select Y here, then you will be able to turn on debugging
with "echo N > /sys/kernel/debug/jbd/jbd-debug", where N is a
number between 1 and 5, the higher the number, the more debugging
output is generated. To turn debugging off again, do
"echo 0 > /sys/kernel/debug/jbd/jbd-debug".

View File

@ -1,7 +0,0 @@
#
# Makefile for the linux journaling routines.
#
obj-$(CONFIG_JBD) += jbd.o
jbd-objs := transaction.o commit.o recovery.o checkpoint.o revoke.o journal.o

View File

@ -1,782 +0,0 @@
/*
* linux/fs/jbd/checkpoint.c
*
* Written by Stephen C. Tweedie <sct@redhat.com>, 1999
*
* Copyright 1999 Red Hat Software --- All Rights Reserved
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*
* Checkpoint routines for the generic filesystem journaling code.
* Part of the ext2fs journaling system.
*
* Checkpointing is the process of ensuring that a section of the log is
* committed fully to disk, so that that portion of the log can be
* reused.
*/
#include <linux/time.h>
#include <linux/fs.h>
#include <linux/jbd.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <trace/events/jbd.h>
/*
* Unlink a buffer from a transaction checkpoint list.
*
* Called with j_list_lock held.
*/
static inline void __buffer_unlink_first(struct journal_head *jh)
{
transaction_t *transaction = jh->b_cp_transaction;
jh->b_cpnext->b_cpprev = jh->b_cpprev;
jh->b_cpprev->b_cpnext = jh->b_cpnext;
if (transaction->t_checkpoint_list == jh) {
transaction->t_checkpoint_list = jh->b_cpnext;
if (transaction->t_checkpoint_list == jh)
transaction->t_checkpoint_list = NULL;
}
}
/*
* Unlink a buffer from a transaction checkpoint(io) list.
*
* Called with j_list_lock held.
*/
static inline void __buffer_unlink(struct journal_head *jh)
{
transaction_t *transaction = jh->b_cp_transaction;
__buffer_unlink_first(jh);
if (transaction->t_checkpoint_io_list == jh) {
transaction->t_checkpoint_io_list = jh->b_cpnext;
if (transaction->t_checkpoint_io_list == jh)
transaction->t_checkpoint_io_list = NULL;
}
}
/*
* Move a buffer from the checkpoint list to the checkpoint io list
*
* Called with j_list_lock held
*/
static inline void __buffer_relink_io(struct journal_head *jh)
{
transaction_t *transaction = jh->b_cp_transaction;
__buffer_unlink_first(jh);
if (!transaction->t_checkpoint_io_list) {
jh->b_cpnext = jh->b_cpprev = jh;
} else {
jh->b_cpnext = transaction->t_checkpoint_io_list;
jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
jh->b_cpprev->b_cpnext = jh;
jh->b_cpnext->b_cpprev = jh;
}
transaction->t_checkpoint_io_list = jh;
}
/*
* Try to release a checkpointed buffer from its transaction.
* Returns 1 if we released it and 2 if we also released the
* whole transaction.
*
* Requires j_list_lock
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
*/
static int __try_to_free_cp_buf(struct journal_head *jh)
{
int ret = 0;
struct buffer_head *bh = jh2bh(jh);
if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
!buffer_dirty(bh) && !buffer_write_io_error(bh)) {
/*
* Get our reference so that bh cannot be freed before
* we unlock it
*/
get_bh(bh);
JBUFFER_TRACE(jh, "remove from checkpoint list");
ret = __journal_remove_checkpoint(jh) + 1;
jbd_unlock_bh_state(bh);
BUFFER_TRACE(bh, "release");
__brelse(bh);
} else {
jbd_unlock_bh_state(bh);
}
return ret;
}
/*
* __log_wait_for_space: wait until there is space in the journal.
*
* Called under j-state_lock *only*. It will be unlocked if we have to wait
* for a checkpoint to free up some space in the log.
*/
void __log_wait_for_space(journal_t *journal)
{
int nblocks, space_left;
assert_spin_locked(&journal->j_state_lock);
nblocks = jbd_space_needed(journal);
while (__log_space_left(journal) < nblocks) {
if (journal->j_flags & JFS_ABORT)
return;
spin_unlock(&journal->j_state_lock);
mutex_lock(&journal->j_checkpoint_mutex);
/*
* Test again, another process may have checkpointed while we
* were waiting for the checkpoint lock. If there are no
* transactions ready to be checkpointed, try to recover
* journal space by calling cleanup_journal_tail(), and if
* that doesn't work, by waiting for the currently committing
* transaction to complete. If there is absolutely no way
* to make progress, this is either a BUG or corrupted
* filesystem, so abort the journal and leave a stack
* trace for forensic evidence.
*/
spin_lock(&journal->j_state_lock);
spin_lock(&journal->j_list_lock);
nblocks = jbd_space_needed(journal);
space_left = __log_space_left(journal);
if (space_left < nblocks) {
int chkpt = journal->j_checkpoint_transactions != NULL;
tid_t tid = 0;
if (journal->j_committing_transaction)
tid = journal->j_committing_transaction->t_tid;
spin_unlock(&journal->j_list_lock);
spin_unlock(&journal->j_state_lock);
if (chkpt) {
log_do_checkpoint(journal);
} else if (cleanup_journal_tail(journal) == 0) {
/* We were able to recover space; yay! */
;
} else if (tid) {
log_wait_commit(journal, tid);
} else {
printk(KERN_ERR "%s: needed %d blocks and "
"only had %d space available\n",
__func__, nblocks, space_left);
printk(KERN_ERR "%s: no way to get more "
"journal space\n", __func__);
WARN_ON(1);
journal_abort(journal, 0);
}
spin_lock(&journal->j_state_lock);
} else {
spin_unlock(&journal->j_list_lock);
}
mutex_unlock(&journal->j_checkpoint_mutex);
}
}
/*
* We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
* The caller must restart a list walk. Wait for someone else to run
* jbd_unlock_bh_state().
*/
static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
__releases(journal->j_list_lock)
{
get_bh(bh);
spin_unlock(&journal->j_list_lock);
jbd_lock_bh_state(bh);
jbd_unlock_bh_state(bh);
put_bh(bh);
}
/*
* Clean up transaction's list of buffers submitted for io.
* We wait for any pending IO to complete and remove any clean
* buffers. Note that we take the buffers in the opposite ordering
* from the one in which they were submitted for IO.
*
* Return 0 on success, and return <0 if some buffers have failed
* to be written out.
*
* Called with j_list_lock held.
*/
static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
{
struct journal_head *jh;
struct buffer_head *bh;
tid_t this_tid;
int released = 0;
int ret = 0;
this_tid = transaction->t_tid;
restart:
/* Did somebody clean up the transaction in the meanwhile? */
if (journal->j_checkpoint_transactions != transaction ||
transaction->t_tid != this_tid)
return ret;
while (!released && transaction->t_checkpoint_io_list) {
jh = transaction->t_checkpoint_io_list;
bh = jh2bh(jh);
if (!jbd_trylock_bh_state(bh)) {
jbd_sync_bh(journal, bh);
spin_lock(&journal->j_list_lock);
goto restart;
}
get_bh(bh);
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
wait_on_buffer(bh);
/* the journal_head may have gone by now */
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
spin_lock(&journal->j_list_lock);
goto restart;
}
if (unlikely(buffer_write_io_error(bh)))
ret = -EIO;
/*
* Now in whatever state the buffer currently is, we know that
* it has been written out and so we can drop it from the list
*/
released = __journal_remove_checkpoint(jh);
jbd_unlock_bh_state(bh);
__brelse(bh);
}
return ret;
}
#define NR_BATCH 64
static void
__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
{
int i;
struct blk_plug plug;
blk_start_plug(&plug);
for (i = 0; i < *batch_count; i++)
write_dirty_buffer(bhs[i], WRITE_SYNC);
blk_finish_plug(&plug);
for (i = 0; i < *batch_count; i++) {
struct buffer_head *bh = bhs[i];
clear_buffer_jwrite(bh);
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
}
*batch_count = 0;
}
/*
* Try to flush one buffer from the checkpoint list to disk.
*
* Return 1 if something happened which requires us to abort the current
* scan of the checkpoint list. Return <0 if the buffer has failed to
* be written out.
*
* Called with j_list_lock held and drops it if 1 is returned
* Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
*/
static int __process_buffer(journal_t *journal, struct journal_head *jh,
struct buffer_head **bhs, int *batch_count)
{
struct buffer_head *bh = jh2bh(jh);
int ret = 0;
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
wait_on_buffer(bh);
/* the journal_head may have gone by now */
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
ret = 1;
} else if (jh->b_transaction != NULL) {
transaction_t *t = jh->b_transaction;
tid_t tid = t->t_tid;
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
log_start_commit(journal, tid);
log_wait_commit(journal, tid);
ret = 1;
} else if (!buffer_dirty(bh)) {
ret = 1;
if (unlikely(buffer_write_io_error(bh)))
ret = -EIO;
get_bh(bh);
J_ASSERT_JH(jh, !buffer_jbddirty(bh));
BUFFER_TRACE(bh, "remove from checkpoint");
__journal_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
__brelse(bh);
} else {
/*
* Important: we are about to write the buffer, and
* possibly block, while still holding the journal lock.
* We cannot afford to let the transaction logic start
* messing around with this buffer before we write it to
* disk, as that would break recoverability.
*/
BUFFER_TRACE(bh, "queue");
get_bh(bh);
J_ASSERT_BH(bh, !buffer_jwrite(bh));
set_buffer_jwrite(bh);
bhs[*batch_count] = bh;
__buffer_relink_io(jh);
jbd_unlock_bh_state(bh);
(*batch_count)++;
if (*batch_count == NR_BATCH) {
spin_unlock(&journal->j_list_lock);
__flush_batch(journal, bhs, batch_count);
ret = 1;
}
}
return ret;
}
/*
* Perform an actual checkpoint. We take the first transaction on the
* list of transactions to be checkpointed and send all its buffers
* to disk. We submit larger chunks of data at once.
*
* The journal should be locked before calling this function.
* Called with j_checkpoint_mutex held.
*/
int log_do_checkpoint(journal_t *journal)
{
transaction_t *transaction;
tid_t this_tid;
int result;
jbd_debug(1, "Start checkpoint\n");
/*
* First thing: if there are any transactions in the log which
* don't need checkpointing, just eliminate them from the
* journal straight away.
*/
result = cleanup_journal_tail(journal);
trace_jbd_checkpoint(journal, result);
jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
if (result <= 0)
return result;
/*
* OK, we need to start writing disk blocks. Take one transaction
* and write it.
*/
result = 0;
spin_lock(&journal->j_list_lock);
if (!journal->j_checkpoint_transactions)
goto out;
transaction = journal->j_checkpoint_transactions;
this_tid = transaction->t_tid;
restart:
/*
* If someone cleaned up this transaction while we slept, we're
* done (maybe it's a new transaction, but it fell at the same
* address).
*/
if (journal->j_checkpoint_transactions == transaction &&
transaction->t_tid == this_tid) {
int batch_count = 0;
struct buffer_head *bhs[NR_BATCH];
struct journal_head *jh;
int retry = 0, err;
while (!retry && transaction->t_checkpoint_list) {
struct buffer_head *bh;
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
if (!jbd_trylock_bh_state(bh)) {
jbd_sync_bh(journal, bh);
retry = 1;
break;
}
retry = __process_buffer(journal, jh, bhs,&batch_count);
if (retry < 0 && !result)
result = retry;
if (!retry && (need_resched() ||
spin_needbreak(&journal->j_list_lock))) {
spin_unlock(&journal->j_list_lock);
retry = 1;
break;
}
}
if (batch_count) {
if (!retry) {
spin_unlock(&journal->j_list_lock);
retry = 1;
}
__flush_batch(journal, bhs, &batch_count);
}
if (retry) {
spin_lock(&journal->j_list_lock);
goto restart;
}
/*
* Now we have cleaned up the first transaction's checkpoint
* list. Let's clean up the second one
*/
err = __wait_cp_io(journal, transaction);
if (!result)
result = err;
}
out:
spin_unlock(&journal->j_list_lock);
if (result < 0)
journal_abort(journal, result);
else
result = cleanup_journal_tail(journal);
return (result < 0) ? result : 0;
}
/*
* Check the list of checkpoint transactions for the journal to see if
* we have already got rid of any since the last update of the log tail
* in the journal superblock. If so, we can instantly roll the
* superblock forward to remove those transactions from the log.
*
* Return <0 on error, 0 on success, 1 if there was nothing to clean up.
*
* This is the only part of the journaling code which really needs to be
* aware of transaction aborts. Checkpointing involves writing to the
* main filesystem area rather than to the journal, so it can proceed
* even in abort state, but we must not update the super block if
* checkpointing may have failed. Otherwise, we would lose some metadata
* buffers which should be written-back to the filesystem.
*/
int cleanup_journal_tail(journal_t *journal)
{
transaction_t * transaction;
tid_t first_tid;
unsigned int blocknr, freed;
if (is_journal_aborted(journal))
return 1;
/*
* OK, work out the oldest transaction remaining in the log, and
* the log block it starts at.
*
* If the log is now empty, we need to work out which is the
* next transaction ID we will write, and where it will
* start.
*/
spin_lock(&journal->j_state_lock);
spin_lock(&journal->j_list_lock);
transaction = journal->j_checkpoint_transactions;
if (transaction) {
first_tid = transaction->t_tid;
blocknr = transaction->t_log_start;
} else if ((transaction = journal->j_committing_transaction) != NULL) {
first_tid = transaction->t_tid;
blocknr = transaction->t_log_start;
} else if ((transaction = journal->j_running_transaction) != NULL) {
first_tid = transaction->t_tid;
blocknr = journal->j_head;
} else {
first_tid = journal->j_transaction_sequence;
blocknr = journal->j_head;
}
spin_unlock(&journal->j_list_lock);
J_ASSERT(blocknr != 0);
/* If the oldest pinned transaction is at the tail of the log
already then there's not much we can do right now. */
if (journal->j_tail_sequence == first_tid) {
spin_unlock(&journal->j_state_lock);
return 1;
}
spin_unlock(&journal->j_state_lock);
/*
* We need to make sure that any blocks that were recently written out
* --- perhaps by log_do_checkpoint() --- are flushed out before we
* drop the transactions from the journal. Similarly we need to be sure
* superblock makes it to disk before next transaction starts reusing
* freed space (otherwise we could replay some blocks of the new
* transaction thinking they belong to the old one). So we use
* WRITE_FLUSH_FUA. It's unlikely this will be necessary, especially
* with an appropriately sized journal, but we need this to guarantee
* correctness. Fortunately cleanup_journal_tail() doesn't get called
* all that often.
*/
journal_update_sb_log_tail(journal, first_tid, blocknr,
WRITE_FLUSH_FUA);
spin_lock(&journal->j_state_lock);
/* OK, update the superblock to recover the freed space.
* Physical blocks come first: have we wrapped beyond the end of
* the log? */
freed = blocknr - journal->j_tail;
if (blocknr < journal->j_tail)
freed = freed + journal->j_last - journal->j_first;
trace_jbd_cleanup_journal_tail(journal, first_tid, blocknr, freed);
jbd_debug(1,
"Cleaning journal tail from %d to %d (offset %u), "
"freeing %u\n",
journal->j_tail_sequence, first_tid, blocknr, freed);
journal->j_free += freed;
journal->j_tail_sequence = first_tid;
journal->j_tail = blocknr;
spin_unlock(&journal->j_state_lock);
return 0;
}
/* Checkpoint list management */
/*
* journal_clean_one_cp_list
*
* Find all the written-back checkpoint buffers in the given list and release
* them.
*
* Called with j_list_lock held.
* Returns number of buffers reaped (for debug)
*/
static int journal_clean_one_cp_list(struct journal_head *jh, int *released)
{
struct journal_head *last_jh;
struct journal_head *next_jh = jh;
int ret, freed = 0;
*released = 0;
if (!jh)
return 0;
last_jh = jh->b_cpprev;
do {
jh = next_jh;
next_jh = jh->b_cpnext;
/* Use trylock because of the ranking */
if (jbd_trylock_bh_state(jh2bh(jh))) {
ret = __try_to_free_cp_buf(jh);
if (ret) {
freed++;
if (ret == 2) {
*released = 1;
return freed;
}
}
}
/*
* This function only frees up some memory
* if possible so we dont have an obligation
* to finish processing. Bail out if preemption
* requested:
*/
if (need_resched())
return freed;
} while (jh != last_jh);
return freed;
}
/*
* journal_clean_checkpoint_list
*
* Find all the written-back checkpoint buffers in the journal and release them.
*
* Called with the journal locked.
* Called with j_list_lock held.
* Returns number of buffers reaped (for debug)
*/
int __journal_clean_checkpoint_list(journal_t *journal)
{
transaction_t *transaction, *last_transaction, *next_transaction;
int ret = 0;
int released;
transaction = journal->j_checkpoint_transactions;
if (!transaction)
goto out;
last_transaction = transaction->t_cpprev;
next_transaction = transaction;
do {
transaction = next_transaction;
next_transaction = transaction->t_cpnext;
ret += journal_clean_one_cp_list(transaction->
t_checkpoint_list, &released);
/*
* This function only frees up some memory if possible so we
* dont have an obligation to finish processing. Bail out if
* preemption requested:
*/
if (need_resched())
goto out;
if (released)
continue;
/*
* It is essential that we are as careful as in the case of
* t_checkpoint_list with removing the buffer from the list as
* we can possibly see not yet submitted buffers on io_list
*/
ret += journal_clean_one_cp_list(transaction->
t_checkpoint_io_list, &released);
if (need_resched())
goto out;
} while (transaction != last_transaction);
out:
return ret;
}
/*
* journal_remove_checkpoint: called after a buffer has been committed
* to disk (either by being write-back flushed to disk, or being
* committed to the log).
*
* We cannot safely clean a transaction out of the log until all of the
* buffer updates committed in that transaction have safely been stored
* elsewhere on disk. To achieve this, all of the buffers in a
* transaction need to be maintained on the transaction's checkpoint
* lists until they have been rewritten, at which point this function is
* called to remove the buffer from the existing transaction's
* checkpoint lists.
*
* The function returns 1 if it frees the transaction, 0 otherwise.
* The function can free jh and bh.
*
* This function is called with j_list_lock held.
* This function is called with jbd_lock_bh_state(jh2bh(jh))
*/
int __journal_remove_checkpoint(struct journal_head *jh)
{
transaction_t *transaction;
journal_t *journal;
int ret = 0;
JBUFFER_TRACE(jh, "entry");
if ((transaction = jh->b_cp_transaction) == NULL) {
JBUFFER_TRACE(jh, "not on transaction");
goto out;
}
journal = transaction->t_journal;
JBUFFER_TRACE(jh, "removing from transaction");
__buffer_unlink(jh);
jh->b_cp_transaction = NULL;
journal_put_journal_head(jh);
if (transaction->t_checkpoint_list != NULL ||
transaction->t_checkpoint_io_list != NULL)
goto out;
/*
* There is one special case to worry about: if we have just pulled the
* buffer off a running or committing transaction's checkpoing list,
* then even if the checkpoint list is empty, the transaction obviously
* cannot be dropped!
*
* The locking here around t_state is a bit sleazy.
* See the comment at the end of journal_commit_transaction().
*/
if (transaction->t_state != T_FINISHED)
goto out;
/* OK, that was the last buffer for the transaction: we can now
safely remove this transaction from the log */
__journal_drop_transaction(journal, transaction);
/* Just in case anybody was waiting for more transactions to be
checkpointed... */
wake_up(&journal->j_wait_logspace);
ret = 1;
out:
return ret;
}
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
* the log.
*
* Called with the journal locked.
* Called with j_list_lock held.
*/
void __journal_insert_checkpoint(struct journal_head *jh,
transaction_t *transaction)
{
JBUFFER_TRACE(jh, "entry");
J_ASSERT_JH(jh, buffer_dirty(jh2bh(jh)) || buffer_jbddirty(jh2bh(jh)));
J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
/* Get reference for checkpointing transaction */
journal_grab_journal_head(jh2bh(jh));
jh->b_cp_transaction = transaction;
if (!transaction->t_checkpoint_list) {
jh->b_cpnext = jh->b_cpprev = jh;
} else {
jh->b_cpnext = transaction->t_checkpoint_list;
jh->b_cpprev = transaction->t_checkpoint_list->b_cpprev;
jh->b_cpprev->b_cpnext = jh;
jh->b_cpnext->b_cpprev = jh;
}
transaction->t_checkpoint_list = jh;
}
/*
* We've finished with this transaction structure: adios...
*
* The transaction must have no links except for the checkpoint by this
* point.
*
* Called with the journal locked.
* Called with j_list_lock held.
*/
void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
{
assert_spin_locked(&journal->j_list_lock);
if (transaction->t_cpnext) {
transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
if (journal->j_checkpoint_transactions == transaction)
journal->j_checkpoint_transactions =
transaction->t_cpnext;
if (journal->j_checkpoint_transactions == transaction)
journal->j_checkpoint_transactions = NULL;
}
J_ASSERT(transaction->t_state == T_FINISHED);
J_ASSERT(transaction->t_buffers == NULL);
J_ASSERT(transaction->t_sync_datalist == NULL);
J_ASSERT(transaction->t_forget == NULL);
J_ASSERT(transaction->t_iobuf_list == NULL);
J_ASSERT(transaction->t_shadow_list == NULL);
J_ASSERT(transaction->t_log_list == NULL);
J_ASSERT(transaction->t_checkpoint_list == NULL);
J_ASSERT(transaction->t_checkpoint_io_list == NULL);
J_ASSERT(transaction->t_updates == 0);
J_ASSERT(journal->j_committing_transaction != transaction);
J_ASSERT(journal->j_running_transaction != transaction);
trace_jbd_drop_transaction(journal, transaction);
jbd_debug(1, "Dropping transaction %d, all done\n", transaction->t_tid);
kfree(transaction);
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,594 +0,0 @@
/*
* linux/fs/jbd/recovery.c
*
* Written by Stephen C. Tweedie <sct@redhat.com>, 1999
*
* Copyright 1999-2000 Red Hat Software --- All Rights Reserved
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*
* Journal recovery routines for the generic filesystem journaling code;
* part of the ext2fs journaling system.
*/
#ifndef __KERNEL__
#include "jfs_user.h"
#else
#include <linux/time.h>
#include <linux/fs.h>
#include <linux/jbd.h>
#include <linux/errno.h>
#include <linux/blkdev.h>
#endif
/*
* Maintain information about the progress of the recovery job, so that
* the different passes can carry information between them.
*/
struct recovery_info
{
tid_t start_transaction;
tid_t end_transaction;
int nr_replays;
int nr_revokes;
int nr_revoke_hits;
};
enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
static int do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass);
static int scan_revoke_records(journal_t *, struct buffer_head *,
tid_t, struct recovery_info *);
#ifdef __KERNEL__
/* Release readahead buffers after use */
static void journal_brelse_array(struct buffer_head *b[], int n)
{
while (--n >= 0)
brelse (b[n]);
}
/*
* When reading from the journal, we are going through the block device
* layer directly and so there is no readahead being done for us. We
* need to implement any readahead ourselves if we want it to happen at
* all. Recovery is basically one long sequential read, so make sure we
* do the IO in reasonably large chunks.
*
* This is not so critical that we need to be enormously clever about
* the readahead size, though. 128K is a purely arbitrary, good-enough
* fixed value.
*/
#define MAXBUF 8
static int do_readahead(journal_t *journal, unsigned int start)
{
int err;
unsigned int max, nbufs, next;
unsigned int blocknr;
struct buffer_head *bh;
struct buffer_head * bufs[MAXBUF];
/* Do up to 128K of readahead */
max = start + (128 * 1024 / journal->j_blocksize);
if (max > journal->j_maxlen)
max = journal->j_maxlen;
/* Do the readahead itself. We'll submit MAXBUF buffer_heads at
* a time to the block device IO layer. */
nbufs = 0;
for (next = start; next < max; next++) {
err = journal_bmap(journal, next, &blocknr);
if (err) {
printk (KERN_ERR "JBD: bad block at offset %u\n",
next);
goto failed;
}
bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
if (!bh) {
err = -ENOMEM;
goto failed;
}
if (!buffer_uptodate(bh) && !buffer_locked(bh)) {
bufs[nbufs++] = bh;
if (nbufs == MAXBUF) {
ll_rw_block(READ, nbufs, bufs);
journal_brelse_array(bufs, nbufs);
nbufs = 0;
}
} else
brelse(bh);
}
if (nbufs)
ll_rw_block(READ, nbufs, bufs);
err = 0;
failed:
if (nbufs)
journal_brelse_array(bufs, nbufs);
return err;
}
#endif /* __KERNEL__ */
/*
* Read a block from the journal
*/
static int jread(struct buffer_head **bhp, journal_t *journal,
unsigned int offset)
{
int err;
unsigned int blocknr;
struct buffer_head *bh;
*bhp = NULL;
if (offset >= journal->j_maxlen) {
printk(KERN_ERR "JBD: corrupted journal superblock\n");
return -EIO;
}
err = journal_bmap(journal, offset, &blocknr);
if (err) {
printk (KERN_ERR "JBD: bad block at offset %u\n",
offset);
return err;
}
bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
if (!bh)
return -ENOMEM;
if (!buffer_uptodate(bh)) {
/* If this is a brand new buffer, start readahead.
Otherwise, we assume we are already reading it. */
if (!buffer_req(bh))
do_readahead(journal, offset);
wait_on_buffer(bh);
}
if (!buffer_uptodate(bh)) {
printk (KERN_ERR "JBD: Failed to read block at offset %u\n",
offset);
brelse(bh);
return -EIO;
}
*bhp = bh;
return 0;
}
/*
* Count the number of in-use tags in a journal descriptor block.
*/
static int count_tags(struct buffer_head *bh, int size)
{
char * tagp;
journal_block_tag_t * tag;
int nr = 0;
tagp = &bh->b_data[sizeof(journal_header_t)];
while ((tagp - bh->b_data + sizeof(journal_block_tag_t)) <= size) {
tag = (journal_block_tag_t *) tagp;
nr++;
tagp += sizeof(journal_block_tag_t);
if (!(tag->t_flags & cpu_to_be32(JFS_FLAG_SAME_UUID)))
tagp += 16;
if (tag->t_flags & cpu_to_be32(JFS_FLAG_LAST_TAG))
break;
}
return nr;
}
/* Make sure we wrap around the log correctly! */
#define wrap(journal, var) \
do { \
if (var >= (journal)->j_last) \
var -= ((journal)->j_last - (journal)->j_first); \
} while (0)
/**
* journal_recover - recovers a on-disk journal
* @journal: the journal to recover
*
* The primary function for recovering the log contents when mounting a
* journaled device.
*
* Recovery is done in three passes. In the first pass, we look for the
* end of the log. In the second, we assemble the list of revoke
* blocks. In the third and final pass, we replay any un-revoked blocks
* in the log.
*/
int journal_recover(journal_t *journal)
{
int err, err2;
journal_superblock_t * sb;
struct recovery_info info;
memset(&info, 0, sizeof(info));
sb = journal->j_superblock;
/*
* The journal superblock's s_start field (the current log head)
* is always zero if, and only if, the journal was cleanly
* unmounted.
*/
if (!sb->s_start) {
jbd_debug(1, "No recovery required, last transaction %d\n",
be32_to_cpu(sb->s_sequence));
journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
return 0;
}
err = do_one_pass(journal, &info, PASS_SCAN);
if (!err)
err = do_one_pass(journal, &info, PASS_REVOKE);
if (!err)
err = do_one_pass(journal, &info, PASS_REPLAY);
jbd_debug(1, "JBD: recovery, exit status %d, "
"recovered transactions %u to %u\n",
err, info.start_transaction, info.end_transaction);
jbd_debug(1, "JBD: Replayed %d and revoked %d/%d blocks\n",
info.nr_replays, info.nr_revoke_hits, info.nr_revokes);
/* Restart the log at the next transaction ID, thus invalidating
* any existing commit records in the log. */
journal->j_transaction_sequence = ++info.end_transaction;
journal_clear_revoke(journal);
err2 = sync_blockdev(journal->j_fs_dev);
if (!err)
err = err2;
/* Flush disk caches to get replayed data on the permanent storage */
if (journal->j_flags & JFS_BARRIER) {
err2 = blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
if (!err)
err = err2;
}
return err;
}
/**
* journal_skip_recovery - Start journal and wipe exiting records
* @journal: journal to startup
*
* Locate any valid recovery information from the journal and set up the
* journal structures in memory to ignore it (presumably because the
* caller has evidence that it is out of date).
* This function does'nt appear to be exorted..
*
* We perform one pass over the journal to allow us to tell the user how
* much recovery information is being erased, and to let us initialise
* the journal transaction sequence numbers to the next unused ID.
*/
int journal_skip_recovery(journal_t *journal)
{
int err;
struct recovery_info info;
memset (&info, 0, sizeof(info));
err = do_one_pass(journal, &info, PASS_SCAN);
if (err) {
printk(KERN_ERR "JBD: error %d scanning journal\n", err);
++journal->j_transaction_sequence;
} else {
#ifdef CONFIG_JBD_DEBUG
int dropped = info.end_transaction -
be32_to_cpu(journal->j_superblock->s_sequence);
jbd_debug(1,
"JBD: ignoring %d transaction%s from the journal.\n",
dropped, (dropped == 1) ? "" : "s");
#endif
journal->j_transaction_sequence = ++info.end_transaction;
}
journal->j_tail = 0;
return err;
}
static int do_one_pass(journal_t *journal,
struct recovery_info *info, enum passtype pass)
{
unsigned int first_commit_ID, next_commit_ID;
unsigned int next_log_block;
int err, success = 0;
journal_superblock_t * sb;
journal_header_t * tmp;
struct buffer_head * bh;
unsigned int sequence;
int blocktype;
/*
* First thing is to establish what we expect to find in the log
* (in terms of transaction IDs), and where (in terms of log
* block offsets): query the superblock.
*/
sb = journal->j_superblock;
next_commit_ID = be32_to_cpu(sb->s_sequence);
next_log_block = be32_to_cpu(sb->s_start);
first_commit_ID = next_commit_ID;
if (pass == PASS_SCAN)
info->start_transaction = first_commit_ID;
jbd_debug(1, "Starting recovery pass %d\n", pass);
/*
* Now we walk through the log, transaction by transaction,
* making sure that each transaction has a commit block in the
* expected place. Each complete transaction gets replayed back
* into the main filesystem.
*/
while (1) {
int flags;
char * tagp;
journal_block_tag_t * tag;
struct buffer_head * obh;
struct buffer_head * nbh;
cond_resched();
/* If we already know where to stop the log traversal,
* check right now that we haven't gone past the end of
* the log. */
if (pass != PASS_SCAN)
if (tid_geq(next_commit_ID, info->end_transaction))
break;
jbd_debug(2, "Scanning for sequence ID %u at %u/%u\n",
next_commit_ID, next_log_block, journal->j_last);
/* Skip over each chunk of the transaction looking
* either the next descriptor block or the final commit
* record. */
jbd_debug(3, "JBD: checking block %u\n", next_log_block);
err = jread(&bh, journal, next_log_block);
if (err)
goto failed;
next_log_block++;
wrap(journal, next_log_block);
/* What kind of buffer is it?
*
* If it is a descriptor block, check that it has the
* expected sequence number. Otherwise, we're all done
* here. */
tmp = (journal_header_t *)bh->b_data;
if (tmp->h_magic != cpu_to_be32(JFS_MAGIC_NUMBER)) {
brelse(bh);
break;
}
blocktype = be32_to_cpu(tmp->h_blocktype);
sequence = be32_to_cpu(tmp->h_sequence);
jbd_debug(3, "Found magic %d, sequence %d\n",
blocktype, sequence);
if (sequence != next_commit_ID) {
brelse(bh);
break;
}
/* OK, we have a valid descriptor block which matches
* all of the sequence number checks. What are we going
* to do with it? That depends on the pass... */
switch(blocktype) {
case JFS_DESCRIPTOR_BLOCK:
/* If it is a valid descriptor block, replay it
* in pass REPLAY; otherwise, just skip over the
* blocks it describes. */
if (pass != PASS_REPLAY) {
next_log_block +=
count_tags(bh, journal->j_blocksize);
wrap(journal, next_log_block);
brelse(bh);
continue;
}
/* A descriptor block: we can now write all of
* the data blocks. Yay, useful work is finally
* getting done here! */
tagp = &bh->b_data[sizeof(journal_header_t)];
while ((tagp - bh->b_data +sizeof(journal_block_tag_t))
<= journal->j_blocksize) {
unsigned int io_block;
tag = (journal_block_tag_t *) tagp;
flags = be32_to_cpu(tag->t_flags);
io_block = next_log_block++;
wrap(journal, next_log_block);
err = jread(&obh, journal, io_block);
if (err) {
/* Recover what we can, but
* report failure at the end. */
success = err;
printk (KERN_ERR
"JBD: IO error %d recovering "
"block %u in log\n",
err, io_block);
} else {
unsigned int blocknr;
J_ASSERT(obh != NULL);
blocknr = be32_to_cpu(tag->t_blocknr);
/* If the block has been
* revoked, then we're all done
* here. */
if (journal_test_revoke
(journal, blocknr,
next_commit_ID)) {
brelse(obh);
++info->nr_revoke_hits;
goto skip_write;
}
/* Find a buffer for the new
* data being restored */
nbh = __getblk(journal->j_fs_dev,
blocknr,
journal->j_blocksize);
if (nbh == NULL) {
printk(KERN_ERR
"JBD: Out of memory "
"during recovery.\n");
err = -ENOMEM;
brelse(bh);
brelse(obh);
goto failed;
}
lock_buffer(nbh);
memcpy(nbh->b_data, obh->b_data,
journal->j_blocksize);
if (flags & JFS_FLAG_ESCAPE) {
*((__be32 *)nbh->b_data) =
cpu_to_be32(JFS_MAGIC_NUMBER);
}
BUFFER_TRACE(nbh, "marking dirty");
set_buffer_uptodate(nbh);
mark_buffer_dirty(nbh);
BUFFER_TRACE(nbh, "marking uptodate");
++info->nr_replays;
/* ll_rw_block(WRITE, 1, &nbh); */
unlock_buffer(nbh);
brelse(obh);
brelse(nbh);
}
skip_write:
tagp += sizeof(journal_block_tag_t);
if (!(flags & JFS_FLAG_SAME_UUID))
tagp += 16;
if (flags & JFS_FLAG_LAST_TAG)
break;
}
brelse(bh);
continue;
case JFS_COMMIT_BLOCK:
/* Found an expected commit block: not much to
* do other than move on to the next sequence
* number. */
brelse(bh);
next_commit_ID++;
continue;
case JFS_REVOKE_BLOCK:
/* If we aren't in the REVOKE pass, then we can
* just skip over this block. */
if (pass != PASS_REVOKE) {
brelse(bh);
continue;
}
err = scan_revoke_records(journal, bh,
next_commit_ID, info);
brelse(bh);
if (err)
goto failed;
continue;
default:
jbd_debug(3, "Unrecognised magic %d, end of scan.\n",
blocktype);
brelse(bh);
goto done;
}
}
done:
/*
* We broke out of the log scan loop: either we came to the
* known end of the log or we found an unexpected block in the
* log. If the latter happened, then we know that the "current"
* transaction marks the end of the valid log.
*/
if (pass == PASS_SCAN)
info->end_transaction = next_commit_ID;
else {
/* It's really bad news if different passes end up at
* different places (but possible due to IO errors). */
if (info->end_transaction != next_commit_ID) {
printk (KERN_ERR "JBD: recovery pass %d ended at "
"transaction %u, expected %u\n",
pass, next_commit_ID, info->end_transaction);
if (!success)
success = -EIO;
}
}
return success;
failed:
return err;
}
/* Scan a revoke record, marking all blocks mentioned as revoked. */
static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
tid_t sequence, struct recovery_info *info)
{
journal_revoke_header_t *header;
int offset, max;
header = (journal_revoke_header_t *) bh->b_data;
offset = sizeof(journal_revoke_header_t);
max = be32_to_cpu(header->r_count);
while (offset < max) {
unsigned int blocknr;
int err;
blocknr = be32_to_cpu(* ((__be32 *) (bh->b_data+offset)));
offset += 4;
err = journal_set_revoke(journal, blocknr, sequence);
if (err)
return err;
++info->nr_revokes;
}
return 0;
}

View File

@ -1,733 +0,0 @@
/*
* linux/fs/jbd/revoke.c
*
* Written by Stephen C. Tweedie <sct@redhat.com>, 2000
*
* Copyright 2000 Red Hat corp --- All Rights Reserved
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*
* Journal revoke routines for the generic filesystem journaling code;
* part of the ext2fs journaling system.
*
* Revoke is the mechanism used to prevent old log records for deleted
* metadata from being replayed on top of newer data using the same
* blocks. The revoke mechanism is used in two separate places:
*
* + Commit: during commit we write the entire list of the current
* transaction's revoked blocks to the journal
*
* + Recovery: during recovery we record the transaction ID of all
* revoked blocks. If there are multiple revoke records in the log
* for a single block, only the last one counts, and if there is a log
* entry for a block beyond the last revoke, then that log entry still
* gets replayed.
*
* We can get interactions between revokes and new log data within a
* single transaction:
*
* Block is revoked and then journaled:
* The desired end result is the journaling of the new block, so we
* cancel the revoke before the transaction commits.
*
* Block is journaled and then revoked:
* The revoke must take precedence over the write of the block, so we
* need either to cancel the journal entry or to write the revoke
* later in the log than the log block. In this case, we choose the
* latter: journaling a block cancels any revoke record for that block
* in the current transaction, so any revoke for that block in the
* transaction must have happened after the block was journaled and so
* the revoke must take precedence.
*
* Block is revoked and then written as data:
* The data write is allowed to succeed, but the revoke is _not_
* cancelled. We still need to prevent old log records from
* overwriting the new data. We don't even need to clear the revoke
* bit here.
*
* We cache revoke status of a buffer in the current transaction in b_states
* bits. As the name says, revokevalid flag indicates that the cached revoke
* status of a buffer is valid and we can rely on the cached status.
*
* Revoke information on buffers is a tri-state value:
*
* RevokeValid clear: no cached revoke status, need to look it up
* RevokeValid set, Revoked clear:
* buffer has not been revoked, and cancel_revoke
* need do nothing.
* RevokeValid set, Revoked set:
* buffer has been revoked.
*
* Locking rules:
* We keep two hash tables of revoke records. One hashtable belongs to the
* running transaction (is pointed to by journal->j_revoke), the other one
* belongs to the committing transaction. Accesses to the second hash table
* happen only from the kjournald and no other thread touches this table. Also
* journal_switch_revoke_table() which switches which hashtable belongs to the
* running and which to the committing transaction is called only from
* kjournald. Therefore we need no locks when accessing the hashtable belonging
* to the committing transaction.
*
* All users operating on the hash table belonging to the running transaction
* have a handle to the transaction. Therefore they are safe from kjournald
* switching hash tables under them. For operations on the lists of entries in
* the hash table j_revoke_lock is used.
*
* Finally, also replay code uses the hash tables but at this moment no one else
* can touch them (filesystem isn't mounted yet) and hence no locking is
* needed.
*/
#ifndef __KERNEL__
#include "jfs_user.h"
#else
#include <linux/time.h>
#include <linux/fs.h>
#include <linux/jbd.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/list.h>
#include <linux/init.h>
#include <linux/bio.h>
#endif
#include <linux/log2.h>
#include <linux/hash.h>
static struct kmem_cache *revoke_record_cache;
static struct kmem_cache *revoke_table_cache;
/* Each revoke record represents one single revoked block. During
journal replay, this involves recording the transaction ID of the
last transaction to revoke this block. */
struct jbd_revoke_record_s
{
struct list_head hash;
tid_t sequence; /* Used for recovery only */
unsigned int blocknr;
};
/* The revoke table is just a simple hash table of revoke records. */
struct jbd_revoke_table_s
{
/* It is conceivable that we might want a larger hash table
* for recovery. Must be a power of two. */
int hash_size;
int hash_shift;
struct list_head *hash_table;
};
#ifdef __KERNEL__
static void write_one_revoke_record(journal_t *, transaction_t *,
struct journal_head **, int *,
struct jbd_revoke_record_s *, int);
static void flush_descriptor(journal_t *, struct journal_head *, int, int);
#endif
/* Utility functions to maintain the revoke table */
static inline int hash(journal_t *journal, unsigned int block)
{
struct jbd_revoke_table_s *table = journal->j_revoke;
return hash_32(block, table->hash_shift);
}
static int insert_revoke_hash(journal_t *journal, unsigned int blocknr,
tid_t seq)
{
struct list_head *hash_list;
struct jbd_revoke_record_s *record;
repeat:
record = kmem_cache_alloc(revoke_record_cache, GFP_NOFS);
if (!record)
goto oom;
record->sequence = seq;
record->blocknr = blocknr;
hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
spin_lock(&journal->j_revoke_lock);
list_add(&record->hash, hash_list);
spin_unlock(&journal->j_revoke_lock);
return 0;
oom:
if (!journal_oom_retry)
return -ENOMEM;
jbd_debug(1, "ENOMEM in %s, retrying\n", __func__);
yield();
goto repeat;
}
/* Find a revoke record in the journal's hash table. */
static struct jbd_revoke_record_s *find_revoke_record(journal_t *journal,
unsigned int blocknr)
{
struct list_head *hash_list;
struct jbd_revoke_record_s *record;
hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
spin_lock(&journal->j_revoke_lock);
record = (struct jbd_revoke_record_s *) hash_list->next;
while (&(record->hash) != hash_list) {
if (record->blocknr == blocknr) {
spin_unlock(&journal->j_revoke_lock);
return record;
}
record = (struct jbd_revoke_record_s *) record->hash.next;
}
spin_unlock(&journal->j_revoke_lock);
return NULL;
}
void journal_destroy_revoke_caches(void)
{
if (revoke_record_cache) {
kmem_cache_destroy(revoke_record_cache);
revoke_record_cache = NULL;
}
if (revoke_table_cache) {
kmem_cache_destroy(revoke_table_cache);
revoke_table_cache = NULL;
}
}
int __init journal_init_revoke_caches(void)
{
J_ASSERT(!revoke_record_cache);
J_ASSERT(!revoke_table_cache);
revoke_record_cache = kmem_cache_create("revoke_record",
sizeof(struct jbd_revoke_record_s),
0,
SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
NULL);
if (!revoke_record_cache)
goto record_cache_failure;
revoke_table_cache = kmem_cache_create("revoke_table",
sizeof(struct jbd_revoke_table_s),
0, SLAB_TEMPORARY, NULL);
if (!revoke_table_cache)
goto table_cache_failure;
return 0;
table_cache_failure:
journal_destroy_revoke_caches();
record_cache_failure:
return -ENOMEM;
}
static struct jbd_revoke_table_s *journal_init_revoke_table(int hash_size)
{
int i;
struct jbd_revoke_table_s *table;
table = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
if (!table)
goto out;
table->hash_size = hash_size;
table->hash_shift = ilog2(hash_size);
table->hash_table =
kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!table->hash_table) {
kmem_cache_free(revoke_table_cache, table);
table = NULL;
goto out;
}
for (i = 0; i < hash_size; i++)
INIT_LIST_HEAD(&table->hash_table[i]);
out:
return table;
}
static void journal_destroy_revoke_table(struct jbd_revoke_table_s *table)
{
int i;
struct list_head *hash_list;
for (i = 0; i < table->hash_size; i++) {
hash_list = &table->hash_table[i];
J_ASSERT(list_empty(hash_list));
}
kfree(table->hash_table);
kmem_cache_free(revoke_table_cache, table);
}
/* Initialise the revoke table for a given journal to a given size. */
int journal_init_revoke(journal_t *journal, int hash_size)
{
J_ASSERT(journal->j_revoke_table[0] == NULL);
J_ASSERT(is_power_of_2(hash_size));
journal->j_revoke_table[0] = journal_init_revoke_table(hash_size);
if (!journal->j_revoke_table[0])
goto fail0;
journal->j_revoke_table[1] = journal_init_revoke_table(hash_size);
if (!journal->j_revoke_table[1])
goto fail1;
journal->j_revoke = journal->j_revoke_table[1];
spin_lock_init(&journal->j_revoke_lock);
return 0;
fail1:
journal_destroy_revoke_table(journal->j_revoke_table[0]);
fail0:
return -ENOMEM;
}
/* Destroy a journal's revoke table. The table must already be empty! */
void journal_destroy_revoke(journal_t *journal)
{
journal->j_revoke = NULL;
if (journal->j_revoke_table[0])
journal_destroy_revoke_table(journal->j_revoke_table[0]);
if (journal->j_revoke_table[1])
journal_destroy_revoke_table(journal->j_revoke_table[1]);
}
#ifdef __KERNEL__
/*
* journal_revoke: revoke a given buffer_head from the journal. This
* prevents the block from being replayed during recovery if we take a
* crash after this current transaction commits. Any subsequent
* metadata writes of the buffer in this transaction cancel the
* revoke.
*
* Note that this call may block --- it is up to the caller to make
* sure that there are no further calls to journal_write_metadata
* before the revoke is complete. In ext3, this implies calling the
* revoke before clearing the block bitmap when we are deleting
* metadata.
*
* Revoke performs a journal_forget on any buffer_head passed in as a
* parameter, but does _not_ forget the buffer_head if the bh was only
* found implicitly.
*
* bh_in may not be a journalled buffer - it may have come off
* the hash tables without an attached journal_head.
*
* If bh_in is non-zero, journal_revoke() will decrement its b_count
* by one.
*/
int journal_revoke(handle_t *handle, unsigned int blocknr,
struct buffer_head *bh_in)
{
struct buffer_head *bh = NULL;
journal_t *journal;
struct block_device *bdev;
int err;
might_sleep();
if (bh_in)
BUFFER_TRACE(bh_in, "enter");
journal = handle->h_transaction->t_journal;
if (!journal_set_features(journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)){
J_ASSERT (!"Cannot set revoke feature!");
return -EINVAL;
}
bdev = journal->j_fs_dev;
bh = bh_in;
if (!bh) {
bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
if (bh)
BUFFER_TRACE(bh, "found on hash");
}
#ifdef JBD_EXPENSIVE_CHECKING
else {
struct buffer_head *bh2;
/* If there is a different buffer_head lying around in
* memory anywhere... */
bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
if (bh2) {
/* ... and it has RevokeValid status... */
if (bh2 != bh && buffer_revokevalid(bh2))
/* ...then it better be revoked too,
* since it's illegal to create a revoke
* record against a buffer_head which is
* not marked revoked --- that would
* risk missing a subsequent revoke
* cancel. */
J_ASSERT_BH(bh2, buffer_revoked(bh2));
put_bh(bh2);
}
}
#endif
/* We really ought not ever to revoke twice in a row without
first having the revoke cancelled: it's illegal to free a
block twice without allocating it in between! */
if (bh) {
if (!J_EXPECT_BH(bh, !buffer_revoked(bh),
"inconsistent data on disk")) {
if (!bh_in)
brelse(bh);
return -EIO;
}
set_buffer_revoked(bh);
set_buffer_revokevalid(bh);
if (bh_in) {
BUFFER_TRACE(bh_in, "call journal_forget");
journal_forget(handle, bh_in);
} else {
BUFFER_TRACE(bh, "call brelse");
__brelse(bh);
}
}
jbd_debug(2, "insert revoke for block %u, bh_in=%p\n", blocknr, bh_in);
err = insert_revoke_hash(journal, blocknr,
handle->h_transaction->t_tid);
BUFFER_TRACE(bh_in, "exit");
return err;
}
/*
* Cancel an outstanding revoke. For use only internally by the
* journaling code (called from journal_get_write_access).
*
* We trust buffer_revoked() on the buffer if the buffer is already
* being journaled: if there is no revoke pending on the buffer, then we
* don't do anything here.
*
* This would break if it were possible for a buffer to be revoked and
* discarded, and then reallocated within the same transaction. In such
* a case we would have lost the revoked bit, but when we arrived here
* the second time we would still have a pending revoke to cancel. So,
* do not trust the Revoked bit on buffers unless RevokeValid is also
* set.
*/
int journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
{
struct jbd_revoke_record_s *record;
journal_t *journal = handle->h_transaction->t_journal;
int need_cancel;
int did_revoke = 0; /* akpm: debug */
struct buffer_head *bh = jh2bh(jh);
jbd_debug(4, "journal_head %p, cancelling revoke\n", jh);
/* Is the existing Revoke bit valid? If so, we trust it, and
* only perform the full cancel if the revoke bit is set. If
* not, we can't trust the revoke bit, and we need to do the
* full search for a revoke record. */
if (test_set_buffer_revokevalid(bh)) {
need_cancel = test_clear_buffer_revoked(bh);
} else {
need_cancel = 1;
clear_buffer_revoked(bh);
}
if (need_cancel) {
record = find_revoke_record(journal, bh->b_blocknr);
if (record) {
jbd_debug(4, "cancelled existing revoke on "
"blocknr %llu\n", (unsigned long long)bh->b_blocknr);
spin_lock(&journal->j_revoke_lock);
list_del(&record->hash);
spin_unlock(&journal->j_revoke_lock);
kmem_cache_free(revoke_record_cache, record);
did_revoke = 1;
}
}
#ifdef JBD_EXPENSIVE_CHECKING
/* There better not be one left behind by now! */
record = find_revoke_record(journal, bh->b_blocknr);
J_ASSERT_JH(jh, record == NULL);
#endif
/* Finally, have we just cleared revoke on an unhashed
* buffer_head? If so, we'd better make sure we clear the
* revoked status on any hashed alias too, otherwise the revoke
* state machine will get very upset later on. */
if (need_cancel) {
struct buffer_head *bh2;
bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
if (bh2) {
if (bh2 != bh)
clear_buffer_revoked(bh2);
__brelse(bh2);
}
}
return did_revoke;
}
/*
* journal_clear_revoked_flags clears revoked flag of buffers in
* revoke table to reflect there is no revoked buffer in the next
* transaction which is going to be started.
*/
void journal_clear_buffer_revoked_flags(journal_t *journal)
{
struct jbd_revoke_table_s *revoke = journal->j_revoke;
int i = 0;
for (i = 0; i < revoke->hash_size; i++) {
struct list_head *hash_list;
struct list_head *list_entry;
hash_list = &revoke->hash_table[i];
list_for_each(list_entry, hash_list) {
struct jbd_revoke_record_s *record;
struct buffer_head *bh;
record = (struct jbd_revoke_record_s *)list_entry;
bh = __find_get_block(journal->j_fs_dev,
record->blocknr,
journal->j_blocksize);
if (bh) {
clear_buffer_revoked(bh);
__brelse(bh);
}
}
}
}
/* journal_switch_revoke table select j_revoke for next transaction
* we do not want to suspend any processing until all revokes are
* written -bzzz
*/
void journal_switch_revoke_table(journal_t *journal)
{
int i;
if (journal->j_revoke == journal->j_revoke_table[0])
journal->j_revoke = journal->j_revoke_table[1];
else
journal->j_revoke = journal->j_revoke_table[0];
for (i = 0; i < journal->j_revoke->hash_size; i++)
INIT_LIST_HEAD(&journal->j_revoke->hash_table[i]);
}
/*
* Write revoke records to the journal for all entries in the current
* revoke hash, deleting the entries as we go.
*/
void journal_write_revoke_records(journal_t *journal,
transaction_t *transaction, int write_op)
{
struct journal_head *descriptor;
struct jbd_revoke_record_s *record;
struct jbd_revoke_table_s *revoke;
struct list_head *hash_list;
int i, offset, count;
descriptor = NULL;
offset = 0;
count = 0;
/* select revoke table for committing transaction */
revoke = journal->j_revoke == journal->j_revoke_table[0] ?
journal->j_revoke_table[1] : journal->j_revoke_table[0];
for (i = 0; i < revoke->hash_size; i++) {
hash_list = &revoke->hash_table[i];
while (!list_empty(hash_list)) {
record = (struct jbd_revoke_record_s *)
hash_list->next;
write_one_revoke_record(journal, transaction,
&descriptor, &offset,
record, write_op);
count++;
list_del(&record->hash);
kmem_cache_free(revoke_record_cache, record);
}
}
if (descriptor)
flush_descriptor(journal, descriptor, offset, write_op);
jbd_debug(1, "Wrote %d revoke records\n", count);
}
/*
* Write out one revoke record. We need to create a new descriptor
* block if the old one is full or if we have not already created one.
*/
static void write_one_revoke_record(journal_t *journal,
transaction_t *transaction,
struct journal_head **descriptorp,
int *offsetp,
struct jbd_revoke_record_s *record,
int write_op)
{
struct journal_head *descriptor;
int offset;
journal_header_t *header;
/* If we are already aborting, this all becomes a noop. We
still need to go round the loop in
journal_write_revoke_records in order to free all of the
revoke records: only the IO to the journal is omitted. */
if (is_journal_aborted(journal))
return;
descriptor = *descriptorp;
offset = *offsetp;
/* Make sure we have a descriptor with space left for the record */
if (descriptor) {
if (offset == journal->j_blocksize) {
flush_descriptor(journal, descriptor, offset, write_op);
descriptor = NULL;
}
}
if (!descriptor) {
descriptor = journal_get_descriptor_buffer(journal);
if (!descriptor)
return;
header = (journal_header_t *) &jh2bh(descriptor)->b_data[0];
header->h_magic = cpu_to_be32(JFS_MAGIC_NUMBER);
header->h_blocktype = cpu_to_be32(JFS_REVOKE_BLOCK);
header->h_sequence = cpu_to_be32(transaction->t_tid);
/* Record it so that we can wait for IO completion later */
JBUFFER_TRACE(descriptor, "file as BJ_LogCtl");
journal_file_buffer(descriptor, transaction, BJ_LogCtl);
offset = sizeof(journal_revoke_header_t);
*descriptorp = descriptor;
}
* ((__be32 *)(&jh2bh(descriptor)->b_data[offset])) =
cpu_to_be32(record->blocknr);
offset += 4;
*offsetp = offset;
}
/*
* Flush a revoke descriptor out to the journal. If we are aborting,
* this is a noop; otherwise we are generating a buffer which needs to
* be waited for during commit, so it has to go onto the appropriate
* journal buffer list.
*/
static void flush_descriptor(journal_t *journal,
struct journal_head *descriptor,
int offset, int write_op)
{
journal_revoke_header_t *header;
struct buffer_head *bh = jh2bh(descriptor);
if (is_journal_aborted(journal)) {
put_bh(bh);
return;
}
header = (journal_revoke_header_t *) jh2bh(descriptor)->b_data;
header->r_count = cpu_to_be32(offset);
set_buffer_jwrite(bh);
BUFFER_TRACE(bh, "write");
set_buffer_dirty(bh);
write_dirty_buffer(bh, write_op);
}
#endif
/*
* Revoke support for recovery.
*
* Recovery needs to be able to:
*
* record all revoke records, including the tid of the latest instance
* of each revoke in the journal
*
* check whether a given block in a given transaction should be replayed
* (ie. has not been revoked by a revoke record in that or a subsequent
* transaction)
*
* empty the revoke table after recovery.
*/
/*
* First, setting revoke records. We create a new revoke record for
* every block ever revoked in the log as we scan it for recovery, and
* we update the existing records if we find multiple revokes for a
* single block.
*/
int journal_set_revoke(journal_t *journal,
unsigned int blocknr,
tid_t sequence)
{
struct jbd_revoke_record_s *record;
record = find_revoke_record(journal, blocknr);
if (record) {
/* If we have multiple occurrences, only record the
* latest sequence number in the hashed record */
if (tid_gt(sequence, record->sequence))
record->sequence = sequence;
return 0;
}
return insert_revoke_hash(journal, blocknr, sequence);
}
/*
* Test revoke records. For a given block referenced in the log, has
* that block been revoked? A revoke record with a given transaction
* sequence number revokes all blocks in that transaction and earlier
* ones, but later transactions still need replayed.
*/
int journal_test_revoke(journal_t *journal,
unsigned int blocknr,
tid_t sequence)
{
struct jbd_revoke_record_s *record;
record = find_revoke_record(journal, blocknr);
if (!record)
return 0;
if (tid_gt(sequence, record->sequence))
return 0;
return 1;
}
/*
* Finally, once recovery is over, we need to clear the revoke table so
* that it can be reused by the running filesystem.
*/
void journal_clear_revoke(journal_t *journal)
{
int i;
struct list_head *hash_list;
struct jbd_revoke_record_s *record;
struct jbd_revoke_table_s *revoke;
revoke = journal->j_revoke;
for (i = 0; i < revoke->hash_size; i++) {
hash_list = &revoke->hash_table[i];
while (!list_empty(hash_list)) {
record = (struct jbd_revoke_record_s*) hash_list->next;
list_del(&record->hash);
kmem_cache_free(revoke_record_cache, record);
}
}
}

File diff suppressed because it is too large Load Diff

View File

@ -107,8 +107,11 @@ int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
if (rc)
return rc;
if (is_quota_modification(inode, iattr))
dquot_initialize(inode);
if (is_quota_modification(inode, iattr)) {
rc = dquot_initialize(inode);
if (rc)
return rc;
}
if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)) ||
(iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))) {
rc = dquot_transfer(inode, iattr);

View File

@ -109,7 +109,9 @@ struct inode *ialloc(struct inode *parent, umode_t mode)
/*
* Allocate inode to quota.
*/
dquot_initialize(inode);
rc = dquot_initialize(inode);
if (rc)
goto fail_drop;
rc = dquot_alloc_inode(inode);
if (rc)
goto fail_drop;

View File

@ -86,7 +86,9 @@ static int jfs_create(struct inode *dip, struct dentry *dentry, umode_t mode,
jfs_info("jfs_create: dip:0x%p name:%pd", dip, dentry);
dquot_initialize(dip);
rc = dquot_initialize(dip);
if (rc)
goto out1;
/*
* search parent directory for entry/freespace
@ -218,7 +220,9 @@ static int jfs_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
jfs_info("jfs_mkdir: dip:0x%p name:%pd", dip, dentry);
dquot_initialize(dip);
rc = dquot_initialize(dip);
if (rc)
goto out1;
/*
* search parent directory for entry/freespace
@ -355,8 +359,12 @@ static int jfs_rmdir(struct inode *dip, struct dentry *dentry)
jfs_info("jfs_rmdir: dip:0x%p name:%pd", dip, dentry);
/* Init inode for quota operations. */
dquot_initialize(dip);
dquot_initialize(ip);
rc = dquot_initialize(dip);
if (rc)
goto out;
rc = dquot_initialize(ip);
if (rc)
goto out;
/* directory must be empty to be removed */
if (!dtEmpty(ip)) {
@ -483,8 +491,12 @@ static int jfs_unlink(struct inode *dip, struct dentry *dentry)
jfs_info("jfs_unlink: dip:0x%p name:%pd", dip, dentry);
/* Init inode for quota operations. */
dquot_initialize(dip);
dquot_initialize(ip);
rc = dquot_initialize(dip);
if (rc)
goto out;
rc = dquot_initialize(ip);
if (rc)
goto out;
if ((rc = get_UCSname(&dname, dentry)))
goto out;
@ -799,7 +811,9 @@ static int jfs_link(struct dentry *old_dentry,
jfs_info("jfs_link: %pd %pd", old_dentry, dentry);
dquot_initialize(dir);
rc = dquot_initialize(dir);
if (rc)
goto out;
tid = txBegin(ip->i_sb, 0);
@ -810,7 +824,7 @@ static int jfs_link(struct dentry *old_dentry,
* scan parent directory for entry/freespace
*/
if ((rc = get_UCSname(&dname, dentry)))
goto out;
goto out_tx;
if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE)))
goto free_dname;
@ -842,12 +856,13 @@ static int jfs_link(struct dentry *old_dentry,
free_dname:
free_UCSname(&dname);
out:
out_tx:
txEnd(tid);
mutex_unlock(&JFS_IP(ip)->commit_mutex);
mutex_unlock(&JFS_IP(dir)->commit_mutex);
out:
jfs_info("jfs_link: rc:%d", rc);
return rc;
}
@ -891,7 +906,9 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
jfs_info("jfs_symlink: dip:0x%p name:%s", dip, name);
dquot_initialize(dip);
rc = dquot_initialize(dip);
if (rc)
goto out1;
ssize = strlen(name) + 1;
@ -1082,8 +1099,12 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
jfs_info("jfs_rename: %pd %pd", old_dentry, new_dentry);
dquot_initialize(old_dir);
dquot_initialize(new_dir);
rc = dquot_initialize(old_dir);
if (rc)
goto out1;
rc = dquot_initialize(new_dir);
if (rc)
goto out1;
old_ip = d_inode(old_dentry);
new_ip = d_inode(new_dentry);
@ -1130,7 +1151,9 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
} else if (new_ip) {
IWRITE_LOCK(new_ip, RDWRLOCK_NORMAL);
/* Init inode for quota operations. */
dquot_initialize(new_ip);
rc = dquot_initialize(new_ip);
if (rc)
goto out_unlock;
}
/*
@ -1318,6 +1341,7 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
clear_cflag(COMMIT_Stale, old_dir);
}
out_unlock:
if (new_ip && !S_ISDIR(new_ip->i_mode))
IWRITE_UNLOCK(new_ip);
out3:
@ -1353,7 +1377,9 @@ static int jfs_mknod(struct inode *dir, struct dentry *dentry,
jfs_info("jfs_mknod: %pd", dentry);
dquot_initialize(dir);
rc = dquot_initialize(dir);
if (rc)
goto out;
if ((rc = get_UCSname(&dname, dentry)))
goto out;

View File

@ -105,8 +105,11 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
file->f_path.dentry->d_name.len,
file->f_path.dentry->d_name.name, mode);
if (file->f_mode & FMODE_WRITE)
dquot_initialize(inode);
if (file->f_mode & FMODE_WRITE) {
status = dquot_initialize(inode);
if (status)
goto leave;
}
spin_lock(&oi->ip_lock);
@ -1155,8 +1158,11 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
if (status)
return status;
if (is_quota_modification(inode, attr))
dquot_initialize(inode);
if (is_quota_modification(inode, attr)) {
status = dquot_initialize(inode);
if (status)
return status;
}
size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
if (size_change) {
status = ocfs2_rw_lock(inode, 1);
@ -1209,8 +1215,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
&& OCFS2_HAS_RO_COMPAT_FEATURE(sb,
OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) {
transfer_to[USRQUOTA] = dqget(sb, make_kqid_uid(attr->ia_uid));
if (!transfer_to[USRQUOTA]) {
status = -ESRCH;
if (IS_ERR(transfer_to[USRQUOTA])) {
status = PTR_ERR(transfer_to[USRQUOTA]);
goto bail_unlock;
}
}
@ -1218,8 +1224,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
&& OCFS2_HAS_RO_COMPAT_FEATURE(sb,
OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)) {
transfer_to[GRPQUOTA] = dqget(sb, make_kqid_gid(attr->ia_gid));
if (!transfer_to[GRPQUOTA]) {
status = -ESRCH;
if (IS_ERR(transfer_to[GRPQUOTA])) {
status = PTR_ERR(transfer_to[GRPQUOTA]);
goto bail_unlock;
}
}

View File

@ -200,11 +200,12 @@ static struct dentry *ocfs2_lookup(struct inode *dir, struct dentry *dentry,
static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
{
struct inode *inode;
int status;
inode = new_inode(dir->i_sb);
if (!inode) {
mlog(ML_ERROR, "new_inode failed!\n");
return NULL;
return ERR_PTR(-ENOMEM);
}
/* populate as many fields early on as possible - many of
@ -213,7 +214,10 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
if (S_ISDIR(mode))
set_nlink(inode, 2);
inode_init_owner(inode, dir, mode);
dquot_initialize(inode);
status = dquot_initialize(inode);
if (status)
return ERR_PTR(status);
return inode;
}
@ -264,7 +268,11 @@ static int ocfs2_mknod(struct inode *dir,
(unsigned long long)OCFS2_I(dir)->ip_blkno,
(unsigned long)dev, mode);
dquot_initialize(dir);
status = dquot_initialize(dir);
if (status) {
mlog_errno(status);
return status;
}
/* get our super block */
osb = OCFS2_SB(dir->i_sb);
@ -311,8 +319,9 @@ static int ocfs2_mknod(struct inode *dir,
}
inode = ocfs2_get_init_inode(dir, mode);
if (!inode) {
status = -ENOMEM;
if (IS_ERR(inode)) {
status = PTR_ERR(inode);
inode = NULL;
mlog_errno(status);
goto leave;
}
@ -708,7 +717,11 @@ static int ocfs2_link(struct dentry *old_dentry,
if (S_ISDIR(inode->i_mode))
return -EPERM;
dquot_initialize(dir);
err = dquot_initialize(dir);
if (err) {
mlog_errno(err);
return err;
}
err = ocfs2_double_lock(osb, &old_dir_bh, old_dir,
&parent_fe_bh, dir, 0);
@ -896,7 +909,11 @@ static int ocfs2_unlink(struct inode *dir,
(unsigned long long)OCFS2_I(dir)->ip_blkno,
(unsigned long long)OCFS2_I(inode)->ip_blkno);
dquot_initialize(dir);
status = dquot_initialize(dir);
if (status) {
mlog_errno(status);
return status;
}
BUG_ON(d_inode(dentry->d_parent) != dir);
@ -1230,8 +1247,16 @@ static int ocfs2_rename(struct inode *old_dir,
old_dentry->d_name.len, old_dentry->d_name.name,
new_dentry->d_name.len, new_dentry->d_name.name);
dquot_initialize(old_dir);
dquot_initialize(new_dir);
status = dquot_initialize(old_dir);
if (status) {
mlog_errno(status);
goto bail;
}
status = dquot_initialize(new_dir);
if (status) {
mlog_errno(status);
goto bail;
}
osb = OCFS2_SB(old_dir->i_sb);
@ -1786,7 +1811,11 @@ static int ocfs2_symlink(struct inode *dir,
trace_ocfs2_symlink_begin(dir, dentry, symname,
dentry->d_name.len, dentry->d_name.name);
dquot_initialize(dir);
status = dquot_initialize(dir);
if (status) {
mlog_errno(status);
goto bail;
}
sb = dir->i_sb;
osb = OCFS2_SB(sb);
@ -1831,8 +1860,9 @@ static int ocfs2_symlink(struct inode *dir,
}
inode = ocfs2_get_init_inode(dir, S_IFLNK | S_IRWXUGO);
if (!inode) {
status = -ENOMEM;
if (IS_ERR(inode)) {
status = PTR_ERR(inode);
inode = NULL;
mlog_errno(status);
goto bail;
}
@ -2485,8 +2515,9 @@ int ocfs2_create_inode_in_orphan(struct inode *dir,
}
inode = ocfs2_get_init_inode(dir, mode);
if (!inode) {
status = -ENOMEM;
if (IS_ERR(inode)) {
status = PTR_ERR(inode);
inode = NULL;
mlog_errno(status);
goto leave;
}

View File

@ -499,8 +499,8 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
dquot = dqget(sb,
make_kqid(&init_user_ns, type,
le64_to_cpu(dqblk->dqb_id)));
if (!dquot) {
status = -EIO;
if (IS_ERR(dquot)) {
status = PTR_ERR(dquot);
mlog(ML_ERROR, "Failed to get quota structure "
"for id %u, type %d. Cannot finish quota "
"file recovery.\n",

View File

@ -4419,8 +4419,9 @@ static int ocfs2_vfs_reflink(struct dentry *old_dentry, struct inode *dir,
}
mutex_lock(&inode->i_mutex);
dquot_initialize(dir);
error = ocfs2_reflink(old_dentry, dir, new_dentry, preserve);
error = dquot_initialize(dir);
if (!error)
error = ocfs2_reflink(old_dentry, dir, new_dentry, preserve);
mutex_unlock(&inode->i_mutex);
if (!error)
fsnotify_create(dir, new_dentry);

View File

@ -247,7 +247,7 @@ struct dqstats dqstats;
EXPORT_SYMBOL(dqstats);
static qsize_t inode_get_rsv_space(struct inode *inode);
static void __dquot_initialize(struct inode *inode, int type);
static int __dquot_initialize(struct inode *inode, int type);
static inline unsigned int
hashfn(const struct super_block *sb, struct kqid qid)
@ -832,16 +832,17 @@ static struct dquot *get_empty_dquot(struct super_block *sb, int type)
struct dquot *dqget(struct super_block *sb, struct kqid qid)
{
unsigned int hashent = hashfn(sb, qid);
struct dquot *dquot = NULL, *empty = NULL;
struct dquot *dquot, *empty = NULL;
if (!sb_has_quota_active(sb, qid.type))
return NULL;
return ERR_PTR(-ESRCH);
we_slept:
spin_lock(&dq_list_lock);
spin_lock(&dq_state_lock);
if (!sb_has_quota_active(sb, qid.type)) {
spin_unlock(&dq_state_lock);
spin_unlock(&dq_list_lock);
dquot = ERR_PTR(-ESRCH);
goto out;
}
spin_unlock(&dq_state_lock);
@ -876,11 +877,15 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid)
* already finished or it will be canceled due to dq_count > 1 test */
wait_on_dquot(dquot);
/* Read the dquot / allocate space in quota file */
if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags) &&
sb->dq_op->acquire_dquot(dquot) < 0) {
dqput(dquot);
dquot = NULL;
goto out;
if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
int err;
err = sb->dq_op->acquire_dquot(dquot);
if (err < 0) {
dqput(dquot);
dquot = ERR_PTR(err);
goto out;
}
}
#ifdef CONFIG_QUOTA_DEBUG
BUG_ON(!dquot->dq_sb); /* Has somebody invalidated entry under us? */
@ -1390,15 +1395,16 @@ static int dquot_active(const struct inode *inode)
* It is better to call this function outside of any transaction as it
* might need a lot of space in journal for dquot structure allocation.
*/
static void __dquot_initialize(struct inode *inode, int type)
static int __dquot_initialize(struct inode *inode, int type)
{
int cnt, init_needed = 0;
struct dquot **dquots, *got[MAXQUOTAS];
struct super_block *sb = inode->i_sb;
qsize_t rsv;
int ret = 0;
if (!dquot_active(inode))
return;
return 0;
dquots = i_dquot(inode);
@ -1407,6 +1413,7 @@ static void __dquot_initialize(struct inode *inode, int type)
struct kqid qid;
kprojid_t projid;
int rc;
struct dquot *dquot;
got[cnt] = NULL;
if (type != -1 && cnt != type)
@ -1438,16 +1445,25 @@ static void __dquot_initialize(struct inode *inode, int type)
qid = make_kqid_projid(projid);
break;
}
got[cnt] = dqget(sb, qid);
dquot = dqget(sb, qid);
if (IS_ERR(dquot)) {
/* We raced with somebody turning quotas off... */
if (PTR_ERR(dquot) != -ESRCH) {
ret = PTR_ERR(dquot);
goto out_put;
}
dquot = NULL;
}
got[cnt] = dquot;
}
/* All required i_dquot has been initialized */
if (!init_needed)
return;
return 0;
spin_lock(&dq_data_lock);
if (IS_NOQUOTA(inode))
goto out_err;
goto out_lock;
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
if (type != -1 && cnt != type)
continue;
@ -1469,15 +1485,18 @@ static void __dquot_initialize(struct inode *inode, int type)
dquot_resv_space(dquots[cnt], rsv);
}
}
out_err:
out_lock:
spin_unlock(&dq_data_lock);
out_put:
/* Drop unused references */
dqput_all(got);
return ret;
}
void dquot_initialize(struct inode *inode)
int dquot_initialize(struct inode *inode)
{
__dquot_initialize(inode, -1);
return __dquot_initialize(inode, -1);
}
EXPORT_SYMBOL(dquot_initialize);
@ -1961,18 +1980,37 @@ EXPORT_SYMBOL(__dquot_transfer);
int dquot_transfer(struct inode *inode, struct iattr *iattr)
{
struct dquot *transfer_to[MAXQUOTAS] = {};
struct dquot *dquot;
struct super_block *sb = inode->i_sb;
int ret;
if (!dquot_active(inode))
return 0;
if (iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid))
transfer_to[USRQUOTA] = dqget(sb, make_kqid_uid(iattr->ia_uid));
if (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))
transfer_to[GRPQUOTA] = dqget(sb, make_kqid_gid(iattr->ia_gid));
if (iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)){
dquot = dqget(sb, make_kqid_uid(iattr->ia_uid));
if (IS_ERR(dquot)) {
if (PTR_ERR(dquot) != -ESRCH) {
ret = PTR_ERR(dquot);
goto out_put;
}
dquot = NULL;
}
transfer_to[USRQUOTA] = dquot;
}
if (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid)){
dquot = dqget(sb, make_kqid_gid(iattr->ia_gid));
if (IS_ERR(dquot)) {
if (PTR_ERR(dquot) != -ESRCH) {
ret = PTR_ERR(dquot);
goto out_put;
}
dquot = NULL;
}
transfer_to[GRPQUOTA] = dquot;
}
ret = __dquot_transfer(inode, transfer_to);
out_put:
dqput_all(transfer_to);
return ret;
}
@ -2518,8 +2556,8 @@ int dquot_get_dqblk(struct super_block *sb, struct kqid qid,
struct dquot *dquot;
dquot = dqget(sb, qid);
if (!dquot)
return -ESRCH;
if (IS_ERR(dquot))
return PTR_ERR(dquot);
do_get_dqblk(dquot, di);
dqput(dquot);
@ -2631,8 +2669,8 @@ int dquot_set_dqblk(struct super_block *sb, struct kqid qid,
int rc;
dquot = dqget(sb, qid);
if (!dquot) {
rc = -ESRCH;
if (IS_ERR(dquot)) {
rc = PTR_ERR(dquot);
goto out;
}
rc = do_set_dqblk(dquot, di);

View File

@ -141,9 +141,9 @@ static int quota_getinfo(struct super_block *sb, int type, void __user *addr)
if (tstate->flags & QCI_ROOT_SQUASH)
uinfo.dqi_flags |= DQF_ROOT_SQUASH;
uinfo.dqi_valid = IIF_ALL;
if (!ret && copy_to_user(addr, &uinfo, sizeof(uinfo)))
if (copy_to_user(addr, &uinfo, sizeof(uinfo)))
return -EFAULT;
return ret;
return 0;
}
static int quota_setinfo(struct super_block *sb, int type, void __user *addr)

View File

@ -3319,8 +3319,11 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
/* must be turned off for recursive notify_change calls */
ia_valid = attr->ia_valid &= ~(ATTR_KILL_SUID|ATTR_KILL_SGID);
if (is_quota_modification(inode, attr))
dquot_initialize(inode);
if (is_quota_modification(inode, attr)) {
error = dquot_initialize(inode);
if (error)
return error;
}
reiserfs_write_lock(inode->i_sb);
if (attr->ia_valid & ATTR_SIZE) {
/*

View File

@ -613,8 +613,7 @@ static int new_inode_init(struct inode *inode, struct inode *dir, umode_t mode)
* we have to set uid and gid here
*/
inode_init_owner(inode, dir, mode);
dquot_initialize(inode);
return 0;
return dquot_initialize(inode);
}
static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
@ -633,12 +632,18 @@ static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mod
struct reiserfs_transaction_handle th;
struct reiserfs_security_handle security;
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM;
}
new_inode_init(inode, dir, mode);
retval = new_inode_init(inode, dir, mode);
if (retval) {
drop_new_inode(inode);
return retval;
}
jbegin_count += reiserfs_cache_default_acl(dir);
retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -710,12 +715,18 @@ static int reiserfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode
if (!new_valid_dev(rdev))
return -EINVAL;
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM;
}
new_inode_init(inode, dir, mode);
retval = new_inode_init(inode, dir, mode);
if (retval) {
drop_new_inode(inode);
return retval;
}
jbegin_count += reiserfs_cache_default_acl(dir);
retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -787,7 +798,9 @@ static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
2 * (REISERFS_QUOTA_INIT_BLOCKS(dir->i_sb) +
REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb));
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
#ifdef DISPLACE_NEW_PACKING_LOCALITIES
/*
@ -800,7 +813,11 @@ static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
if (!(inode = new_inode(dir->i_sb))) {
return -ENOMEM;
}
new_inode_init(inode, dir, mode);
retval = new_inode_init(inode, dir, mode);
if (retval) {
drop_new_inode(inode);
return retval;
}
jbegin_count += reiserfs_cache_default_acl(dir);
retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -899,7 +916,9 @@ static int reiserfs_rmdir(struct inode *dir, struct dentry *dentry)
JOURNAL_PER_BALANCE_CNT * 2 + 2 +
4 * REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb);
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
reiserfs_write_lock(dir->i_sb);
retval = journal_begin(&th, dir->i_sb, jbegin_count);
@ -985,7 +1004,9 @@ static int reiserfs_unlink(struct inode *dir, struct dentry *dentry)
int jbegin_count;
unsigned long savelink;
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
inode = d_inode(dentry);
@ -1095,12 +1116,18 @@ static int reiserfs_symlink(struct inode *parent_dir,
2 * (REISERFS_QUOTA_INIT_BLOCKS(parent_dir->i_sb) +
REISERFS_QUOTA_TRANS_BLOCKS(parent_dir->i_sb));
dquot_initialize(parent_dir);
retval = dquot_initialize(parent_dir);
if (retval)
return retval;
if (!(inode = new_inode(parent_dir->i_sb))) {
return -ENOMEM;
}
new_inode_init(inode, parent_dir, mode);
retval = new_inode_init(inode, parent_dir, mode);
if (retval) {
drop_new_inode(inode);
return retval;
}
retval = reiserfs_security_init(parent_dir, inode, &dentry->d_name,
&security);
@ -1184,7 +1211,9 @@ static int reiserfs_link(struct dentry *old_dentry, struct inode *dir,
JOURNAL_PER_BALANCE_CNT * 3 +
2 * REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb);
dquot_initialize(dir);
retval = dquot_initialize(dir);
if (retval)
return retval;
reiserfs_write_lock(dir->i_sb);
if (inode->i_nlink >= REISERFS_LINK_MAX) {
@ -1308,8 +1337,12 @@ static int reiserfs_rename(struct inode *old_dir, struct dentry *old_dentry,
JOURNAL_PER_BALANCE_CNT * 3 + 5 +
4 * REISERFS_QUOTA_TRANS_BLOCKS(old_dir->i_sb);
dquot_initialize(old_dir);
dquot_initialize(new_dir);
retval = dquot_initialize(old_dir);
if (retval)
return retval;
retval = dquot_initialize(new_dir);
if (retval)
return retval;
old_inode = d_inode(old_dentry);
new_dentry_inode = d_inode(new_dentry);

View File

@ -2070,6 +2070,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
struct udf_options uopt;
struct kernel_lb_addr rootdir, fileset;
struct udf_sb_info *sbi;
bool lvid_open = false;
uopt.flags = (1 << UDF_FLAG_USE_AD_IN_ICB) | (1 << UDF_FLAG_STRICT);
uopt.uid = INVALID_UID;
@ -2216,8 +2217,10 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
le16_to_cpu(ts.year), ts.month, ts.day,
ts.hour, ts.minute, le16_to_cpu(ts.typeAndTimezone));
}
if (!(sb->s_flags & MS_RDONLY))
if (!(sb->s_flags & MS_RDONLY)) {
udf_open_lvid(sb);
lvid_open = true;
}
/* Assign the root inode */
/* assign inodes by physical block number */
@ -2248,7 +2251,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
unload_nls(sbi->s_nls_map);
#endif
if (!(sb->s_flags & MS_RDONLY))
if (lvid_open)
udf_close_lvid(sb);
brelse(sbi->s_lvid_bh);
udf_sb_free_partitions(sb);

View File

@ -118,9 +118,8 @@ struct bio {
#define BIO_USER_MAPPED 4 /* contains user pages */
#define BIO_NULL_MAPPED 5 /* contains invalid user pages */
#define BIO_QUIET 6 /* Make BIO Quiet */
#define BIO_SNAP_STABLE 7 /* bio data must be snapshotted during write */
#define BIO_CHAIN 8 /* chained bio, ->bi_remaining in effect */
#define BIO_REFFED 9 /* bio has elevated ->bi_cnt */
#define BIO_CHAIN 7 /* chained bio, ->bi_remaining in effect */
#define BIO_REFFED 8 /* bio has elevated ->bi_cnt */
/*
* Flags starting here get preserved by bio_reset() - this includes

File diff suppressed because it is too large Load Diff

View File

@ -29,6 +29,7 @@
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/slab.h>
#include <linux/bit_spinlock.h>
#include <crypto/hash.h>
#endif
@ -336,7 +337,45 @@ BUFFER_FNS(Freed, freed)
BUFFER_FNS(Shadow, shadow)
BUFFER_FNS(Verified, verified)
#include <linux/jbd_common.h>
static inline struct buffer_head *jh2bh(struct journal_head *jh)
{
return jh->b_bh;
}
static inline struct journal_head *bh2jh(struct buffer_head *bh)
{
return bh->b_private;
}
static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
bit_spin_lock(BH_State, &bh->b_state);
}
static inline int jbd_trylock_bh_state(struct buffer_head *bh)
{
return bit_spin_trylock(BH_State, &bh->b_state);
}
static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
return bit_spin_is_locked(BH_State, &bh->b_state);
}
static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
bit_spin_unlock(BH_State, &bh->b_state);
}
static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
bit_spin_lock(BH_JournalHead, &bh->b_state);
}
static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
{
bit_spin_unlock(BH_JournalHead, &bh->b_state);
}
#define J_ASSERT(assert) BUG_ON(!(assert))

View File

@ -1,46 +0,0 @@
#ifndef _LINUX_JBD_STATE_H
#define _LINUX_JBD_STATE_H
#include <linux/bit_spinlock.h>
static inline struct buffer_head *jh2bh(struct journal_head *jh)
{
return jh->b_bh;
}
static inline struct journal_head *bh2jh(struct buffer_head *bh)
{
return bh->b_private;
}
static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
bit_spin_lock(BH_State, &bh->b_state);
}
static inline int jbd_trylock_bh_state(struct buffer_head *bh)
{
return bit_spin_trylock(BH_State, &bh->b_state);
}
static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
return bit_spin_is_locked(BH_State, &bh->b_state);
}
static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
bit_spin_unlock(BH_State, &bh->b_state);
}
static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
bit_spin_lock(BH_JournalHead, &bh->b_state);
}
static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
{
bit_spin_unlock(BH_JournalHead, &bh->b_state);
}
#endif

View File

@ -43,7 +43,7 @@ void inode_claim_rsv_space(struct inode *inode, qsize_t number);
void inode_sub_rsv_space(struct inode *inode, qsize_t number);
void inode_reclaim_rsv_space(struct inode *inode, qsize_t number);
void dquot_initialize(struct inode *inode);
int dquot_initialize(struct inode *inode);
void dquot_drop(struct inode *inode);
struct dquot *dqget(struct super_block *sb, struct kqid qid);
static inline struct dquot *dqgrab(struct dquot *dquot)
@ -200,8 +200,9 @@ static inline int sb_has_quota_active(struct super_block *sb, int type)
return 0;
}
static inline void dquot_initialize(struct inode *inode)
static inline int dquot_initialize(struct inode *inode)
{
return 0;
}
static inline void dquot_drop(struct inode *inode)

View File

@ -1,866 +0,0 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM ext3
#if !defined(_TRACE_EXT3_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_EXT3_H
#include <linux/tracepoint.h>
TRACE_EVENT(ext3_free_inode,
TP_PROTO(struct inode *inode),
TP_ARGS(inode),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( umode_t, mode )
__field( uid_t, uid )
__field( gid_t, gid )
__field( blkcnt_t, blocks )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->mode = inode->i_mode;
__entry->uid = i_uid_read(inode);
__entry->gid = i_gid_read(inode);
__entry->blocks = inode->i_blocks;
),
TP_printk("dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->mode, __entry->uid, __entry->gid,
(unsigned long) __entry->blocks)
);
TRACE_EVENT(ext3_request_inode,
TP_PROTO(struct inode *dir, int mode),
TP_ARGS(dir, mode),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, dir )
__field( umode_t, mode )
),
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
__entry->dir = dir->i_ino;
__entry->mode = mode;
),
TP_printk("dev %d,%d dir %lu mode 0%o",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->dir, __entry->mode)
);
TRACE_EVENT(ext3_allocate_inode,
TP_PROTO(struct inode *inode, struct inode *dir, int mode),
TP_ARGS(inode, dir, mode),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ino_t, dir )
__field( umode_t, mode )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->dir = dir->i_ino;
__entry->mode = mode;
),
TP_printk("dev %d,%d ino %lu dir %lu mode 0%o",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long) __entry->dir, __entry->mode)
);
TRACE_EVENT(ext3_evict_inode,
TP_PROTO(struct inode *inode),
TP_ARGS(inode),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( int, nlink )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->nlink = inode->i_nlink;
),
TP_printk("dev %d,%d ino %lu nlink %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, __entry->nlink)
);
TRACE_EVENT(ext3_drop_inode,
TP_PROTO(struct inode *inode, int drop),
TP_ARGS(inode, drop),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( int, drop )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->drop = drop;
),
TP_printk("dev %d,%d ino %lu drop %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, __entry->drop)
);
TRACE_EVENT(ext3_mark_inode_dirty,
TP_PROTO(struct inode *inode, unsigned long IP),
TP_ARGS(inode, IP),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field(unsigned long, ip )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->ip = IP;
),
TP_printk("dev %d,%d ino %lu caller %pS",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, (void *)__entry->ip)
);
TRACE_EVENT(ext3_write_begin,
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
unsigned int flags),
TP_ARGS(inode, pos, len, flags),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( loff_t, pos )
__field( unsigned int, len )
__field( unsigned int, flags )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->pos = pos;
__entry->len = len;
__entry->flags = flags;
),
TP_printk("dev %d,%d ino %lu pos %llu len %u flags %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long long) __entry->pos, __entry->len,
__entry->flags)
);
DECLARE_EVENT_CLASS(ext3__write_end,
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
unsigned int copied),
TP_ARGS(inode, pos, len, copied),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( loff_t, pos )
__field( unsigned int, len )
__field( unsigned int, copied )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->pos = pos;
__entry->len = len;
__entry->copied = copied;
),
TP_printk("dev %d,%d ino %lu pos %llu len %u copied %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long long) __entry->pos, __entry->len,
__entry->copied)
);
DEFINE_EVENT(ext3__write_end, ext3_ordered_write_end,
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
unsigned int copied),
TP_ARGS(inode, pos, len, copied)
);
DEFINE_EVENT(ext3__write_end, ext3_writeback_write_end,
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
unsigned int copied),
TP_ARGS(inode, pos, len, copied)
);
DEFINE_EVENT(ext3__write_end, ext3_journalled_write_end,
TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
unsigned int copied),
TP_ARGS(inode, pos, len, copied)
);
DECLARE_EVENT_CLASS(ext3__page_op,
TP_PROTO(struct page *page),
TP_ARGS(page),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( pgoff_t, index )
),
TP_fast_assign(
__entry->index = page->index;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu page_index %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, __entry->index)
);
DEFINE_EVENT(ext3__page_op, ext3_ordered_writepage,
TP_PROTO(struct page *page),
TP_ARGS(page)
);
DEFINE_EVENT(ext3__page_op, ext3_writeback_writepage,
TP_PROTO(struct page *page),
TP_ARGS(page)
);
DEFINE_EVENT(ext3__page_op, ext3_journalled_writepage,
TP_PROTO(struct page *page),
TP_ARGS(page)
);
DEFINE_EVENT(ext3__page_op, ext3_readpage,
TP_PROTO(struct page *page),
TP_ARGS(page)
);
DEFINE_EVENT(ext3__page_op, ext3_releasepage,
TP_PROTO(struct page *page),
TP_ARGS(page)
);
TRACE_EVENT(ext3_invalidatepage,
TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
TP_ARGS(page, offset, length),
TP_STRUCT__entry(
__field( pgoff_t, index )
__field( unsigned int, offset )
__field( unsigned int, length )
__field( ino_t, ino )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->index = page->index;
__entry->offset = offset;
__entry->length = length;
__entry->ino = page->mapping->host->i_ino;
__entry->dev = page->mapping->host->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->index, __entry->offset, __entry->length)
);
TRACE_EVENT(ext3_discard_blocks,
TP_PROTO(struct super_block *sb, unsigned long blk,
unsigned long count),
TP_ARGS(sb, blk, count),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( unsigned long, blk )
__field( unsigned long, count )
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->blk = blk;
__entry->count = count;
),
TP_printk("dev %d,%d blk %lu count %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->blk, __entry->count)
);
TRACE_EVENT(ext3_request_blocks,
TP_PROTO(struct inode *inode, unsigned long goal,
unsigned long count),
TP_ARGS(inode, goal, count),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( unsigned long, count )
__field( unsigned long, goal )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->count = count;
__entry->goal = goal;
),
TP_printk("dev %d,%d ino %lu count %lu goal %lu ",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->count, __entry->goal)
);
TRACE_EVENT(ext3_allocate_blocks,
TP_PROTO(struct inode *inode, unsigned long goal,
unsigned long count, unsigned long block),
TP_ARGS(inode, goal, count, block),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( unsigned long, block )
__field( unsigned long, count )
__field( unsigned long, goal )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->block = block;
__entry->count = count;
__entry->goal = goal;
),
TP_printk("dev %d,%d ino %lu count %lu block %lu goal %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->count, __entry->block,
__entry->goal)
);
TRACE_EVENT(ext3_free_blocks,
TP_PROTO(struct inode *inode, unsigned long block,
unsigned long count),
TP_ARGS(inode, block, count),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( umode_t, mode )
__field( unsigned long, block )
__field( unsigned long, count )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->mode = inode->i_mode;
__entry->block = block;
__entry->count = count;
),
TP_printk("dev %d,%d ino %lu mode 0%o block %lu count %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->mode, __entry->block, __entry->count)
);
TRACE_EVENT(ext3_sync_file_enter,
TP_PROTO(struct file *file, int datasync),
TP_ARGS(file, datasync),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ino_t, parent )
__field( int, datasync )
),
TP_fast_assign(
struct dentry *dentry = file->f_path.dentry;
__entry->dev = d_inode(dentry)->i_sb->s_dev;
__entry->ino = d_inode(dentry)->i_ino;
__entry->datasync = datasync;
__entry->parent = d_inode(dentry->d_parent)->i_ino;
),
TP_printk("dev %d,%d ino %lu parent %ld datasync %d ",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long) __entry->parent, __entry->datasync)
);
TRACE_EVENT(ext3_sync_file_exit,
TP_PROTO(struct inode *inode, int ret),
TP_ARGS(inode, ret),
TP_STRUCT__entry(
__field( int, ret )
__field( ino_t, ino )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->ret = ret;
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu ret %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->ret)
);
TRACE_EVENT(ext3_sync_fs,
TP_PROTO(struct super_block *sb, int wait),
TP_ARGS(sb, wait),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, wait )
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->wait = wait;
),
TP_printk("dev %d,%d wait %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->wait)
);
TRACE_EVENT(ext3_rsv_window_add,
TP_PROTO(struct super_block *sb,
struct ext3_reserve_window_node *rsv_node),
TP_ARGS(sb, rsv_node),
TP_STRUCT__entry(
__field( unsigned long, start )
__field( unsigned long, end )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->start = rsv_node->rsv_window._rsv_start;
__entry->end = rsv_node->rsv_window._rsv_end;
),
TP_printk("dev %d,%d start %lu end %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->start, __entry->end)
);
TRACE_EVENT(ext3_discard_reservation,
TP_PROTO(struct inode *inode,
struct ext3_reserve_window_node *rsv_node),
TP_ARGS(inode, rsv_node),
TP_STRUCT__entry(
__field( unsigned long, start )
__field( unsigned long, end )
__field( ino_t, ino )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->start = rsv_node->rsv_window._rsv_start;
__entry->end = rsv_node->rsv_window._rsv_end;
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu start %lu end %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long)__entry->ino, __entry->start,
__entry->end)
);
TRACE_EVENT(ext3_alloc_new_reservation,
TP_PROTO(struct super_block *sb, unsigned long goal),
TP_ARGS(sb, goal),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( unsigned long, goal )
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->goal = goal;
),
TP_printk("dev %d,%d goal %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->goal)
);
TRACE_EVENT(ext3_reserved,
TP_PROTO(struct super_block *sb, unsigned long block,
struct ext3_reserve_window_node *rsv_node),
TP_ARGS(sb, block, rsv_node),
TP_STRUCT__entry(
__field( unsigned long, block )
__field( unsigned long, start )
__field( unsigned long, end )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->block = block;
__entry->start = rsv_node->rsv_window._rsv_start;
__entry->end = rsv_node->rsv_window._rsv_end;
__entry->dev = sb->s_dev;
),
TP_printk("dev %d,%d block %lu, start %lu end %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->block, __entry->start, __entry->end)
);
TRACE_EVENT(ext3_forget,
TP_PROTO(struct inode *inode, int is_metadata, unsigned long block),
TP_ARGS(inode, is_metadata, block),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( umode_t, mode )
__field( int, is_metadata )
__field( unsigned long, block )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->mode = inode->i_mode;
__entry->is_metadata = is_metadata;
__entry->block = block;
),
TP_printk("dev %d,%d ino %lu mode 0%o is_metadata %d block %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->mode, __entry->is_metadata, __entry->block)
);
TRACE_EVENT(ext3_read_block_bitmap,
TP_PROTO(struct super_block *sb, unsigned int group),
TP_ARGS(sb, group),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( __u32, group )
),
TP_fast_assign(
__entry->dev = sb->s_dev;
__entry->group = group;
),
TP_printk("dev %d,%d group %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->group)
);
TRACE_EVENT(ext3_direct_IO_enter,
TP_PROTO(struct inode *inode, loff_t offset, unsigned long len, int rw),
TP_ARGS(inode, offset, len, rw),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( loff_t, pos )
__field( unsigned long, len )
__field( int, rw )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->pos = offset;
__entry->len = len;
__entry->rw = rw;
),
TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long long) __entry->pos, __entry->len,
__entry->rw)
);
TRACE_EVENT(ext3_direct_IO_exit,
TP_PROTO(struct inode *inode, loff_t offset, unsigned long len,
int rw, int ret),
TP_ARGS(inode, offset, len, rw, ret),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( loff_t, pos )
__field( unsigned long, len )
__field( int, rw )
__field( int, ret )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->pos = offset;
__entry->len = len;
__entry->rw = rw;
__entry->ret = ret;
),
TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d ret %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long long) __entry->pos, __entry->len,
__entry->rw, __entry->ret)
);
TRACE_EVENT(ext3_unlink_enter,
TP_PROTO(struct inode *parent, struct dentry *dentry),
TP_ARGS(parent, dentry),
TP_STRUCT__entry(
__field( ino_t, parent )
__field( ino_t, ino )
__field( loff_t, size )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->parent = parent->i_ino;
__entry->ino = d_inode(dentry)->i_ino;
__entry->size = d_inode(dentry)->i_size;
__entry->dev = d_inode(dentry)->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu size %lld parent %ld",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned long long)__entry->size,
(unsigned long) __entry->parent)
);
TRACE_EVENT(ext3_unlink_exit,
TP_PROTO(struct dentry *dentry, int ret),
TP_ARGS(dentry, ret),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( int, ret )
),
TP_fast_assign(
__entry->ino = d_inode(dentry)->i_ino;
__entry->dev = d_inode(dentry)->i_sb->s_dev;
__entry->ret = ret;
),
TP_printk("dev %d,%d ino %lu ret %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->ret)
);
DECLARE_EVENT_CLASS(ext3__truncate,
TP_PROTO(struct inode *inode),
TP_ARGS(inode),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( blkcnt_t, blocks )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->blocks = inode->i_blocks;
),
TP_printk("dev %d,%d ino %lu blocks %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, (unsigned long) __entry->blocks)
);
DEFINE_EVENT(ext3__truncate, ext3_truncate_enter,
TP_PROTO(struct inode *inode),
TP_ARGS(inode)
);
DEFINE_EVENT(ext3__truncate, ext3_truncate_exit,
TP_PROTO(struct inode *inode),
TP_ARGS(inode)
);
TRACE_EVENT(ext3_get_blocks_enter,
TP_PROTO(struct inode *inode, unsigned long lblk,
unsigned long len, int create),
TP_ARGS(inode, lblk, len, create),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( unsigned long, lblk )
__field( unsigned long, len )
__field( int, create )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->lblk = lblk;
__entry->len = len;
__entry->create = create;
),
TP_printk("dev %d,%d ino %lu lblk %lu len %lu create %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->len, __entry->create)
);
TRACE_EVENT(ext3_get_blocks_exit,
TP_PROTO(struct inode *inode, unsigned long lblk,
unsigned long pblk, unsigned long len, int ret),
TP_ARGS(inode, lblk, pblk, len, ret),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
__field( unsigned long, lblk )
__field( unsigned long, pblk )
__field( unsigned long, len )
__field( int, ret )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->lblk = lblk;
__entry->pblk = pblk;
__entry->len = len;
__entry->ret = ret;
),
TP_printk("dev %d,%d ino %lu lblk %lu pblk %lu len %lu ret %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->pblk,
__entry->len, __entry->ret)
);
TRACE_EVENT(ext3_load_inode,
TP_PROTO(struct inode *inode),
TP_ARGS(inode),
TP_STRUCT__entry(
__field( ino_t, ino )
__field( dev_t, dev )
),
TP_fast_assign(
__entry->ino = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
),
TP_printk("dev %d,%d ino %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino)
);
#endif /* _TRACE_EXT3_H */
/* This part must be outside protection */
#include <trace/define_trace.h>

View File

@ -1,194 +0,0 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM jbd
#if !defined(_TRACE_JBD_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_JBD_H
#include <linux/jbd.h>
#include <linux/tracepoint.h>
TRACE_EVENT(jbd_checkpoint,
TP_PROTO(journal_t *journal, int result),
TP_ARGS(journal, result),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, result )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->result = result;
),
TP_printk("dev %d,%d result %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->result)
);
DECLARE_EVENT_CLASS(jbd_commit,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, transaction )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->transaction = commit_transaction->t_tid;
),
TP_printk("dev %d,%d transaction %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->transaction)
);
DEFINE_EVENT(jbd_commit, jbd_start_commit,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction)
);
DEFINE_EVENT(jbd_commit, jbd_commit_locking,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction)
);
DEFINE_EVENT(jbd_commit, jbd_commit_flushing,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction)
);
DEFINE_EVENT(jbd_commit, jbd_commit_logging,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction)
);
TRACE_EVENT(jbd_drop_transaction,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, transaction )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->transaction = commit_transaction->t_tid;
),
TP_printk("dev %d,%d transaction %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->transaction)
);
TRACE_EVENT(jbd_end_commit,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, transaction )
__field( int, head )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->transaction = commit_transaction->t_tid;
__entry->head = journal->j_tail_sequence;
),
TP_printk("dev %d,%d transaction %d head %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->transaction, __entry->head)
);
TRACE_EVENT(jbd_do_submit_data,
TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
TP_ARGS(journal, commit_transaction),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, transaction )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->transaction = commit_transaction->t_tid;
),
TP_printk("dev %d,%d transaction %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->transaction)
);
TRACE_EVENT(jbd_cleanup_journal_tail,
TP_PROTO(journal_t *journal, tid_t first_tid,
unsigned long block_nr, unsigned long freed),
TP_ARGS(journal, first_tid, block_nr, freed),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( tid_t, tail_sequence )
__field( tid_t, first_tid )
__field(unsigned long, block_nr )
__field(unsigned long, freed )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->tail_sequence = journal->j_tail_sequence;
__entry->first_tid = first_tid;
__entry->block_nr = block_nr;
__entry->freed = freed;
),
TP_printk("dev %d,%d from %u to %u offset %lu freed %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->tail_sequence, __entry->first_tid,
__entry->block_nr, __entry->freed)
);
TRACE_EVENT(journal_write_superblock,
TP_PROTO(journal_t *journal, int write_op),
TP_ARGS(journal, write_op),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( int, write_op )
),
TP_fast_assign(
__entry->dev = journal->j_fs_dev->bd_dev;
__entry->write_op = write_op;
),
TP_printk("dev %d,%d write_op %x", MAJOR(__entry->dev),
MINOR(__entry->dev), __entry->write_op)
);
#endif /* _TRACE_JBD_H */
/* This part must be outside protection */
#include <trace/define_trace.h>

View File

@ -299,15 +299,9 @@ config BOUNCE
# On the 'tile' arch, USB OHCI needs the bounce pool since tilegx will often
# have more than 4GB of memory, but we don't currently use the IOTLB to present
# a 32-bit address to OHCI. So we need to use a bounce pool instead.
#
# We also use the bounce pool to provide stable page writes for jbd. jbd
# initiates buffer writeback without locking the page or setting PG_writeback,
# and fixing that behavior (a second time; jbd2 doesn't have this problem) is
# a major rework effort. Instead, use the bounce buffer to snapshot pages
# (until jbd goes away). The only jbd user is ext3.
config NEED_BOUNCE_POOL
bool
default y if (TILE && USB_OHCI_HCD) || (BLK_DEV_INTEGRITY && JBD)
default y if TILE && USB_OHCI_HCD
config NR_QUICK
int