Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext3 removal, quota & udf fixes from Jan Kara: "The biggest change in the pull is the removal of ext3 filesystem driver (~28k lines removed). Ext4 driver is a full featured replacement these days and both RH and SUSE use it for several years without issues. Also there are some workarounds in VM & block layer mainly for ext3 which we could eventually get rid of. Other larger change is addition of proper error handling for dquot_initialize(). The rest is small fixes and cleanups" [ I wasn't convinced about the ext3 removal and worried about things falling through the cracks for legacy users, but ext4 maintainers piped up and were all unanimously in favor of removal, and maintaining all legacy ext3 support inside ext4. - Linus ] * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Don't modify filesystem for read-only mounts quota: remove an unneeded condition ext4: memory leak on error in ext4_symlink() mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition ext4: Improve ext4 Kconfig test block: Remove forced page bouncing under IO fs: Remove ext3 filesystem driver doc: Update doc about journalling layer jfs: Handle error from dquot_initialize() reiserfs: Handle error from dquot_initialize() ocfs2: Handle error from dquot_initialize() ext4: Handle error from dquot_initialize() ext2: Handle error from dquot_initalize() quota: Propagate error from ->acquire_dquot()
2015-09-03 12:28:30 -07:00 · 2015-09-03 12:28:30 -07:00 · e31fb9e005
parent 824b005c86 9181f8bf5a
commit e31fb9e005
69 changed files with 505 additions and 28389 deletions
--- a/Documentation/DocBook/filesystems.tmpl
+++ b/Documentation/DocBook/filesystems.tmpl
@ -146,36 +146,30 @@
 The journalling layer is  easy to use. You need to
 first of all create a journal_t data structure. There are
 two calls to do this dependent on how you decide to allocate the physical
-media on which the journal resides. The journal_init_inode() call
-is for journals stored in filesystem inodes, or the journal_init_dev()
-call can be use for journal stored on a raw device (in a continuous range
+media on which the journal resides. The jbd2_journal_init_inode() call
+is for journals stored in filesystem inodes, or the jbd2_journal_init_dev()
+call can be used for journal stored on a raw device (in a continuous range
 of blocks). A journal_t is a typedef for a struct pointer, so when
-you are finally finished make sure you call journal_destroy() on it
+you are finally finished make sure you call jbd2_journal_destroy() on it
 to free up any used kernel memory.
 </para>

 <para>
 Once you have got your journal_t object you need to 'mount' or load the journal
-file, unless of course you haven't initialised it yet - in which case you
-need to call journal_create().
+file. The journalling layer expects the space for the journal was already
+allocated and initialized properly by the userspace tools.  When loading the
+journal you must call jbd2_journal_load() to process journal contents.  If the
+client file system detects the journal contents does not need to be processed
+(or even need not have valid contents), it may call jbd2_journal_wipe() to
+clear the journal contents before calling jbd2_journal_load().
 </para>

 <para>
-Most of the time however your journal file will already have been created, but
-before you load it you must call journal_wipe() to empty the journal file.
-Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the
-job of the client file system to detect this and skip the call to journal_wipe().
-</para>
-
-<para>
-In either case the next call should be to journal_load() which prepares the
-journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery()
-for you if it detects any outstanding transactions in the journal and similarly
-journal_load() will call journal_recover() if necessary.
-I would advise reading fs/ext3/super.c for examples on this stage.
-[RGG: Why is the journal_wipe() call necessary - doesn't this needlessly
-complicate the API. Or isn't a good idea for the journal layer to hide
-dirty mounts from the client fs]
+Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if
+it detects any outstanding transactions in the journal and similarly
+jbd2_journal_load() will call jbd2_journal_recover() if necessary.  I would
+advise reading ext4_load_journal() in fs/ext4/super.c for examples on this
+stage.
 </para>

 <para>
@ -189,41 +183,41 @@ You still need to actually journal your filesystem changes, this
 is done by wrapping them into transactions. Additionally you
 also need to wrap the modification of each of the buffers
 with calls to the journal layer, so it knows what the modifications
-you are actually making are. To do this use  journal_start() which
+you are actually making are. To do this use jbd2_journal_start() which
 returns a transaction handle.
 </para>

 <para>
-journal_start()
-and its counterpart journal_stop(), which indicates the end of a transaction
-are nestable calls, so you can reenter a transaction if necessary,
-but remember you must call journal_stop() the same number of times as
-journal_start() before the transaction is completed (or more accurately
-leaves the update phase). Ext3/VFS makes use of this feature to simplify
-quota support.
+jbd2_journal_start()
+and its counterpart jbd2_journal_stop(), which indicates the end of a
+transaction are nestable calls, so you can reenter a transaction if necessary,
+but remember you must call jbd2_journal_stop() the same number of times as
+jbd2_journal_start() before the transaction is completed (or more accurately
+leaves the update phase). Ext4/VFS makes use of this feature to simplify
+handling of inode dirtying, quota support, etc.
 </para>

 <para>
 Inside each transaction you need to wrap the modifications to the
 individual buffers (blocks). Before you start to modify a buffer you
-need to call journal_get_{create,write,undo}_access() as appropriate,
+need to call jbd2_journal_get_{create,write,undo}_access() as appropriate,
 this allows the journalling layer to copy the unmodified data if it
 needs to. After all the buffer may be part of a previously uncommitted
 transaction.
 At this point you are at last ready to modify a buffer, and once
-you are have done so you need to call journal_dirty_{meta,}data().
+you are have done so you need to call jbd2_journal_dirty_{meta,}data().
 Or if you've asked for access to a buffer you now know is now longer
-required to be pushed back on the device you can call journal_forget()
+required to be pushed back on the device you can call jbd2_journal_forget()
 in much the same way as you might have used bforget() in the past.
 </para>

 <para>
-A journal_flush() may be called at any time to commit and checkpoint
+A jbd2_journal_flush() may be called at any time to commit and checkpoint
 all your transactions.
 </para>

 <para>
-Then at umount time , in your put_super() you can then call journal_destroy()
+Then at umount time , in your put_super() you can then call jbd2_journal_destroy()
 to clean up your in-core journal object.
 </para>

@ -231,82 +225,74 @@ to clean up your in-core journal object.
 Unfortunately there a couple of ways the journal layer can cause a deadlock.
 The first thing to note is that each task can only have
 a single outstanding transaction at any one time, remember nothing
-commits until the outermost journal_stop(). This means
+commits until the outermost jbd2_journal_stop(). This means
 you must complete the transaction at the end of each file/inode/address
 etc. operation you perform, so that the journalling system isn't re-entered
 on another journal. Since transactions can't be nested/batched
 across differing journals, and another filesystem other than
-yours (say ext3) may be modified in a later syscall.
+yours (say ext4) may be modified in a later syscall.
 </para>

 <para>
-The second case to bear in mind is that journal_start() can
+The second case to bear in mind is that jbd2_journal_start() can
 block if there isn't enough space in the journal for your transaction
 (based on the passed nblocks param) - when it blocks it merely(!) needs to
 wait for transactions to complete and be committed from other tasks,
-so essentially we are waiting for journal_stop(). So to avoid
-deadlocks you must treat journal_start/stop() as if they
+so essentially we are waiting for jbd2_journal_stop(). So to avoid
+deadlocks you must treat jbd2_journal_start/stop() as if they
 were semaphores and include them in your semaphore ordering rules to prevent
-deadlocks. Note that journal_extend() has similar blocking behaviour to
-journal_start() so you can deadlock here just as easily as on journal_start().
+deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to
+jbd2_journal_start() so you can deadlock here just as easily as on
+jbd2_journal_start().
 </para>

 <para>
 Try to reserve the right number of blocks the first time. ;-). This will
 be the maximum number of blocks you are going to touch in this transaction.
-I advise having a look at at least ext3_jbd.h to see the basis on which
-ext3 uses to make these decisions.
+I advise having a look at at least ext4_jbd.h to see the basis on which
+ext4 uses to make these decisions.
 </para>

 <para>
 Another wriggle to watch out for is your on-disk block allocation strategy.
-why? Because, if you undo a delete, you need to ensure you haven't reused any
-of the freed blocks in a later transaction. One simple way of doing this
-is make sure any blocks you allocate only have checkpointed transactions
-listed against them. Ext3 does this in ext3_test_allocatable().
+Why? Because, if you do a delete, you need to ensure you haven't reused any
+of the freed blocks until the transaction freeing these blocks commits. If you
+reused these blocks and crash happens, there is no way to restore the contents
+of the reallocated blocks at the end of the last fully committed transaction.
+
+One simple way of doing this is to mark blocks as free in internal in-memory
+block allocation structures only after the transaction freeing them commits.
+Ext4 uses journal commit callback for this purpose.
 </para>

 <para>
-Lock is also providing through journal_{un,}lock_updates(),
-ext3 uses this when it wants a window with a clean and stable fs for a moment.
-eg.
+With journal commit callbacks you can ask the journalling layer to call a
+callback function when the transaction is finally committed to disk, so that
+you can do some of your own management. You ask the journalling layer for
+calling the callback by simply setting journal->j_commit_callback function
+pointer and that function is called after each transaction commit. You can also
+use transaction->t_private_list for attaching entries to a transaction that
+need processing when the transaction commits.
+</para>
+
+<para>
+JBD2 also provides a way to block all transaction updates via
+jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a
+clean and stable fs for a moment.  E.g.
 </para>

 <programlisting>

-	journal_lock_updates() //stop new stuff happening..
-	journal_flush()        // checkpoint everything.
+	jbd2_journal_lock_updates() //stop new stuff happening..
+	jbd2_journal_flush()        // checkpoint everything.
 	..do stuff on stable fs
-	journal_unlock_updates() // carry on with filesystem use.
+	jbd2_journal_unlock_updates() // carry on with filesystem use.
 </programlisting>

 <para>
 The opportunities for abuse and DOS attacks with this should be obvious,
 if you allow unprivileged userspace to trigger codepaths containing these
 calls.
-</para>
-
-<para>
-A new feature of jbd since 2.5.25 is commit callbacks with the new
-journal_callback_set() function you can now ask the journalling layer
-to call you back when the transaction is finally committed to disk, so that
-you can do some of your own management. The key to this is the journal_callback
-struct, this maintains the internal callback information but you can
-extend it like this:-
-</para>
-<programlisting>
-	struct  myfs_callback_s {
-		//Data structure element required by jbd..
-		struct journal_callback for_jbd;
-		// Stuff for myfs allocated together.
-		myfs_inode*    i_commited;
-
-	}
-</programlisting>
-
-<para>
-this would be useful if you needed to know when data was committed to a
-particular inode.
 </para>

    </sect2>
@ -319,36 +305,6 @@ being each mount, each modification (transaction) and each changed buffer
 to tell the journalling layer about them.
 </para>

-<para>
-Here is a some pseudo code to give you an idea of how it works, as
-an example.
-</para>
-
-<programlisting>
-  journal_t* my_jnrl = journal_create();
-  journal_init_{dev,inode}(jnrl,...)
-  if (clean) journal_wipe();
-  journal_load();
-
-   foreach(transaction) { /*transactions must be
-                            completed before
-                            a syscall returns to
-                            userspace*/
-
-          handle_t * xct=journal_start(my_jnrl);
-          foreach(bh) {
-                journal_get_{create,write,undo}_access(xact,bh);
-                if ( myfs_modify(bh) ) { /* returns true
-                                        if makes changes */
-                           journal_dirty_{meta,}data(xact,bh);
-                } else {
-                           journal_forget(bh);
-                }
-          }
-          journal_stop(xct);
-   }
-   journal_destroy(my_jrnl);
-</programlisting>
    </sect2>

    </sect1>
@ -357,13 +313,13 @@ an example.
     <title>Data Types</title>
     <para>
 	The journalling layer uses typedefs to 'hide' the concrete definitions
-	of the structures used. As a client of the JBD layer you can
+	of the structures used. As a client of the JBD2 layer you can
 	just rely on the using the pointer as a magic cookie  of some sort.

 	Obviously the hiding is not enforced as this is 'C'.
     </para>
 	<sect2 id="structures"><title>Structures</title>
-!Iinclude/linux/jbd.h
+!Iinclude/linux/jbd2.h
 	</sect2>
    </sect1>

@ -375,11 +331,11 @@ an example.
 	manage transactions
     </para>
 	<sect2 id="journal_level"><title>Journal Level</title>
-!Efs/jbd/journal.c
-!Ifs/jbd/recovery.c
+!Efs/jbd2/journal.c
+!Ifs/jbd2/recovery.c
 	</sect2>
 	<sect2 id="transaction_level"><title>Transasction Level</title>
-!Efs/jbd/transaction.c
+!Efs/jbd2/transaction.c
 	</sect2>
    </sect1>
    <sect1 id="see_also">
--- a/Documentation/filesystems/ext2.txt
+++ b/Documentation/filesystems/ext2.txt
@ -360,8 +360,8 @@ and are copied into the filesystem.  If a transaction is incomplete at
 the time of the crash, then there is no guarantee of consistency for
 the blocks in that transaction so they are discarded (which means any
 filesystem changes they represent are also lost).
-Check Documentation/filesystems/ext3.txt if you want to read more about
-ext3 and journaling.
+Check Documentation/filesystems/ext4.txt if you want to read more about
+ext4 and journaling.

 References
 ==========
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@ -6,210 +6,7 @@ Ext3 was originally released in September 1999. Written by Stephen Tweedie
 for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
 Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.

-Ext3 is the ext2 filesystem enhanced with journalling capabilities.
+Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
+filesystem is a subset of ext4 filesystem so use ext4 driver for accessing
+ext3 filesystems.

-Options
-=======
-
-When mounting an ext3 filesystem, the following option are accepted:
-(*) == default
-
-ro			Mount filesystem read only. Note that ext3 will replay
-			the journal (and thus write to the partition) even when
-			mounted "read only". Mount options "ro,noload" can be
-			used to prevent writes to the filesystem.
-
-journal=update		Update the ext3 file system's journal to the current
-			format.
-
-journal=inum		When a journal already exists, this option is ignored.
-			Otherwise, it specifies the number of the inode which
-			will represent the ext3 file system's journal file.
-
-journal_path=path
-journal_dev=devnum	When the external journal device's major/minor numbers
-			have changed, these options allow the user to specify
-			the new journal location.  The journal device is
-			identified through either its new major/minor numbers
-			encoded in devnum, or via a path to the device.
-
-norecovery		Don't load the journal on mounting. Note that this forces
-noload			mount of inconsistent filesystem, which can lead to
-			various problems.
-
-data=journal		All data are committed into the journal prior to being
-			written into the main file system.
-
-data=ordered	(*)	All data are forced directly out to the main file
-			system prior to its metadata being committed to the
-			journal.
-
-data=writeback		Data ordering is not preserved, data may be written
-			into the main file system after its metadata has been
-			committed to the journal.
-
-commit=nrsec	(*)	Ext3 can be told to sync all its data and metadata
-			every 'nrsec' seconds. The default value is 5 seconds.
-			This means that if you lose your power, you will lose
-			as much as the latest 5 seconds of work (your
-			filesystem will not be damaged though, thanks to the
-			journaling).  This default value (or any low value)
-			will hurt performance, but it's good for data-safety.
-			Setting it to 0 will have the same effect as leaving
-			it at the default (5 seconds).
-			Setting it to very large values will improve
-			performance.
-
-barrier=<0|1(*)>	This enables/disables the use of write barriers in
-barrier	(*)		the jbd code.  barrier=0 disables, barrier=1 enables.
-nobarrier		This also requires an IO stack which can support
-			barriers, and if jbd gets an error on a barrier
-			write, it will disable again with a warning.
-			Write barriers enforce proper on-disk ordering
-			of journal commits, making volatile disk write caches
-			safe to use, at some performance penalty.  If
-			your disks are battery-backed in one way or another,
-			disabling barriers may safely improve performance.
-			The mount options "barrier" and "nobarrier" can
-			also be used to enable or disable barriers, for
-			consistency with other ext3 mount options.
-
-user_xattr		Enables Extended User Attributes.  Additionally, you
-			need to have extended attribute support enabled in the
-			kernel configuration (CONFIG_EXT3_FS_XATTR).  See the
-			attr(5) manual page and http://acl.bestbits.at/ to
-			learn more about extended attributes.
-
-nouser_xattr		Disables Extended User Attributes.
-
-acl			Enables POSIX Access Control Lists support.
-			Additionally, you need to have ACL support enabled in
-			the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
-			See the acl(5) manual page and http://acl.bestbits.at/
-			for more information.
-
-noacl			This option disables POSIX Access Control List
-			support.
-
-reservation
-
-noreservation
-
-bsddf 		(*)	Make 'df' act like BSD.
-minixdf			Make 'df' act like Minix.
-
-check=none		Don't do extra checking of bitmaps on mount.
-nocheck
-
-debug			Extra debugging information is sent to syslog.
-
-errors=remount-ro	Remount the filesystem read-only on an error.
-errors=continue		Keep going on a filesystem error.
-errors=panic		Panic and halt the machine if an error occurs.
-			(These mount options override the errors behavior
-			specified in the superblock, which can be
-			configured using tune2fs.)
-
-data_err=ignore(*)	Just print an error message if an error occurs
-			in a file data buffer in ordered mode.
-data_err=abort		Abort the journal if an error occurs in a file
-			data buffer in ordered mode.
-
-grpid			Give objects the same group ID as their creator.
-bsdgroups
-
-nogrpid		(*)	New objects have the group ID of their creator.
-sysvgroups
-
-resgid=n		The group ID which may use the reserved blocks.
-
-resuid=n		The user ID which may use the reserved blocks.
-
-sb=n			Use alternate superblock at this location.
-
-quota			These options are ignored by the filesystem. They
-noquota			are used only by quota tools to recognize volumes
-grpquota		where quota should be turned on. See documentation
-usrquota		in the quota-tools package for more details
-			(http://sourceforge.net/projects/linuxquota).
-
-jqfmt=<quota type>	These options tell filesystem details about quota
-usrjquota=<file>	so that quota information can be properly updated
-grpjquota=<file>	during journal replay. They replace the above
-			quota options. See documentation in the quota-tools
-			package for more details
-			(http://sourceforge.net/projects/linuxquota).
-
-Specification
-=============
-Ext3 shares all disk implementation with the ext2 filesystem, and adds
-transactions capabilities to ext2.  Journaling is done by the Journaling Block
-Device layer.
-
-Journaling Block Device layer
-----------------------------
-The Journaling Block Device layer (JBD) isn't ext3 specific.  It was designed
-to add journaling capabilities to a block device.  The ext3 filesystem code
-will inform the JBD of modifications it is performing (called a transaction).
-The journal supports the transactions start and stop, and in case of a crash,
-the journal can replay the transactions to quickly put the partition back into
-a consistent state.
-
-Handles represent a single atomic update to a filesystem.  JBD can handle an
-external journal on a block device.
-
-Data Mode
---------
-There are 3 different data modes:
-
-* writeback mode
-In data=writeback mode, ext3 does not journal data at all.  This mode provides
-a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
-mode - metadata journaling.  A crash+recovery can cause incorrect data to
-appear in files which were written shortly before the crash.  This mode will
-typically provide the best ext3 performance.
-
-* ordered mode
-In data=ordered mode, ext3 only officially journals metadata, but it logically
-groups metadata and data blocks into a single unit called a transaction.  When
-it's time to write the new metadata out to disk, the associated data blocks
-are written first.  In general, this mode performs slightly slower than
-writeback but significantly faster than journal mode.
-
-* journal mode
-data=journal mode provides full data and metadata journaling.  All new data is
-written to the journal first, and then to its final location.
-In the event of a crash, the journal can be replayed, bringing both data and
-metadata into a consistent state.  This mode is the slowest except when data
-needs to be read from and written to disk at the same time where it
-outperforms all other modes.
-
-Compatibility
-------------
-
-Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`.
-Ext3 is fully compatible with Ext2.  Ext3 partitions can easily be mounted as
-Ext2.
-
-
-External Tools
-==============
-See manual pages to learn more.
-
-tune2fs: 	create a ext3 journal on a ext2 partition with the -j flag.
-mke2fs: 	create a ext3 partition with the -j flag.
-debugfs: 	ext2 and ext3 file system debugger.
-ext2online:	online (mounted) ext2 and ext3 filesystem resizer
-
-
-References
-==========
-
-kernel source:	<file:fs/ext3/>
-		<file:fs/jbd/>
-
-programs: 	http://e2fsprogs.sourceforge.net/
-		http://ext2resize.sourceforge.net
-
-useful links:	http://www.ibm.com/developerworks/library/l-fs7/index.html
-        http://www.ibm.com/developerworks/library/l-fs8/index.html
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@ -769,7 +769,7 @@ struct address_space_operations {
 	to stall to allow flushers a chance to complete some IO. Ordinarily
 	it can use PageDirty and PageWriteback but some filesystems have
 	more complex state (unstable pages in NFS prevent reclaim) or
-	do not set those flags due to locking problems (jbd). This callback
+	do not set those flags due to locking problems. This callback
 	allows a filesystem to indicate to the VM if a page should be
 	treated as dirty or writeback for the purposes of stalling.

--- a/18
+++ b/18
@ -4078,15 +4078,6 @@ F:	Documentation/filesystems/ext2.txt
 F:	fs/ext2/
 F:	include/linux/ext2*

-EXT3 FILE SYSTEM
-M:	Jan Kara <jack@suse.com>
-M:	Andrew Morton <akpm@linux-foundation.org>
-M:	Andreas Dilger <adilger.kernel@dilger.ca>
-L:	linux-ext4@vger.kernel.org
-S:	Maintained
-F:	Documentation/filesystems/ext3.txt
-F:	fs/ext3/
-
 EXT4 FILE SYSTEM
 M:	"Theodore Ts'o" <tytso@mit.edu>
 M:	Andreas Dilger <adilger.kernel@dilger.ca>
@ -5787,16 +5778,9 @@ S:	Maintained
 F:	fs/jffs2/
 F:	include/uapi/linux/jffs2.h

-JOURNALLING LAYER FOR BLOCK DEVICES (JBD)
-M:	Andrew Morton <akpm@linux-foundation.org>
-M:	Jan Kara <jack@suse.com>
-L:	linux-ext4@vger.kernel.org
-S:	Maintained
-F:	fs/jbd/
-F:	include/linux/jbd.h
-
 JOURNALLING LAYER FOR BLOCK DEVICES (JBD2)
 M:	"Theodore Ts'o" <tytso@mit.edu>
+M:	Jan Kara <jack@suse.com>
 L:	linux-ext4@vger.kernel.org
 S:	Maintained
 F:	fs/jbd2/
--- a/block/bounce.c
+++ b/block/bounce.c
@ -177,26 +177,8 @@ static void bounce_end_io_read_isa(struct bio *bio)
 	__bounce_end_io_read(bio, isa_page_pool);
 }

-#ifdef CONFIG_NEED_BOUNCE_POOL
-static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
-{
-	if (bio_data_dir(bio) != WRITE)
-		return 0;
-
-	if (!bdi_cap_stable_pages_required(&q->backing_dev_info))
-		return 0;
-
-	return bio_flagged(bio, BIO_SNAP_STABLE);
-}
-#else
-static int must_snapshot_stable_pages(struct request_queue *q, struct bio *bio)
-{
-	return 0;
-}
-#endif /* CONFIG_NEED_BOUNCE_POOL */
-
 static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
-			       mempool_t *pool, int force)
+			       mempool_t *pool)
 {
 	struct bio *bio;
 	int rw = bio_data_dir(*bio_orig);
@ -204,8 +186,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	struct bvec_iter iter;
 	unsigned i;

-	if (force)
-		goto bounce;
 	bio_for_each_segment(from, *bio_orig, iter)
 		if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
 			goto bounce;
@ -217,7 +197,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	bio_for_each_segment_all(to, bio, i) {
 		struct page *page = to->bv_page;

-		if (page_to_pfn(page) <= queue_bounce_pfn(q) && !force)
+		if (page_to_pfn(page) <= queue_bounce_pfn(q))
 			continue;

 		to->bv_page = mempool_alloc(pool, q->bounce_gfp);
@ -255,7 +235,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,

 void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 {
-	int must_bounce;
 	mempool_t *pool;

 	/*
@ -264,15 +243,13 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	if (!bio_has_data(*bio_orig))
 		return;

-	must_bounce = must_snapshot_stable_pages(q, *bio_orig);
-
 	/*
 	 * for non-isa bounce case, just check if the bounce pfn is equal
 	 * to or bigger than the highest pfn in the system -- in that case,
 	 * don't waste time iterating over bio segments
 	 */
 	if (!(q->bounce_gfp & GFP_DMA)) {
-		if (queue_bounce_pfn(q) >= blk_max_pfn && !must_bounce)
+		if (queue_bounce_pfn(q) >= blk_max_pfn)
 			return;
 		pool = page_pool;
 	} else {
@ -283,7 +260,7 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	/*
 	 * slow path
 	 */
-	__blk_queue_bounce(q, bio_orig, pool, must_bounce);
+	__blk_queue_bounce(q, bio_orig, pool);
 }

 EXPORT_SYMBOL(blk_queue_bounce);
--- a/fs/Kconfig
+++ b/fs/Kconfig
@ -11,18 +11,15 @@ config DCACHE_WORD_ACCESS
 if BLOCK

 source "fs/ext2/Kconfig"
-source "fs/ext3/Kconfig"
 source "fs/ext4/Kconfig"
-source "fs/jbd/Kconfig"
 source "fs/jbd2/Kconfig"

 config FS_MBCACHE
 # Meta block cache for Extended Attributes (ext2/ext3/ext4)
 	tristate
 	default y if EXT2_FS=y && EXT2_FS_XATTR
-	default y if EXT3_FS=y && EXT3_FS_XATTR
 	default y if EXT4_FS=y
-	default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
+	default m if EXT2_FS_XATTR || EXT4_FS

 source "fs/reiserfs/Kconfig"
 source "fs/jfs/Kconfig"
--- a/fs/Makefile
+++ b/fs/Makefile
@ -62,12 +62,10 @@ obj-$(CONFIG_DLM)		+= dlm/
 # Do not add any filesystems before this line
 obj-$(CONFIG_FSCACHE)		+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
-obj-$(CONFIG_EXT3_FS)		+= ext3/ # Before ext2 so root fs can be ext3
 obj-$(CONFIG_EXT2_FS)		+= ext2/
 # We place ext4 after ext2 so plain ext2 root fs's are mounted using ext2
 # unless explicitly requested by rootfstype
 obj-$(CONFIG_EXT4_FS)		+= ext4/
-obj-$(CONFIG_JBD)		+= jbd/
 obj-$(CONFIG_JBD2)		+= jbd2/
 obj-$(CONFIG_CRAMFS)		+= cramfs/
 obj-$(CONFIG_SQUASHFS)		+= squashfs/
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@ -577,7 +577,10 @@ struct inode *ext2_new_inode(struct inode *dir, umode_t mode,
 		goto fail;
 	}

-	dquot_initialize(inode);
+	err = dquot_initialize(inode);
+	if (err)
+		goto fail_drop;
+
 	err = dquot_alloc_inode(inode);
 	if (err)
 		goto fail_drop;
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@ -1552,8 +1552,11 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (error)
 		return error;

-	if (is_quota_modification(inode, iattr))
-		dquot_initialize(inode);
+	if (is_quota_modification(inode, iattr)) {
+		error = dquot_initialize(inode);
+		if (error)
+			return error;
+	}
 	if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)) ||
 	    (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))) {
 		error = dquot_transfer(inode, iattr);
--- a/fs/ext2/namei.c
+++ b/fs/ext2/namei.c
@ -96,8 +96,11 @@ struct dentry *ext2_get_parent(struct dentry *child)
 static int ext2_create (struct inode * dir, struct dentry * dentry, umode_t mode, bool excl)
 {
 	struct inode *inode;
+	int err;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	inode = ext2_new_inode(dir, mode, &dentry->d_name);
 	if (IS_ERR(inode))
@ -143,7 +146,9 @@ static int ext2_mknod (struct inode * dir, struct dentry *dentry, umode_t mode,
 	if (!new_valid_dev(rdev))
 		return -EINVAL;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	inode = ext2_new_inode (dir, mode, &dentry->d_name);
 	err = PTR_ERR(inode);
@ -169,7 +174,9 @@ static int ext2_symlink (struct inode * dir, struct dentry * dentry,
 	if (l > sb->s_blocksize)
 		goto out;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		goto out;

 	inode = ext2_new_inode (dir, S_IFLNK | S_IRWXUGO, &dentry->d_name);
 	err = PTR_ERR(inode);
@ -212,7 +219,9 @@ static int ext2_link (struct dentry * old_dentry, struct inode * dir,
 	struct inode *inode = d_inode(old_dentry);
 	int err;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	inode->i_ctime = CURRENT_TIME_SEC;
 	inode_inc_link_count(inode);
@ -233,7 +242,9 @@ static int ext2_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
 	struct inode * inode;
 	int err;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	inode_inc_link_count(dir);

@ -279,13 +290,17 @@ static int ext2_unlink(struct inode * dir, struct dentry *dentry)
 	struct inode * inode = d_inode(dentry);
 	struct ext2_dir_entry_2 * de;
 	struct page * page;
-	int err = -ENOENT;
+	int err;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		goto out;

 	de = ext2_find_entry (dir, &dentry->d_name, &page);
-	if (!de)
+	if (!de) {
+		err = -ENOENT;
 		goto out;
+	}

 	err = ext2_delete_entry (de, page);
 	if (err)
@ -323,14 +338,21 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry,
 	struct ext2_dir_entry_2 * dir_de = NULL;
 	struct page * old_page;
 	struct ext2_dir_entry_2 * old_de;
-	int err = -ENOENT;
+	int err;

-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
+	err = dquot_initialize(old_dir);
+	if (err)
+		goto out;
+
+	err = dquot_initialize(new_dir);
+	if (err)
+		goto out;

 	old_de = ext2_find_entry (old_dir, &old_dentry->d_name, &old_page);
-	if (!old_de)
+	if (!old_de) {
+		err = -ENOENT;
 		goto out;
+	}

 	if (S_ISDIR(old_inode->i_mode)) {
 		err = -EIO;
--- a/fs/ext3/Kconfig
+++ b/fs/ext3/Kconfig
@ -1,89 +0,0 @@
-config EXT3_FS
-	tristate "Ext3 journalling file system support"
-	select JBD
-	help
-	  This is the journalling version of the Second extended file system
-	  (often called ext3), the de facto standard Linux file system
-	  (method to organize files on a storage device) for hard disks.
-
-	  The journalling code included in this driver means you do not have
-	  to run e2fsck (file system checker) on your file systems after a
-	  crash.  The journal keeps track of any changes that were being made
-	  at the time the system crashed, and can ensure that your file system
-	  is consistent without the need for a lengthy check.
-
-	  Other than adding the journal to the file system, the on-disk format
-	  of ext3 is identical to ext2.  It is possible to freely switch
-	  between using the ext3 driver and the ext2 driver, as long as the
-	  file system has been cleanly unmounted, or e2fsck is run on the file
-	  system.
-
-	  To add a journal on an existing ext2 file system or change the
-	  behavior of ext3 file systems, you can use the tune2fs utility ("man
-	  tune2fs").  To modify attributes of files and directories on ext3
-	  file systems, use chattr ("man chattr").  You need to be using
-	  e2fsprogs version 1.20 or later in order to create ext3 journals
-	  (available at <http://sourceforge.net/projects/e2fsprogs/>).
-
-	  To compile this file system support as a module, choose M here: the
-	  module will be called ext3.
-
-config EXT3_DEFAULTS_TO_ORDERED
-	bool "Default to 'data=ordered' in ext3"
-	depends on EXT3_FS
-	default y
-	help
-	  The journal mode options for ext3 have different tradeoffs
-	  between when data is guaranteed to be on disk and
-	  performance.	The use of "data=writeback" can cause
-	  unwritten data to appear in files after an system crash or
-	  power failure, which can be a security issue.	 However,
-	  "data=ordered" mode can also result in major performance
-	  problems, including seconds-long delays before an fsync()
-	  call returns.	 For details, see:
-
-	  http://ext4.wiki.kernel.org/index.php/Ext3_data_mode_tradeoffs
-
-	  If you have been historically happy with ext3's performance,
-	  data=ordered mode will be a safe choice and you should
-	  answer 'y' here.  If you understand the reliability and data
-	  privacy issues of data=writeback and are willing to make
-	  that trade off, answer 'n'.
-
-config EXT3_FS_XATTR
-	bool "Ext3 extended attributes"
-	depends on EXT3_FS
-	default y
-	help
-	  Extended attributes are name:value pairs associated with inodes by
-	  the kernel or by users (see the attr(5) manual page, or visit
-	  <http://acl.bestbits.at/> for details).
-
-	  If unsure, say N.
-
-	  You need this for POSIX ACL support on ext3.
-
-config EXT3_FS_POSIX_ACL
-	bool "Ext3 POSIX Access Control Lists"
-	depends on EXT3_FS_XATTR
-	select FS_POSIX_ACL
-	help
-	  Posix Access Control Lists (ACLs) support permissions for users and
-	  groups beyond the owner/group/world scheme.
-
-	  To learn more about Access Control Lists, visit the Posix ACLs for
-	  Linux website <http://acl.bestbits.at/>.
-
-	  If you don't know what Access Control Lists are, say N
-
-config EXT3_FS_SECURITY
-	bool "Ext3 Security Labels"
-	depends on EXT3_FS_XATTR
-	help
-	  Security labels support alternative access control models
-	  implemented by security modules like SELinux.  This option
-	  enables an extended attribute handler for file security
-	  labels in the ext3 filesystem.
-
-	  If you are not using a security module that requires using
-	  extended attributes for file security labels, say N.
--- a/fs/ext3/Makefile
+++ b/fs/ext3/Makefile
@ -1,12 +0,0 @@
-#
-# Makefile for the linux ext3-filesystem routines.
-#
-
-obj-$(CONFIG_EXT3_FS) += ext3.o
-
-ext3-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
-	   ioctl.o namei.o super.o symlink.o hash.o resize.o ext3_jbd.o
-
-ext3-$(CONFIG_EXT3_FS_XATTR)	 += xattr.o xattr_user.o xattr_trusted.o
-ext3-$(CONFIG_EXT3_FS_POSIX_ACL) += acl.o
-ext3-$(CONFIG_EXT3_FS_SECURITY)	 += xattr_security.o
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@ -1,281 +0,0 @@
-/*
- * linux/fs/ext3/acl.c
- *
- * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * Convert from filesystem to in-memory representation.
- */
-static struct posix_acl *
-ext3_acl_from_disk(const void *value, size_t size)
-{
-	const char *end = (char *)value + size;
-	int n, count;
-	struct posix_acl *acl;
-
-	if (!value)
-		return NULL;
-	if (size < sizeof(ext3_acl_header))
-		 return ERR_PTR(-EINVAL);
-	if (((ext3_acl_header *)value)->a_version !=
-	    cpu_to_le32(EXT3_ACL_VERSION))
-		return ERR_PTR(-EINVAL);
-	value = (char *)value + sizeof(ext3_acl_header);
-	count = ext3_acl_count(size);
-	if (count < 0)
-		return ERR_PTR(-EINVAL);
-	if (count == 0)
-		return NULL;
-	acl = posix_acl_alloc(count, GFP_NOFS);
-	if (!acl)
-		return ERR_PTR(-ENOMEM);
-	for (n=0; n < count; n++) {
-		ext3_acl_entry *entry =
-			(ext3_acl_entry *)value;
-		if ((char *)value + sizeof(ext3_acl_entry_short) > end)
-			goto fail;
-		acl->a_entries[n].e_tag  = le16_to_cpu(entry->e_tag);
-		acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
-		switch(acl->a_entries[n].e_tag) {
-			case ACL_USER_OBJ:
-			case ACL_GROUP_OBJ:
-			case ACL_MASK:
-			case ACL_OTHER:
-				value = (char *)value +
-					sizeof(ext3_acl_entry_short);
-				break;
-
-			case ACL_USER:
-				value = (char *)value + sizeof(ext3_acl_entry);
-				if ((char *)value > end)
-					goto fail;
-				acl->a_entries[n].e_uid =
-					make_kuid(&init_user_ns,
-						  le32_to_cpu(entry->e_id));
-				break;
-			case ACL_GROUP:
-				value = (char *)value + sizeof(ext3_acl_entry);
-				if ((char *)value > end)
-					goto fail;
-				acl->a_entries[n].e_gid =
-					make_kgid(&init_user_ns,
-						  le32_to_cpu(entry->e_id));
-				break;
-
-			default:
-				goto fail;
-		}
-	}
-	if (value != end)
-		goto fail;
-	return acl;
-
-fail:
-	posix_acl_release(acl);
-	return ERR_PTR(-EINVAL);
-}
-
-/*
- * Convert from in-memory to filesystem representation.
- */
-static void *
-ext3_acl_to_disk(const struct posix_acl *acl, size_t *size)
-{
-	ext3_acl_header *ext_acl;
-	char *e;
-	size_t n;
-
-	*size = ext3_acl_size(acl->a_count);
-	ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
-			sizeof(ext3_acl_entry), GFP_NOFS);
-	if (!ext_acl)
-		return ERR_PTR(-ENOMEM);
-	ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
-	e = (char *)ext_acl + sizeof(ext3_acl_header);
-	for (n=0; n < acl->a_count; n++) {
-		const struct posix_acl_entry *acl_e = &acl->a_entries[n];
-		ext3_acl_entry *entry = (ext3_acl_entry *)e;
-		entry->e_tag  = cpu_to_le16(acl_e->e_tag);
-		entry->e_perm = cpu_to_le16(acl_e->e_perm);
-		switch(acl_e->e_tag) {
-			case ACL_USER:
-				entry->e_id = cpu_to_le32(
-					from_kuid(&init_user_ns, acl_e->e_uid));
-				e += sizeof(ext3_acl_entry);
-				break;
-			case ACL_GROUP:
-				entry->e_id = cpu_to_le32(
-					from_kgid(&init_user_ns, acl_e->e_gid));
-				e += sizeof(ext3_acl_entry);
-				break;
-
-			case ACL_USER_OBJ:
-			case ACL_GROUP_OBJ:
-			case ACL_MASK:
-			case ACL_OTHER:
-				e += sizeof(ext3_acl_entry_short);
-				break;
-
-			default:
-				goto fail;
-		}
-	}
-	return (char *)ext_acl;
-
-fail:
-	kfree(ext_acl);
-	return ERR_PTR(-EINVAL);
-}
-
-/*
- * Inode operation get_posix_acl().
- *
- * inode->i_mutex: don't care
- */
-struct posix_acl *
-ext3_get_acl(struct inode *inode, int type)
-{
-	int name_index;
-	char *value = NULL;
-	struct posix_acl *acl;
-	int retval;
-
-	switch (type) {
-	case ACL_TYPE_ACCESS:
-		name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
-		break;
-	case ACL_TYPE_DEFAULT:
-		name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
-		break;
-	default:
-		BUG();
-	}
-
-	retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
-	if (retval > 0) {
-		value = kmalloc(retval, GFP_NOFS);
-		if (!value)
-			return ERR_PTR(-ENOMEM);
-		retval = ext3_xattr_get(inode, name_index, "", value, retval);
-	}
-	if (retval > 0)
-		acl = ext3_acl_from_disk(value, retval);
-	else if (retval == -ENODATA || retval == -ENOSYS)
-		acl = NULL;
-	else
-		acl = ERR_PTR(retval);
-	kfree(value);
-
-	if (!IS_ERR(acl))
-		set_cached_acl(inode, type, acl);
-
-	return acl;
-}
-
-/*
- * Set the access or default ACL of an inode.
- *
- * inode->i_mutex: down unless called from ext3_new_inode
- */
-static int
-__ext3_set_acl(handle_t *handle, struct inode *inode, int type,
-	     struct posix_acl *acl)
-{
-	int name_index;
-	void *value = NULL;
-	size_t size = 0;
-	int error;
-
-	switch(type) {
-		case ACL_TYPE_ACCESS:
-			name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
-			if (acl) {
-				error = posix_acl_equiv_mode(acl, &inode->i_mode);
-				if (error < 0)
-					return error;
-				else {
-					inode->i_ctime = CURRENT_TIME_SEC;
-					ext3_mark_inode_dirty(handle, inode);
-					if (error == 0)
-						acl = NULL;
-				}
-			}
-			break;
-
-		case ACL_TYPE_DEFAULT:
-			name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
-			if (!S_ISDIR(inode->i_mode))
-				return acl ? -EACCES : 0;
-			break;
-
-		default:
-			return -EINVAL;
-	}
-	if (acl) {
-		value = ext3_acl_to_disk(acl, &size);
-		if (IS_ERR(value))
-			return (int)PTR_ERR(value);
-	}
-
-	error = ext3_xattr_set_handle(handle, inode, name_index, "",
-				      value, size, 0);
-
-	kfree(value);
-
-	if (!error)
-		set_cached_acl(inode, type, acl);
-
-	return error;
-}
-
-int
-ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type)
-{
-	handle_t *handle;
-	int error, retries = 0;
-
-retry:
-	handle = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS(inode->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	error = __ext3_set_acl(handle, inode, type, acl);
-	ext3_journal_stop(handle);
-	if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
-		goto retry;
-	return error;
-}
-
-/*
- * Initialize the ACLs of a new inode. Called from ext3_new_inode.
- *
- * dir->i_mutex: down
- * inode->i_mutex: up (access to inode is still exclusive)
- */
-int
-ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
-{
-	struct posix_acl *default_acl, *acl;
-	int error;
-
-	error = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
-	if (error)
-		return error;
-
-	if (default_acl) {
-		error = __ext3_set_acl(handle, inode, ACL_TYPE_DEFAULT,
-				       default_acl);
-		posix_acl_release(default_acl);
-	}
-	if (acl) {
-		if (!error)
-			error = __ext3_set_acl(handle, inode, ACL_TYPE_ACCESS,
-					       acl);
-		posix_acl_release(acl);
-	}
-	return error;
-}
--- a/fs/ext3/acl.h
+++ b/fs/ext3/acl.h
@ -1,72 +0,0 @@
-/*
-  File: fs/ext3/acl.h
-
-  (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
-*/
-
-#include <linux/posix_acl_xattr.h>
-
-#define EXT3_ACL_VERSION	0x0001
-
-typedef struct {
-	__le16		e_tag;
-	__le16		e_perm;
-	__le32		e_id;
-} ext3_acl_entry;
-
-typedef struct {
-	__le16		e_tag;
-	__le16		e_perm;
-} ext3_acl_entry_short;
-
-typedef struct {
-	__le32		a_version;
-} ext3_acl_header;
-
-static inline size_t ext3_acl_size(int count)
-{
-	if (count <= 4) {
-		return sizeof(ext3_acl_header) +
-		       count * sizeof(ext3_acl_entry_short);
-	} else {
-		return sizeof(ext3_acl_header) +
-		       4 * sizeof(ext3_acl_entry_short) +
-		       (count - 4) * sizeof(ext3_acl_entry);
-	}
-}
-
-static inline int ext3_acl_count(size_t size)
-{
-	ssize_t s;
-	size -= sizeof(ext3_acl_header);
-	s = size - 4 * sizeof(ext3_acl_entry_short);
-	if (s < 0) {
-		if (size % sizeof(ext3_acl_entry_short))
-			return -1;
-		return size / sizeof(ext3_acl_entry_short);
-	} else {
-		if (s % sizeof(ext3_acl_entry))
-			return -1;
-		return s / sizeof(ext3_acl_entry) + 4;
-	}
-}
-
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-
-/* acl.c */
-extern struct posix_acl *ext3_get_acl(struct inode *inode, int type);
-extern int ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-extern int ext3_init_acl (handle_t *, struct inode *, struct inode *);
-
-#else  /* CONFIG_EXT3_FS_POSIX_ACL */
-#include <linux/sched.h>
-#define ext3_get_acl NULL
-#define ext3_set_acl NULL
-
-static inline int
-ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
-{
-	return 0;
-}
-#endif  /* CONFIG_EXT3_FS_POSIX_ACL */
-
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
--- a/fs/ext3/bitmap.c
+++ b/fs/ext3/bitmap.c
@ -1,20 +0,0 @@
-/*
- *  linux/fs/ext3/bitmap.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- */
-
-#include "ext3.h"
-
-#ifdef EXT3FS_DEBUG
-
-unsigned long ext3_count_free (struct buffer_head * map, unsigned int numchars)
-{
-	return numchars * BITS_PER_BYTE - memweight(map->b_data, numchars);
-}
-
-#endif  /*  EXT3FS_DEBUG  */
-
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@ -1,537 +0,0 @@
-/*
- *  linux/fs/ext3/dir.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/dir.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 directory handling functions
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *
- * Hash Tree Directory indexing (c) 2001  Daniel Phillips
- *
- */
-
-#include <linux/compat.h>
-#include "ext3.h"
-
-static unsigned char ext3_filetype_table[] = {
-	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
-};
-
-static int ext3_dx_readdir(struct file *, struct dir_context *);
-
-static unsigned char get_dtype(struct super_block *sb, int filetype)
-{
-	if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_FILETYPE) ||
-	    (filetype >= EXT3_FT_MAX))
-		return DT_UNKNOWN;
-
-	return (ext3_filetype_table[filetype]);
-}
-
-/**
- * Check if the given dir-inode refers to an htree-indexed directory
- * (or a directory which could potentially get converted to use htree
- * indexing).
- *
- * Return 1 if it is a dx dir, 0 if not
- */
-static int is_dx_dir(struct inode *inode)
-{
-	struct super_block *sb = inode->i_sb;
-
-	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
-		     EXT3_FEATURE_COMPAT_DIR_INDEX) &&
-	    ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
-	     ((inode->i_size >> sb->s_blocksize_bits) == 1)))
-		return 1;
-
-	return 0;
-}
-
-int ext3_check_dir_entry (const char * function, struct inode * dir,
-			  struct ext3_dir_entry_2 * de,
-			  struct buffer_head * bh,
-			  unsigned long offset)
-{
-	const char * error_msg = NULL;
-	const int rlen = ext3_rec_len_from_disk(de->rec_len);
-
-	if (unlikely(rlen < EXT3_DIR_REC_LEN(1)))
-		error_msg = "rec_len is smaller than minimal";
-	else if (unlikely(rlen % 4 != 0))
-		error_msg = "rec_len % 4 != 0";
-	else if (unlikely(rlen < EXT3_DIR_REC_LEN(de->name_len)))
-		error_msg = "rec_len is too small for name_len";
-	else if (unlikely((((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)))
-		error_msg = "directory entry across blocks";
-	else if (unlikely(le32_to_cpu(de->inode) >
-			le32_to_cpu(EXT3_SB(dir->i_sb)->s_es->s_inodes_count)))
-		error_msg = "inode out of bounds";
-
-	if (unlikely(error_msg != NULL))
-		ext3_error (dir->i_sb, function,
-			"bad entry in directory #%lu: %s - "
-			"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
-			dir->i_ino, error_msg, offset,
-			(unsigned long) le32_to_cpu(de->inode),
-			rlen, de->name_len);
-
-	return error_msg == NULL ? 1 : 0;
-}
-
-static int ext3_readdir(struct file *file, struct dir_context *ctx)
-{
-	unsigned long offset;
-	int i;
-	struct ext3_dir_entry_2 *de;
-	int err;
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-	int dir_has_error = 0;
-
-	if (is_dx_dir(inode)) {
-		err = ext3_dx_readdir(file, ctx);
-		if (err != ERR_BAD_DX_DIR)
-			return err;
-		/*
-		 * We don't set the inode dirty flag since it's not
-		 * critical that it get flushed back to the disk.
-		 */
-		EXT3_I(inode)->i_flags &= ~EXT3_INDEX_FL;
-	}
-	offset = ctx->pos & (sb->s_blocksize - 1);
-
-	while (ctx->pos < inode->i_size) {
-		unsigned long blk = ctx->pos >> EXT3_BLOCK_SIZE_BITS(sb);
-		struct buffer_head map_bh;
-		struct buffer_head *bh = NULL;
-
-		map_bh.b_state = 0;
-		err = ext3_get_blocks_handle(NULL, inode, blk, 1, &map_bh, 0);
-		if (err > 0) {
-			pgoff_t index = map_bh.b_blocknr >>
-					(PAGE_CACHE_SHIFT - inode->i_blkbits);
-			if (!ra_has_index(&file->f_ra, index))
-				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
-					&file->f_ra, file,
-					index, 1);
-			file->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
-			bh = ext3_bread(NULL, inode, blk, 0, &err);
-		}
-
-		/*
-		 * We ignore I/O errors on directories so users have a chance
-		 * of recovering data when there's a bad sector
-		 */
-		if (!bh) {
-			if (!dir_has_error) {
-				ext3_error(sb, __func__, "directory #%lu "
-					"contains a hole at offset %lld",
-					inode->i_ino, ctx->pos);
-				dir_has_error = 1;
-			}
-			/* corrupt size?  Maybe no more blocks to read */
-			if (ctx->pos > inode->i_blocks << 9)
-				break;
-			ctx->pos += sb->s_blocksize - offset;
-			continue;
-		}
-
-		/* If the dir block has changed since the last call to
-		 * readdir(2), then we might be pointing to an invalid
-		 * dirent right now.  Scan from the start of the block
-		 * to make sure. */
-		if (offset && file->f_version != inode->i_version) {
-			for (i = 0; i < sb->s_blocksize && i < offset; ) {
-				de = (struct ext3_dir_entry_2 *)
-					(bh->b_data + i);
-				/* It's too expensive to do a full
-				 * dirent test each time round this
-				 * loop, but we do have to test at
-				 * least that it is non-zero.  A
-				 * failure will be detected in the
-				 * dirent test below. */
-				if (ext3_rec_len_from_disk(de->rec_len) <
-						EXT3_DIR_REC_LEN(1))
-					break;
-				i += ext3_rec_len_from_disk(de->rec_len);
-			}
-			offset = i;
-			ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
-				| offset;
-			file->f_version = inode->i_version;
-		}
-
-		while (ctx->pos < inode->i_size
-		       && offset < sb->s_blocksize) {
-			de = (struct ext3_dir_entry_2 *) (bh->b_data + offset);
-			if (!ext3_check_dir_entry ("ext3_readdir", inode, de,
-						   bh, offset)) {
-				/* On error, skip the to the
-                                   next block. */
-				ctx->pos = (ctx->pos |
-						(sb->s_blocksize - 1)) + 1;
-				break;
-			}
-			offset += ext3_rec_len_from_disk(de->rec_len);
-			if (le32_to_cpu(de->inode)) {
-				if (!dir_emit(ctx, de->name, de->name_len,
-					      le32_to_cpu(de->inode),
-					      get_dtype(sb, de->file_type))) {
-					brelse(bh);
-					return 0;
-				}
-			}
-			ctx->pos += ext3_rec_len_from_disk(de->rec_len);
-		}
-		offset = 0;
-		brelse (bh);
-		if (ctx->pos < inode->i_size)
-			if (!dir_relax(inode))
-				return 0;
-	}
-	return 0;
-}
-
-static inline int is_32bit_api(void)
-{
-#ifdef CONFIG_COMPAT
-	return is_compat_task();
-#else
-	return (BITS_PER_LONG == 32);
-#endif
-}
-
-/*
- * These functions convert from the major/minor hash to an f_pos
- * value for dx directories
- *
- * Upper layer (for example NFS) should specify FMODE_32BITHASH or
- * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
- * directly on both 32-bit and 64-bit nodes, under such case, neither
- * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
- */
-static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return major >> 1;
-	else
-		return ((__u64)(major >> 1) << 32) | (__u64)minor;
-}
-
-static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return (pos << 1) & 0xffffffff;
-	else
-		return ((pos >> 32) << 1) & 0xffffffff;
-}
-
-static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return 0;
-	else
-		return pos & 0xffffffff;
-}
-
-/*
- * Return 32- or 64-bit end-of-file for dx directories
- */
-static inline loff_t ext3_get_htree_eof(struct file *filp)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return EXT3_HTREE_EOF_32BIT;
-	else
-		return EXT3_HTREE_EOF_64BIT;
-}
-
-
-/*
- * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
- * non-htree and htree directories, where the "offset" is in terms
- * of the filename hash value instead of the byte offset.
- *
- * Because we may return a 64-bit hash that is well beyond s_maxbytes,
- * we need to pass the max hash as the maximum allowable offset in
- * the htree directory case.
- *
- * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
- *       will be invalid once the directory was converted into a dx directory
- */
-static loff_t ext3_dir_llseek(struct file *file, loff_t offset, int whence)
-{
-	struct inode *inode = file->f_mapping->host;
-	int dx_dir = is_dx_dir(inode);
-	loff_t htree_max = ext3_get_htree_eof(file);
-
-	if (likely(dx_dir))
-		return generic_file_llseek_size(file, offset, whence,
-					        htree_max, htree_max);
-	else
-		return generic_file_llseek(file, offset, whence);
-}
-
-/*
- * This structure holds the nodes of the red-black tree used to store
- * the directory entry in hash order.
- */
-struct fname {
-	__u32		hash;
-	__u32		minor_hash;
-	struct rb_node	rb_hash;
-	struct fname	*next;
-	__u32		inode;
-	__u8		name_len;
-	__u8		file_type;
-	char		name[0];
-};
-
-/*
- * This functoin implements a non-recursive way of freeing all of the
- * nodes in the red-black tree.
- */
-static void free_rb_tree_fname(struct rb_root *root)
-{
-	struct fname *fname, *next;
-
-	rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
-		do {
-			struct fname *old = fname;
-			fname = fname->next;
-			kfree(old);
-		} while (fname);
-
-	*root = RB_ROOT;
-}
-
-static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
-							   loff_t pos)
-{
-	struct dir_private_info *p;
-
-	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
-	if (!p)
-		return NULL;
-	p->curr_hash = pos2maj_hash(filp, pos);
-	p->curr_minor_hash = pos2min_hash(filp, pos);
-	return p;
-}
-
-void ext3_htree_free_dir_info(struct dir_private_info *p)
-{
-	free_rb_tree_fname(&p->root);
-	kfree(p);
-}
-
-/*
- * Given a directory entry, enter it into the fname rb tree.
- */
-int ext3_htree_store_dirent(struct file *dir_file, __u32 hash,
-			     __u32 minor_hash,
-			     struct ext3_dir_entry_2 *dirent)
-{
-	struct rb_node **p, *parent = NULL;
-	struct fname * fname, *new_fn;
-	struct dir_private_info *info;
-	int len;
-
-	info = (struct dir_private_info *) dir_file->private_data;
-	p = &info->root.rb_node;
-
-	/* Create and allocate the fname structure */
-	len = sizeof(struct fname) + dirent->name_len + 1;
-	new_fn = kzalloc(len, GFP_KERNEL);
-	if (!new_fn)
-		return -ENOMEM;
-	new_fn->hash = hash;
-	new_fn->minor_hash = minor_hash;
-	new_fn->inode = le32_to_cpu(dirent->inode);
-	new_fn->name_len = dirent->name_len;
-	new_fn->file_type = dirent->file_type;
-	memcpy(new_fn->name, dirent->name, dirent->name_len);
-	new_fn->name[dirent->name_len] = 0;
-
-	while (*p) {
-		parent = *p;
-		fname = rb_entry(parent, struct fname, rb_hash);
-
-		/*
-		 * If the hash and minor hash match up, then we put
-		 * them on a linked list.  This rarely happens...
-		 */
-		if ((new_fn->hash == fname->hash) &&
-		    (new_fn->minor_hash == fname->minor_hash)) {
-			new_fn->next = fname->next;
-			fname->next = new_fn;
-			return 0;
-		}
-
-		if (new_fn->hash < fname->hash)
-			p = &(*p)->rb_left;
-		else if (new_fn->hash > fname->hash)
-			p = &(*p)->rb_right;
-		else if (new_fn->minor_hash < fname->minor_hash)
-			p = &(*p)->rb_left;
-		else /* if (new_fn->minor_hash > fname->minor_hash) */
-			p = &(*p)->rb_right;
-	}
-
-	rb_link_node(&new_fn->rb_hash, parent, p);
-	rb_insert_color(&new_fn->rb_hash, &info->root);
-	return 0;
-}
-
-
-
-/*
- * This is a helper function for ext3_dx_readdir.  It calls filldir
- * for all entres on the fname linked list.  (Normally there is only
- * one entry on the linked list, unless there are 62 bit hash collisions.)
- */
-static bool call_filldir(struct file *file, struct dir_context *ctx,
-			struct fname *fname)
-{
-	struct dir_private_info *info = file->private_data;
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-
-	if (!fname) {
-		printk("call_filldir: called with null fname?!?\n");
-		return true;
-	}
-	ctx->pos = hash2pos(file, fname->hash, fname->minor_hash);
-	while (fname) {
-		if (!dir_emit(ctx, fname->name, fname->name_len,
-				fname->inode,
-				get_dtype(sb, fname->file_type))) {
-			info->extra_fname = fname;
-			return false;
-		}
-		fname = fname->next;
-	}
-	return true;
-}
-
-static int ext3_dx_readdir(struct file *file, struct dir_context *ctx)
-{
-	struct dir_private_info *info = file->private_data;
-	struct inode *inode = file_inode(file);
-	struct fname *fname;
-	int	ret;
-
-	if (!info) {
-		info = ext3_htree_create_dir_info(file, ctx->pos);
-		if (!info)
-			return -ENOMEM;
-		file->private_data = info;
-	}
-
-	if (ctx->pos == ext3_get_htree_eof(file))
-		return 0;	/* EOF */
-
-	/* Some one has messed with f_pos; reset the world */
-	if (info->last_pos != ctx->pos) {
-		free_rb_tree_fname(&info->root);
-		info->curr_node = NULL;
-		info->extra_fname = NULL;
-		info->curr_hash = pos2maj_hash(file, ctx->pos);
-		info->curr_minor_hash = pos2min_hash(file, ctx->pos);
-	}
-
-	/*
-	 * If there are any leftover names on the hash collision
-	 * chain, return them first.
-	 */
-	if (info->extra_fname) {
-		if (!call_filldir(file, ctx, info->extra_fname))
-			goto finished;
-		info->extra_fname = NULL;
-		goto next_node;
-	} else if (!info->curr_node)
-		info->curr_node = rb_first(&info->root);
-
-	while (1) {
-		/*
-		 * Fill the rbtree if we have no more entries,
-		 * or the inode has changed since we last read in the
-		 * cached entries.
-		 */
-		if ((!info->curr_node) ||
-		    (file->f_version != inode->i_version)) {
-			info->curr_node = NULL;
-			free_rb_tree_fname(&info->root);
-			file->f_version = inode->i_version;
-			ret = ext3_htree_fill_tree(file, info->curr_hash,
-						   info->curr_minor_hash,
-						   &info->next_hash);
-			if (ret < 0)
-				return ret;
-			if (ret == 0) {
-				ctx->pos = ext3_get_htree_eof(file);
-				break;
-			}
-			info->curr_node = rb_first(&info->root);
-		}
-
-		fname = rb_entry(info->curr_node, struct fname, rb_hash);
-		info->curr_hash = fname->hash;
-		info->curr_minor_hash = fname->minor_hash;
-		if (!call_filldir(file, ctx, fname))
-			break;
-	next_node:
-		info->curr_node = rb_next(info->curr_node);
-		if (info->curr_node) {
-			fname = rb_entry(info->curr_node, struct fname,
-					 rb_hash);
-			info->curr_hash = fname->hash;
-			info->curr_minor_hash = fname->minor_hash;
-		} else {
-			if (info->next_hash == ~0) {
-				ctx->pos = ext3_get_htree_eof(file);
-				break;
-			}
-			info->curr_hash = info->next_hash;
-			info->curr_minor_hash = 0;
-		}
-	}
-finished:
-	info->last_pos = ctx->pos;
-	return 0;
-}
-
-static int ext3_release_dir (struct inode * inode, struct file * filp)
-{
-       if (filp->private_data)
-		ext3_htree_free_dir_info(filp->private_data);
-
-	return 0;
-}
-
-const struct file_operations ext3_dir_operations = {
-	.llseek		= ext3_dir_llseek,
-	.read		= generic_read_dir,
-	.iterate	= ext3_readdir,
-	.unlocked_ioctl = ext3_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext3_compat_ioctl,
-#endif
-	.fsync		= ext3_sync_file,
-	.release	= ext3_release_dir,
-};
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
--- a/fs/ext3/ext3_jbd.c
+++ b/fs/ext3/ext3_jbd.c
@ -1,59 +0,0 @@
-/*
- * Interface between ext3 and JBD
- */
-
-#include "ext3.h"
-
-int __ext3_journal_get_undo_access(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_get_undo_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_get_write_access(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_get_write_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_forget(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_forget(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_revoke(const char *where, handle_t *handle,
-				unsigned long blocknr, struct buffer_head *bh)
-{
-	int err = journal_revoke(handle, blocknr, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_get_create_access(const char *where,
-				handle_t *handle, struct buffer_head *bh)
-{
-	int err = journal_get_create_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_dirty_metadata(const char *where,
-				handle_t *handle, struct buffer_head *bh)
-{
-	int err = journal_dirty_metadata(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
--- a/fs/ext3/file.c
+++ b/fs/ext3/file.c
@ -1,79 +0,0 @@
-/*
- *  linux/fs/ext3/file.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/file.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 fs regular file handling primitives
- *
- *  64-bit file support on 64-bit platforms by Jakub Jelinek
- *	(jj@sunsite.ms.mff.cuni.cz)
- */
-
-#include <linux/quotaops.h>
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * Called when an inode is released. Note that this is different
- * from ext3_file_open: open gets called at every open, but release
- * gets called only when /all/ the files are closed.
- */
-static int ext3_release_file (struct inode * inode, struct file * filp)
-{
-	if (ext3_test_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE)) {
-		filemap_flush(inode->i_mapping);
-		ext3_clear_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE);
-	}
-	/* if we are the last writer on the inode, drop the block reservation */
-	if ((filp->f_mode & FMODE_WRITE) &&
-			(atomic_read(&inode->i_writecount) == 1))
-	{
-		mutex_lock(&EXT3_I(inode)->truncate_mutex);
-		ext3_discard_reservation(inode);
-		mutex_unlock(&EXT3_I(inode)->truncate_mutex);
-	}
-	if (is_dx(inode) && filp->private_data)
-		ext3_htree_free_dir_info(filp->private_data);
-
-	return 0;
-}
-
-const struct file_operations ext3_file_operations = {
-	.llseek		= generic_file_llseek,
-	.read_iter	= generic_file_read_iter,
-	.write_iter	= generic_file_write_iter,
-	.unlocked_ioctl	= ext3_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext3_compat_ioctl,
-#endif
-	.mmap		= generic_file_mmap,
-	.open		= dquot_file_open,
-	.release	= ext3_release_file,
-	.fsync		= ext3_sync_file,
-	.splice_read	= generic_file_splice_read,
-	.splice_write	= iter_file_splice_write,
-};
-
-const struct inode_operations ext3_file_inode_operations = {
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-	.get_acl	= ext3_get_acl,
-	.set_acl	= ext3_set_acl,
-	.fiemap		= ext3_fiemap,
-};
-
--- a/fs/ext3/fsync.c
+++ b/fs/ext3/fsync.c
@ -1,109 +0,0 @@
-/*
- *  linux/fs/ext3/fsync.c
- *
- *  Copyright (C) 1993  Stephen Tweedie (sct@redhat.com)
- *  from
- *  Copyright (C) 1992  Remy Card (card@masi.ibp.fr)
- *                      Laboratoire MASI - Institut Blaise Pascal
- *                      Universite Pierre et Marie Curie (Paris VI)
- *  from
- *  linux/fs/minix/truncate.c   Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3fs fsync primitive
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *
- *  Removed unnecessary code duplication for little endian machines
- *  and excessive __inline__s.
- *        Andi Kleen, 1997
- *
- * Major simplications and cleanup - we only need to do the metadata, because
- * we can depend on generic_block_fdatasync() to sync the data blocks.
- */
-
-#include <linux/blkdev.h>
-#include <linux/writeback.h>
-#include "ext3.h"
-
-/*
- * akpm: A new design for ext3_sync_file().
- *
- * This is only called from sys_fsync(), sys_fdatasync() and sys_msync().
- * There cannot be a transaction open by this task.
- * Another task could have dirtied this inode.  Its data can be in any
- * state in the journalling system.
- *
- * What we do is just kick off a commit and wait on it.  This will snapshot the
- * inode to disk.
- */
-
-int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
-	int ret, needs_barrier = 0;
-	tid_t commit_tid;
-
-	trace_ext3_sync_file_enter(file, datasync);
-
-	if (inode->i_sb->s_flags & MS_RDONLY) {
-		/* Make sure that we read updated state */
-		smp_rmb();
-		if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
-			return -EROFS;
-		return 0;
-	}
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-	if (ret)
-		goto out;
-
-	J_ASSERT(ext3_journal_current_handle() == NULL);
-
-	/*
-	 * data=writeback,ordered:
-	 *  The caller's filemap_fdatawrite()/wait will sync the data.
-	 *  Metadata is in the journal, we wait for a proper transaction
-	 *  to commit here.
-	 *
-	 * data=journal:
-	 *  filemap_fdatawrite won't do anything (the buffers are clean).
-	 *  ext3_force_commit will write the file data into the journal and
-	 *  will wait on that.
-	 *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
-	 *  (they were dirtied by commit).  But that's OK - the blocks are
-	 *  safe in-journal, which is all fsync() needs to ensure.
-	 */
-	if (ext3_should_journal_data(inode)) {
-		ret = ext3_force_commit(inode->i_sb);
-		goto out;
-	}
-
-	if (datasync)
-		commit_tid = atomic_read(&ei->i_datasync_tid);
-	else
-		commit_tid = atomic_read(&ei->i_sync_tid);
-
-	if (test_opt(inode->i_sb, BARRIER) &&
-	    !journal_trans_will_send_data_barrier(journal, commit_tid))
-		needs_barrier = 1;
-	log_start_commit(journal, commit_tid);
-	ret = log_wait_commit(journal, commit_tid);
-
-	/*
-	 * In case we didn't commit a transaction, we have to flush
-	 * disk caches manually so that data really is on persistent
-	 * storage
-	 */
-	if (needs_barrier) {
-		int err;
-
-		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-		if (!ret)
-			ret = err;
-	}
-out:
-	trace_ext3_sync_file_exit(inode, ret);
-	return ret;
-}
--- a/fs/ext3/hash.c
+++ b/fs/ext3/hash.c
@ -1,206 +0,0 @@
-/*
- *  linux/fs/ext3/hash.c
- *
- * Copyright (C) 2002 by Theodore Ts'o
- *
- * This file is released under the GPL v2.
- *
- * This file may be redistributed under the terms of the GNU Public
- * License.
- */
-
-#include "ext3.h"
-#include <linux/cryptohash.h>
-
-#define DELTA 0x9E3779B9
-
-static void TEA_transform(__u32 buf[4], __u32 const in[])
-{
-	__u32	sum = 0;
-	__u32	b0 = buf[0], b1 = buf[1];
-	__u32	a = in[0], b = in[1], c = in[2], d = in[3];
-	int	n = 16;
-
-	do {
-		sum += DELTA;
-		b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
-		b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
-	} while(--n);
-
-	buf[0] += b0;
-	buf[1] += b1;
-}
-
-
-/* The old legacy hash */
-static __u32 dx_hack_hash_unsigned(const char *name, int len)
-{
-	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
-	const unsigned char *ucp = (const unsigned char *) name;
-
-	while (len--) {
-		hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
-
-		if (hash & 0x80000000)
-			hash -= 0x7fffffff;
-		hash1 = hash0;
-		hash0 = hash;
-	}
-	return hash0 << 1;
-}
-
-static __u32 dx_hack_hash_signed(const char *name, int len)
-{
-	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
-	const signed char *scp = (const signed char *) name;
-
-	while (len--) {
-		hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
-
-		if (hash & 0x80000000)
-			hash -= 0x7fffffff;
-		hash1 = hash0;
-		hash0 = hash;
-	}
-	return hash0 << 1;
-}
-
-static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
-{
-	__u32	pad, val;
-	int	i;
-	const signed char *scp = (const signed char *) msg;
-
-	pad = (__u32)len | ((__u32)len << 8);
-	pad |= pad << 16;
-
-	val = pad;
-	if (len > num*4)
-		len = num * 4;
-	for (i = 0; i < len; i++) {
-		if ((i % 4) == 0)
-			val = pad;
-		val = ((int) scp[i]) + (val << 8);
-		if ((i % 4) == 3) {
-			*buf++ = val;
-			val = pad;
-			num--;
-		}
-	}
-	if (--num >= 0)
-		*buf++ = val;
-	while (--num >= 0)
-		*buf++ = pad;
-}
-
-static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
-{
-	__u32	pad, val;
-	int	i;
-	const unsigned char *ucp = (const unsigned char *) msg;
-
-	pad = (__u32)len | ((__u32)len << 8);
-	pad |= pad << 16;
-
-	val = pad;
-	if (len > num*4)
-		len = num * 4;
-	for (i=0; i < len; i++) {
-		if ((i % 4) == 0)
-			val = pad;
-		val = ((int) ucp[i]) + (val << 8);
-		if ((i % 4) == 3) {
-			*buf++ = val;
-			val = pad;
-			num--;
-		}
-	}
-	if (--num >= 0)
-		*buf++ = val;
-	while (--num >= 0)
-		*buf++ = pad;
-}
-
-/*
- * Returns the hash of a filename.  If len is 0 and name is NULL, then
- * this function can be used to test whether or not a hash version is
- * supported.
- *
- * The seed is an 4 longword (32 bits) "secret" which can be used to
- * uniquify a hash.  If the seed is all zero's, then some default seed
- * may be used.
- *
- * A particular hash version specifies whether or not the seed is
- * represented, and whether or not the returned hash is 32 bits or 64
- * bits.  32 bit hashes will return 0 for the minor hash.
- */
-int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
-{
-	__u32	hash;
-	__u32	minor_hash = 0;
-	const char	*p;
-	int		i;
-	__u32		in[8], buf[4];
-	void		(*str2hashbuf)(const char *, int, __u32 *, int) =
-				str2hashbuf_signed;
-
-	/* Initialize the default seed for the hash checksum functions */
-	buf[0] = 0x67452301;
-	buf[1] = 0xefcdab89;
-	buf[2] = 0x98badcfe;
-	buf[3] = 0x10325476;
-
-	/* Check to see if the seed is all zero's */
-	if (hinfo->seed) {
-		for (i=0; i < 4; i++) {
-			if (hinfo->seed[i])
-				break;
-		}
-		if (i < 4)
-			memcpy(buf, hinfo->seed, sizeof(buf));
-	}
-
-	switch (hinfo->hash_version) {
-	case DX_HASH_LEGACY_UNSIGNED:
-		hash = dx_hack_hash_unsigned(name, len);
-		break;
-	case DX_HASH_LEGACY:
-		hash = dx_hack_hash_signed(name, len);
-		break;
-	case DX_HASH_HALF_MD4_UNSIGNED:
-		str2hashbuf = str2hashbuf_unsigned;
-	case DX_HASH_HALF_MD4:
-		p = name;
-		while (len > 0) {
-			(*str2hashbuf)(p, len, in, 8);
-			half_md4_transform(buf, in);
-			len -= 32;
-			p += 32;
-		}
-		minor_hash = buf[2];
-		hash = buf[1];
-		break;
-	case DX_HASH_TEA_UNSIGNED:
-		str2hashbuf = str2hashbuf_unsigned;
-	case DX_HASH_TEA:
-		p = name;
-		while (len > 0) {
-			(*str2hashbuf)(p, len, in, 4);
-			TEA_transform(buf, in);
-			len -= 16;
-			p += 16;
-		}
-		hash = buf[0];
-		minor_hash = buf[1];
-		break;
-	default:
-		hinfo->hash = 0;
-		return -1;
-	}
-	hash = hash & ~1;
-	if (hash == (EXT3_HTREE_EOF_32BIT << 1))
-		hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
-	hinfo->hash = hash;
-	hinfo->minor_hash = minor_hash;
-	return 0;
-}
--- a/fs/ext3/ialloc.c
+++ b/fs/ext3/ialloc.c
@ -1,706 +0,0 @@
-/*
- *  linux/fs/ext3/ialloc.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  BSD ufs-inspired inode and directory allocation by
- *  Stephen Tweedie (sct@redhat.com), 1993
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- */
-
-#include <linux/quotaops.h>
-#include <linux/random.h>
-
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * ialloc.c contains the inodes allocation and deallocation routines
- */
-
-/*
- * The free inodes are managed by bitmaps.  A file system contains several
- * blocks groups.  Each group contains 1 bitmap block for blocks, 1 bitmap
- * block for inodes, N blocks for the inode table and data blocks.
- *
- * The file system contains group descriptors which are located after the
- * super block.  Each descriptor contains the number of the bitmap block and
- * the free blocks count in the block.
- */
-
-
-/*
- * Read the inode allocation bitmap for a given block_group, reading
- * into the specified slot in the superblock's bitmap cache.
- *
- * Return buffer_head of bitmap on success or NULL.
- */
-static struct buffer_head *
-read_inode_bitmap(struct super_block * sb, unsigned long block_group)
-{
-	struct ext3_group_desc *desc;
-	struct buffer_head *bh = NULL;
-
-	desc = ext3_get_group_desc(sb, block_group, NULL);
-	if (!desc)
-		goto error_out;
-
-	bh = sb_bread(sb, le32_to_cpu(desc->bg_inode_bitmap));
-	if (!bh)
-		ext3_error(sb, "read_inode_bitmap",
-			    "Cannot read inode bitmap - "
-			    "block_group = %lu, inode_bitmap = %u",
-			    block_group, le32_to_cpu(desc->bg_inode_bitmap));
-error_out:
-	return bh;
-}
-
-/*
- * NOTE! When we get the inode, we're the only people
- * that have access to it, and as such there are no
- * race conditions we have to worry about. The inode
- * is not on the hash-lists, and it cannot be reached
- * through the filesystem because the directory entry
- * has been deleted earlier.
- *
- * HOWEVER: we must make sure that we get no aliases,
- * which means that we have to call "clear_inode()"
- * _before_ we mark the inode not in use in the inode
- * bitmaps. Otherwise a newly created file might use
- * the same inode number (not actually the same pointer
- * though), and then we'd have two inodes sharing the
- * same inode number and space on the harddisk.
- */
-void ext3_free_inode (handle_t *handle, struct inode * inode)
-{
-	struct super_block * sb = inode->i_sb;
-	int is_directory;
-	unsigned long ino;
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *bh2;
-	unsigned long block_group;
-	unsigned long bit;
-	struct ext3_group_desc * gdp;
-	struct ext3_super_block * es;
-	struct ext3_sb_info *sbi;
-	int fatal = 0, err;
-
-	if (atomic_read(&inode->i_count) > 1) {
-		printk ("ext3_free_inode: inode has count=%d\n",
-					atomic_read(&inode->i_count));
-		return;
-	}
-	if (inode->i_nlink) {
-		printk ("ext3_free_inode: inode has nlink=%d\n",
-			inode->i_nlink);
-		return;
-	}
-	if (!sb) {
-		printk("ext3_free_inode: inode on nonexistent device\n");
-		return;
-	}
-	sbi = EXT3_SB(sb);
-
-	ino = inode->i_ino;
-	ext3_debug ("freeing inode %lu\n", ino);
-	trace_ext3_free_inode(inode);
-
-	is_directory = S_ISDIR(inode->i_mode);
-
-	es = EXT3_SB(sb)->s_es;
-	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
-		ext3_error (sb, "ext3_free_inode",
-			    "reserved or nonexistent inode %lu", ino);
-		goto error_return;
-	}
-	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
-	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
-	bitmap_bh = read_inode_bitmap(sb, block_group);
-	if (!bitmap_bh)
-		goto error_return;
-
-	BUFFER_TRACE(bitmap_bh, "get_write_access");
-	fatal = ext3_journal_get_write_access(handle, bitmap_bh);
-	if (fatal)
-		goto error_return;
-
-	/* Ok, now we can actually update the inode bitmaps.. */
-	if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
-					bit, bitmap_bh->b_data))
-		ext3_error (sb, "ext3_free_inode",
-			      "bit already cleared for inode %lu", ino);
-	else {
-		gdp = ext3_get_group_desc (sb, block_group, &bh2);
-
-		BUFFER_TRACE(bh2, "get_write_access");
-		fatal = ext3_journal_get_write_access(handle, bh2);
-		if (fatal) goto error_return;
-
-		if (gdp) {
-			spin_lock(sb_bgl_lock(sbi, block_group));
-			le16_add_cpu(&gdp->bg_free_inodes_count, 1);
-			if (is_directory)
-				le16_add_cpu(&gdp->bg_used_dirs_count, -1);
-			spin_unlock(sb_bgl_lock(sbi, block_group));
-			percpu_counter_inc(&sbi->s_freeinodes_counter);
-			if (is_directory)
-				percpu_counter_dec(&sbi->s_dirs_counter);
-
-		}
-		BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
-		err = ext3_journal_dirty_metadata(handle, bh2);
-		if (!fatal) fatal = err;
-	}
-	BUFFER_TRACE(bitmap_bh, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
-	if (!fatal)
-		fatal = err;
-
-error_return:
-	brelse(bitmap_bh);
-	ext3_std_error(sb, fatal);
-}
-
-/*
- * Orlov's allocator for directories.
- *
- * We always try to spread first-level directories.
- *
- * If there are blockgroups with both free inodes and free blocks counts
- * not worse than average we return one with smallest directory count.
- * Otherwise we simply return a random group.
- *
- * For the rest rules look so:
- *
- * It's OK to put directory into a group unless
- * it has too many directories already (max_dirs) or
- * it has too few free inodes left (min_inodes) or
- * it has too few free blocks left (min_blocks).
- * Parent's group is preferred, if it doesn't satisfy these
- * conditions we search cyclically through the rest. If none
- * of the groups look good we just look for a group with more
- * free inodes than average (starting at parent's group).
- *
- * Debt is incremented each time we allocate a directory and decremented
- * when we allocate an inode, within 0--255.
- */
-
-static int find_group_orlov(struct super_block *sb, struct inode *parent)
-{
-	int parent_group = EXT3_I(parent)->i_block_group;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	int ngroups = sbi->s_groups_count;
-	int inodes_per_group = EXT3_INODES_PER_GROUP(sb);
-	unsigned int freei, avefreei;
-	ext3_fsblk_t freeb, avefreeb;
-	unsigned int ndirs;
-	int max_dirs, min_inodes;
-	ext3_grpblk_t min_blocks;
-	int group = -1, i;
-	struct ext3_group_desc *desc;
-
-	freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
-	avefreei = freei / ngroups;
-	freeb = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
-	avefreeb = freeb / ngroups;
-	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
-
-	if ((parent == d_inode(sb->s_root)) ||
-	    (EXT3_I(parent)->i_flags & EXT3_TOPDIR_FL)) {
-		int best_ndir = inodes_per_group;
-		int best_group = -1;
-
-		group = prandom_u32();
-		parent_group = (unsigned)group % ngroups;
-		for (i = 0; i < ngroups; i++) {
-			group = (parent_group + i) % ngroups;
-			desc = ext3_get_group_desc (sb, group, NULL);
-			if (!desc || !desc->bg_free_inodes_count)
-				continue;
-			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
-				continue;
-			if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei)
-				continue;
-			if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
-				continue;
-			best_group = group;
-			best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
-		}
-		if (best_group >= 0)
-			return best_group;
-		goto fallback;
-	}
-
-	max_dirs = ndirs / ngroups + inodes_per_group / 16;
-	min_inodes = avefreei - inodes_per_group / 4;
-	min_blocks = avefreeb - EXT3_BLOCKS_PER_GROUP(sb) / 4;
-
-	for (i = 0; i < ngroups; i++) {
-		group = (parent_group + i) % ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (!desc || !desc->bg_free_inodes_count)
-			continue;
-		if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
-			continue;
-		if (le16_to_cpu(desc->bg_free_inodes_count) < min_inodes)
-			continue;
-		if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)
-			continue;
-		return group;
-	}
-
-fallback:
-	for (i = 0; i < ngroups; i++) {
-		group = (parent_group + i) % ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (!desc || !desc->bg_free_inodes_count)
-			continue;
-		if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
-			return group;
-	}
-
-	if (avefreei) {
-		/*
-		 * The free-inodes counter is approximate, and for really small
-		 * filesystems the above test can fail to find any blockgroups
-		 */
-		avefreei = 0;
-		goto fallback;
-	}
-
-	return -1;
-}
-
-static int find_group_other(struct super_block *sb, struct inode *parent)
-{
-	int parent_group = EXT3_I(parent)->i_block_group;
-	int ngroups = EXT3_SB(sb)->s_groups_count;
-	struct ext3_group_desc *desc;
-	int group, i;
-
-	/*
-	 * Try to place the inode in its parent directory
-	 */
-	group = parent_group;
-	desc = ext3_get_group_desc (sb, group, NULL);
-	if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
-			le16_to_cpu(desc->bg_free_blocks_count))
-		return group;
-
-	/*
-	 * We're going to place this inode in a different blockgroup from its
-	 * parent.  We want to cause files in a common directory to all land in
-	 * the same blockgroup.  But we want files which are in a different
-	 * directory which shares a blockgroup with our parent to land in a
-	 * different blockgroup.
-	 *
-	 * So add our directory's i_ino into the starting point for the hash.
-	 */
-	group = (group + parent->i_ino) % ngroups;
-
-	/*
-	 * Use a quadratic hash to find a group with a free inode and some free
-	 * blocks.
-	 */
-	for (i = 1; i < ngroups; i <<= 1) {
-		group += i;
-		if (group >= ngroups)
-			group -= ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
-				le16_to_cpu(desc->bg_free_blocks_count))
-			return group;
-	}
-
-	/*
-	 * That failed: try linear search for a free inode, even if that group
-	 * has no free blocks.
-	 */
-	group = parent_group;
-	for (i = 0; i < ngroups; i++) {
-		if (++group >= ngroups)
-			group = 0;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (desc && le16_to_cpu(desc->bg_free_inodes_count))
-			return group;
-	}
-
-	return -1;
-}
-
-/*
- * There are two policies for allocating an inode.  If the new inode is
- * a directory, then a forward search is made for a block group with both
- * free space and a low directory-to-inode ratio; if that fails, then of
- * the groups with above-average free space, that group with the fewest
- * directories already is chosen.
- *
- * For other inodes, search forward from the parent directory's block
- * group to find a free inode.
- */
-struct inode *ext3_new_inode(handle_t *handle, struct inode * dir,
-			     const struct qstr *qstr, umode_t mode)
-{
-	struct super_block *sb;
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *bh2;
-	int group;
-	unsigned long ino = 0;
-	struct inode * inode;
-	struct ext3_group_desc * gdp = NULL;
-	struct ext3_super_block * es;
-	struct ext3_inode_info *ei;
-	struct ext3_sb_info *sbi;
-	int err = 0;
-	struct inode *ret;
-	int i;
-
-	/* Cannot create files in a deleted directory */
-	if (!dir || !dir->i_nlink)
-		return ERR_PTR(-EPERM);
-
-	sb = dir->i_sb;
-	trace_ext3_request_inode(dir, mode);
-	inode = new_inode(sb);
-	if (!inode)
-		return ERR_PTR(-ENOMEM);
-	ei = EXT3_I(inode);
-
-	sbi = EXT3_SB(sb);
-	es = sbi->s_es;
-	if (S_ISDIR(mode))
-		group = find_group_orlov(sb, dir);
-	else
-		group = find_group_other(sb, dir);
-
-	err = -ENOSPC;
-	if (group == -1)
-		goto out;
-
-	for (i = 0; i < sbi->s_groups_count; i++) {
-		err = -EIO;
-
-		gdp = ext3_get_group_desc(sb, group, &bh2);
-		if (!gdp)
-			goto fail;
-
-		brelse(bitmap_bh);
-		bitmap_bh = read_inode_bitmap(sb, group);
-		if (!bitmap_bh)
-			goto fail;
-
-		ino = 0;
-
-repeat_in_this_group:
-		ino = ext3_find_next_zero_bit((unsigned long *)
-				bitmap_bh->b_data, EXT3_INODES_PER_GROUP(sb), ino);
-		if (ino < EXT3_INODES_PER_GROUP(sb)) {
-
-			BUFFER_TRACE(bitmap_bh, "get_write_access");
-			err = ext3_journal_get_write_access(handle, bitmap_bh);
-			if (err)
-				goto fail;
-
-			if (!ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
-						ino, bitmap_bh->b_data)) {
-				/* we won it */
-				BUFFER_TRACE(bitmap_bh,
-					"call ext3_journal_dirty_metadata");
-				err = ext3_journal_dirty_metadata(handle,
-								bitmap_bh);
-				if (err)
-					goto fail;
-				goto got;
-			}
-			/* we lost it */
-			journal_release_buffer(handle, bitmap_bh);
-
-			if (++ino < EXT3_INODES_PER_GROUP(sb))
-				goto repeat_in_this_group;
-		}
-
-		/*
-		 * This case is possible in concurrent environment.  It is very
-		 * rare.  We cannot repeat the find_group_xxx() call because
-		 * that will simply return the same blockgroup, because the
-		 * group descriptor metadata has not yet been updated.
-		 * So we just go onto the next blockgroup.
-		 */
-		if (++group == sbi->s_groups_count)
-			group = 0;
-	}
-	err = -ENOSPC;
-	goto out;
-
-got:
-	ino += group * EXT3_INODES_PER_GROUP(sb) + 1;
-	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
-		ext3_error (sb, "ext3_new_inode",
-			    "reserved inode or inode > inodes count - "
-			    "block_group = %d, inode=%lu", group, ino);
-		err = -EIO;
-		goto fail;
-	}
-
-	BUFFER_TRACE(bh2, "get_write_access");
-	err = ext3_journal_get_write_access(handle, bh2);
-	if (err) goto fail;
-	spin_lock(sb_bgl_lock(sbi, group));
-	le16_add_cpu(&gdp->bg_free_inodes_count, -1);
-	if (S_ISDIR(mode)) {
-		le16_add_cpu(&gdp->bg_used_dirs_count, 1);
-	}
-	spin_unlock(sb_bgl_lock(sbi, group));
-	BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, bh2);
-	if (err) goto fail;
-
-	percpu_counter_dec(&sbi->s_freeinodes_counter);
-	if (S_ISDIR(mode))
-		percpu_counter_inc(&sbi->s_dirs_counter);
-
-
-	if (test_opt(sb, GRPID)) {
-		inode->i_mode = mode;
-		inode->i_uid = current_fsuid();
-		inode->i_gid = dir->i_gid;
-	} else
-		inode_init_owner(inode, dir, mode);
-
-	inode->i_ino = ino;
-	/* This is the optimal IO size (for stat), not the fs block size */
-	inode->i_blocks = 0;
-	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
-
-	memset(ei->i_data, 0, sizeof(ei->i_data));
-	ei->i_dir_start_lookup = 0;
-	ei->i_disksize = 0;
-
-	ei->i_flags =
-		ext3_mask_flags(mode, EXT3_I(dir)->i_flags & EXT3_FL_INHERITED);
-#ifdef EXT3_FRAGMENTS
-	ei->i_faddr = 0;
-	ei->i_frag_no = 0;
-	ei->i_frag_size = 0;
-#endif
-	ei->i_file_acl = 0;
-	ei->i_dir_acl = 0;
-	ei->i_dtime = 0;
-	ei->i_block_alloc_info = NULL;
-	ei->i_block_group = group;
-
-	ext3_set_inode_flags(inode);
-	if (IS_DIRSYNC(inode))
-		handle->h_sync = 1;
-	if (insert_inode_locked(inode) < 0) {
-		/*
-		 * Likely a bitmap corruption causing inode to be allocated
-		 * twice.
-		 */
-		err = -EIO;
-		goto fail;
-	}
-	spin_lock(&sbi->s_next_gen_lock);
-	inode->i_generation = sbi->s_next_generation++;
-	spin_unlock(&sbi->s_next_gen_lock);
-
-	ei->i_state_flags = 0;
-	ext3_set_inode_state(inode, EXT3_STATE_NEW);
-
-	/* See comment in ext3_iget for explanation */
-	if (ino >= EXT3_FIRST_INO(sb) + 1 &&
-	    EXT3_INODE_SIZE(sb) > EXT3_GOOD_OLD_INODE_SIZE) {
-		ei->i_extra_isize =
-			sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE;
-	} else {
-		ei->i_extra_isize = 0;
-	}
-
-	ret = inode;
-	dquot_initialize(inode);
-	err = dquot_alloc_inode(inode);
-	if (err)
-		goto fail_drop;
-
-	err = ext3_init_acl(handle, inode, dir);
-	if (err)
-		goto fail_free_drop;
-
-	err = ext3_init_security(handle, inode, dir, qstr);
-	if (err)
-		goto fail_free_drop;
-
-	err = ext3_mark_inode_dirty(handle, inode);
-	if (err) {
-		ext3_std_error(sb, err);
-		goto fail_free_drop;
-	}
-
-	ext3_debug("allocating inode %lu\n", inode->i_ino);
-	trace_ext3_allocate_inode(inode, dir, mode);
-	goto really_out;
-fail:
-	ext3_std_error(sb, err);
-out:
-	iput(inode);
-	ret = ERR_PTR(err);
-really_out:
-	brelse(bitmap_bh);
-	return ret;
-
-fail_free_drop:
-	dquot_free_inode(inode);
-
-fail_drop:
-	dquot_drop(inode);
-	inode->i_flags |= S_NOQUOTA;
-	clear_nlink(inode);
-	unlock_new_inode(inode);
-	iput(inode);
-	brelse(bitmap_bh);
-	return ERR_PTR(err);
-}
-
-/* Verify that we are loading a valid orphan from disk */
-struct inode *ext3_orphan_get(struct super_block *sb, unsigned long ino)
-{
-	unsigned long max_ino = le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count);
-	unsigned long block_group;
-	int bit;
-	struct buffer_head *bitmap_bh;
-	struct inode *inode = NULL;
-	long err = -EIO;
-
-	/* Error cases - e2fsck has already cleaned up for us */
-	if (ino > max_ino) {
-		ext3_warning(sb, __func__,
-			     "bad orphan ino %lu!  e2fsck was run?", ino);
-		goto error;
-	}
-
-	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
-	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
-	bitmap_bh = read_inode_bitmap(sb, block_group);
-	if (!bitmap_bh) {
-		ext3_warning(sb, __func__,
-			     "inode bitmap error for orphan %lu", ino);
-		goto error;
-	}
-
-	/* Having the inode bit set should be a 100% indicator that this
-	 * is a valid orphan (no e2fsck run on fs).  Orphans also include
-	 * inodes that were being truncated, so we can't check i_nlink==0.
-	 */
-	if (!ext3_test_bit(bit, bitmap_bh->b_data))
-		goto bad_orphan;
-
-	inode = ext3_iget(sb, ino);
-	if (IS_ERR(inode))
-		goto iget_failed;
-
-	/*
-	 * If the orphans has i_nlinks > 0 then it should be able to be
-	 * truncated, otherwise it won't be removed from the orphan list
-	 * during processing and an infinite loop will result.
-	 */
-	if (inode->i_nlink && !ext3_can_truncate(inode))
-		goto bad_orphan;
-
-	if (NEXT_ORPHAN(inode) > max_ino)
-		goto bad_orphan;
-	brelse(bitmap_bh);
-	return inode;
-
-iget_failed:
-	err = PTR_ERR(inode);
-	inode = NULL;
-bad_orphan:
-	ext3_warning(sb, __func__,
-		     "bad orphan inode %lu!  e2fsck was run?", ino);
-	printk(KERN_NOTICE "ext3_test_bit(bit=%d, block=%llu) = %d\n",
-	       bit, (unsigned long long)bitmap_bh->b_blocknr,
-	       ext3_test_bit(bit, bitmap_bh->b_data));
-	printk(KERN_NOTICE "inode=%p\n", inode);
-	if (inode) {
-		printk(KERN_NOTICE "is_bad_inode(inode)=%d\n",
-		       is_bad_inode(inode));
-		printk(KERN_NOTICE "NEXT_ORPHAN(inode)=%u\n",
-		       NEXT_ORPHAN(inode));
-		printk(KERN_NOTICE "max_ino=%lu\n", max_ino);
-		printk(KERN_NOTICE "i_nlink=%u\n", inode->i_nlink);
-		/* Avoid freeing blocks if we got a bad deleted inode */
-		if (inode->i_nlink == 0)
-			inode->i_blocks = 0;
-		iput(inode);
-	}
-	brelse(bitmap_bh);
-error:
-	return ERR_PTR(err);
-}
-
-unsigned long ext3_count_free_inodes (struct super_block * sb)
-{
-	unsigned long desc_count;
-	struct ext3_group_desc *gdp;
-	int i;
-#ifdef EXT3FS_DEBUG
-	struct ext3_super_block *es;
-	unsigned long bitmap_count, x;
-	struct buffer_head *bitmap_bh = NULL;
-
-	es = EXT3_SB(sb)->s_es;
-	desc_count = 0;
-	bitmap_count = 0;
-	gdp = NULL;
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
-		brelse(bitmap_bh);
-		bitmap_bh = read_inode_bitmap(sb, i);
-		if (!bitmap_bh)
-			continue;
-
-		x = ext3_count_free(bitmap_bh, EXT3_INODES_PER_GROUP(sb) / 8);
-		printk("group %d: stored = %d, counted = %lu\n",
-			i, le16_to_cpu(gdp->bg_free_inodes_count), x);
-		bitmap_count += x;
-	}
-	brelse(bitmap_bh);
-	printk("ext3_count_free_inodes: stored = %u, computed = %lu, %lu\n",
-		le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count);
-	return desc_count;
-#else
-	desc_count = 0;
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
-		cond_resched();
-	}
-	return desc_count;
-#endif
-}
-
-/* Called at mount-time, super-block is locked */
-unsigned long ext3_count_dirs (struct super_block * sb)
-{
-	unsigned long count = 0;
-	int i;
-
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		struct ext3_group_desc *gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		count += le16_to_cpu(gdp->bg_used_dirs_count);
-	}
-	return count;
-}
-
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@ -1,327 +0,0 @@
-/*
- * linux/fs/ext3/ioctl.c
- *
- * Copyright (C) 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- */
-
-#include <linux/mount.h>
-#include <linux/compat.h>
-#include <asm/uaccess.h>
-#include "ext3.h"
-
-long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
-{
-	struct inode *inode = file_inode(filp);
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	unsigned int flags;
-	unsigned short rsv_window_size;
-
-	ext3_debug ("cmd = %u, arg = %lu\n", cmd, arg);
-
-	switch (cmd) {
-	case EXT3_IOC_GETFLAGS:
-		ext3_get_inode_flags(ei);
-		flags = ei->i_flags & EXT3_FL_USER_VISIBLE;
-		return put_user(flags, (int __user *) arg);
-	case EXT3_IOC_SETFLAGS: {
-		handle_t *handle = NULL;
-		int err;
-		struct ext3_iloc iloc;
-		unsigned int oldflags;
-		unsigned int jflag;
-
-		if (!inode_owner_or_capable(inode))
-			return -EACCES;
-
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		flags = ext3_mask_flags(inode->i_mode, flags);
-
-		mutex_lock(&inode->i_mutex);
-
-		/* Is it quota file? Do not allow user to mess with it */
-		err = -EPERM;
-		if (IS_NOQUOTA(inode))
-			goto flags_out;
-
-		oldflags = ei->i_flags;
-
-		/* The JOURNAL_DATA flag is modifiable only by root */
-		jflag = flags & EXT3_JOURNAL_DATA_FL;
-
-		/*
-		 * The IMMUTABLE and APPEND_ONLY flags can only be changed by
-		 * the relevant capability.
-		 *
-		 * This test looks nicer. Thanks to Pauline Middelink
-		 */
-		if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
-			if (!capable(CAP_LINUX_IMMUTABLE))
-				goto flags_out;
-		}
-
-		/*
-		 * The JOURNAL_DATA flag can only be changed by
-		 * the relevant capability.
-		 */
-		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
-			if (!capable(CAP_SYS_RESOURCE))
-				goto flags_out;
-		}
-
-		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
-			goto flags_out;
-		}
-		if (IS_SYNC(inode))
-			handle->h_sync = 1;
-		err = ext3_reserve_inode_write(handle, inode, &iloc);
-		if (err)
-			goto flags_err;
-
-		flags = flags & EXT3_FL_USER_MODIFIABLE;
-		flags |= oldflags & ~EXT3_FL_USER_MODIFIABLE;
-		ei->i_flags = flags;
-
-		ext3_set_inode_flags(inode);
-		inode->i_ctime = CURRENT_TIME_SEC;
-
-		err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-flags_err:
-		ext3_journal_stop(handle);
-		if (err)
-			goto flags_out;
-
-		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL))
-			err = ext3_change_inode_journal_flag(inode, jflag);
-flags_out:
-		mutex_unlock(&inode->i_mutex);
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GETVERSION:
-	case EXT3_IOC_GETVERSION_OLD:
-		return put_user(inode->i_generation, (int __user *) arg);
-	case EXT3_IOC_SETVERSION:
-	case EXT3_IOC_SETVERSION_OLD: {
-		handle_t *handle;
-		struct ext3_iloc iloc;
-		__u32 generation;
-		int err;
-
-		if (!inode_owner_or_capable(inode))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-		if (get_user(generation, (int __user *) arg)) {
-			err = -EFAULT;
-			goto setversion_out;
-		}
-
-		mutex_lock(&inode->i_mutex);
-		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
-			goto unlock_out;
-		}
-		err = ext3_reserve_inode_write(handle, inode, &iloc);
-		if (err == 0) {
-			inode->i_ctime = CURRENT_TIME_SEC;
-			inode->i_generation = generation;
-			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-		}
-		ext3_journal_stop(handle);
-
-unlock_out:
-		mutex_unlock(&inode->i_mutex);
-setversion_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GETRSVSZ:
-		if (test_opt(inode->i_sb, RESERVATION)
-			&& S_ISREG(inode->i_mode)
-			&& ei->i_block_alloc_info) {
-			rsv_window_size = ei->i_block_alloc_info->rsv_window_node.rsv_goal_size;
-			return put_user(rsv_window_size, (int __user *)arg);
-		}
-		return -ENOTTY;
-	case EXT3_IOC_SETRSVSZ: {
-		int err;
-
-		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
-			return -ENOTTY;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (!inode_owner_or_capable(inode)) {
-			err = -EACCES;
-			goto setrsvsz_out;
-		}
-
-		if (get_user(rsv_window_size, (int __user *)arg)) {
-			err = -EFAULT;
-			goto setrsvsz_out;
-		}
-
-		if (rsv_window_size > EXT3_MAX_RESERVE_BLOCKS)
-			rsv_window_size = EXT3_MAX_RESERVE_BLOCKS;
-
-		/*
-		 * need to allocate reservation structure for this inode
-		 * before set the window size
-		 */
-		mutex_lock(&ei->truncate_mutex);
-		if (!ei->i_block_alloc_info)
-			ext3_init_block_alloc_info(inode);
-
-		if (ei->i_block_alloc_info){
-			struct ext3_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
-			rsv->rsv_goal_size = rsv_window_size;
-		}
-		mutex_unlock(&ei->truncate_mutex);
-setrsvsz_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GROUP_EXTEND: {
-		ext3_fsblk_t n_blocks_count;
-		struct super_block *sb = inode->i_sb;
-		int err, err2;
-
-		if (!capable(CAP_SYS_RESOURCE))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (get_user(n_blocks_count, (__u32 __user *)arg)) {
-			err = -EFAULT;
-			goto group_extend_out;
-		}
-		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
-		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		err2 = journal_flush(EXT3_SB(sb)->s_journal);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-		if (err == 0)
-			err = err2;
-group_extend_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GROUP_ADD: {
-		struct ext3_new_group_data input;
-		struct super_block *sb = inode->i_sb;
-		int err, err2;
-
-		if (!capable(CAP_SYS_RESOURCE))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
-				sizeof(input))) {
-			err = -EFAULT;
-			goto group_add_out;
-		}
-
-		err = ext3_group_add(sb, &input);
-		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		err2 = journal_flush(EXT3_SB(sb)->s_journal);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-		if (err == 0)
-			err = err2;
-group_add_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case FITRIM: {
-
-		struct super_block *sb = inode->i_sb;
-		struct fstrim_range range;
-		int ret = 0;
-
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
-		if (copy_from_user(&range, (struct fstrim_range __user *)arg,
-				   sizeof(range)))
-			return -EFAULT;
-
-		ret = ext3_trim_fs(sb, &range);
-		if (ret < 0)
-			return ret;
-
-		if (copy_to_user((struct fstrim_range __user *)arg, &range,
-				 sizeof(range)))
-			return -EFAULT;
-
-		return 0;
-	}
-
-	default:
-		return -ENOTTY;
-	}
-}
-
-#ifdef CONFIG_COMPAT
-long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	/* These are just misnamed, they actually get/put from/to user an int */
-	switch (cmd) {
-	case EXT3_IOC32_GETFLAGS:
-		cmd = EXT3_IOC_GETFLAGS;
-		break;
-	case EXT3_IOC32_SETFLAGS:
-		cmd = EXT3_IOC_SETFLAGS;
-		break;
-	case EXT3_IOC32_GETVERSION:
-		cmd = EXT3_IOC_GETVERSION;
-		break;
-	case EXT3_IOC32_SETVERSION:
-		cmd = EXT3_IOC_SETVERSION;
-		break;
-	case EXT3_IOC32_GROUP_EXTEND:
-		cmd = EXT3_IOC_GROUP_EXTEND;
-		break;
-	case EXT3_IOC32_GETVERSION_OLD:
-		cmd = EXT3_IOC_GETVERSION_OLD;
-		break;
-	case EXT3_IOC32_SETVERSION_OLD:
-		cmd = EXT3_IOC_SETVERSION_OLD;
-		break;
-#ifdef CONFIG_JBD_DEBUG
-	case EXT3_IOC32_WAIT_FOR_READONLY:
-		cmd = EXT3_IOC_WAIT_FOR_READONLY;
-		break;
-#endif
-	case EXT3_IOC32_GETRSVSZ:
-		cmd = EXT3_IOC_GETRSVSZ;
-		break;
-	case EXT3_IOC32_SETRSVSZ:
-		cmd = EXT3_IOC_SETRSVSZ;
-		break;
-	case EXT3_IOC_GROUP_ADD:
-		break;
-	default:
-		return -ENOIOCTLCMD;
-	}
-	return ext3_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
-}
-#endif
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
--- a/fs/ext3/namei.h
+++ b/fs/ext3/namei.h
@ -1,27 +0,0 @@
-/*  linux/fs/ext3/namei.h
- *
- * Copyright (C) 2005 Simtec Electronics
- *	Ben Dooks <ben@simtec.co.uk>
- *
-*/
-
-extern struct dentry *ext3_get_parent(struct dentry *child);
-
-static inline struct buffer_head *ext3_dir_bread(handle_t *handle,
-						 struct inode *inode,
-						 int block, int create,
-						 int *err)
-{
-	struct buffer_head *bh;
-
-	bh = ext3_bread(handle, inode, block, create, err);
-
-	if (!bh && !(*err)) {
-		*err = -EIO;
-		ext3_error(inode->i_sb, __func__,
-			   "Directory hole detected on inode %lu\n",
-			   inode->i_ino);
-		return NULL;
-	}
-	return bh;
-}
--- a/fs/ext3/resize.c
+++ b/fs/ext3/resize.c
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@ -1,46 +0,0 @@
-/*
- *  linux/fs/ext3/symlink.c
- *
- * Only fast symlinks left here - the rest is done by generic code. AV, 1999
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/symlink.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 symlink handling code
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-const struct inode_operations ext3_symlink_inode_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= page_follow_link_light,
-	.put_link	= page_put_link,
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-};
-
-const struct inode_operations ext3_fast_symlink_inode_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= simple_follow_link,
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-};
--- a/fs/ext3/xattr.c
+++ b/fs/ext3/xattr.c
--- a/fs/ext3/xattr.h
+++ b/fs/ext3/xattr.h
@ -1,136 +0,0 @@
-/*
-  File: fs/ext3/xattr.h
-
-  On-disk format of extended attributes for the ext3 filesystem.
-
-  (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
-*/
-
-#include <linux/xattr.h>
-
-/* Magic value in attribute blocks */
-#define EXT3_XATTR_MAGIC		0xEA020000
-
-/* Maximum number of references to one attribute block */
-#define EXT3_XATTR_REFCOUNT_MAX		1024
-
-/* Name indexes */
-#define EXT3_XATTR_INDEX_USER			1
-#define EXT3_XATTR_INDEX_POSIX_ACL_ACCESS	2
-#define EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT	3
-#define EXT3_XATTR_INDEX_TRUSTED		4
-#define	EXT3_XATTR_INDEX_LUSTRE			5
-#define EXT3_XATTR_INDEX_SECURITY	        6
-
-struct ext3_xattr_header {
-	__le32	h_magic;	/* magic number for identification */
-	__le32	h_refcount;	/* reference count */
-	__le32	h_blocks;	/* number of disk blocks used */
-	__le32	h_hash;		/* hash value of all attributes */
-	__u32	h_reserved[4];	/* zero right now */
-};
-
-struct ext3_xattr_ibody_header {
-	__le32	h_magic;	/* magic number for identification */
-};
-
-struct ext3_xattr_entry {
-	__u8	e_name_len;	/* length of name */
-	__u8	e_name_index;	/* attribute name index */
-	__le16	e_value_offs;	/* offset in disk block of value */
-	__le32	e_value_block;	/* disk block attribute is stored on (n/i) */
-	__le32	e_value_size;	/* size of attribute value */
-	__le32	e_hash;		/* hash value of name and value */
-	char	e_name[0];	/* attribute name */
-};
-
-#define EXT3_XATTR_PAD_BITS		2
-#define EXT3_XATTR_PAD		(1<<EXT3_XATTR_PAD_BITS)
-#define EXT3_XATTR_ROUND		(EXT3_XATTR_PAD-1)
-#define EXT3_XATTR_LEN(name_len) \
-	(((name_len) + EXT3_XATTR_ROUND + \
-	sizeof(struct ext3_xattr_entry)) & ~EXT3_XATTR_ROUND)
-#define EXT3_XATTR_NEXT(entry) \
-	( (struct ext3_xattr_entry *)( \
-	  (char *)(entry) + EXT3_XATTR_LEN((entry)->e_name_len)) )
-#define EXT3_XATTR_SIZE(size) \
-	(((size) + EXT3_XATTR_ROUND) & ~EXT3_XATTR_ROUND)
-
-# ifdef CONFIG_EXT3_FS_XATTR
-
-extern const struct xattr_handler ext3_xattr_user_handler;
-extern const struct xattr_handler ext3_xattr_trusted_handler;
-extern const struct xattr_handler ext3_xattr_security_handler;
-
-extern ssize_t ext3_listxattr(struct dentry *, char *, size_t);
-
-extern int ext3_xattr_get(struct inode *, int, const char *, void *, size_t);
-extern int ext3_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
-extern int ext3_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
-
-extern void ext3_xattr_delete_inode(handle_t *, struct inode *);
-extern void ext3_xattr_put_super(struct super_block *);
-
-extern int init_ext3_xattr(void);
-extern void exit_ext3_xattr(void);
-
-extern const struct xattr_handler *ext3_xattr_handlers[];
-
-# else  /* CONFIG_EXT3_FS_XATTR */
-
-static inline int
-ext3_xattr_get(struct inode *inode, int name_index, const char *name,
-	       void *buffer, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int
-ext3_xattr_set(struct inode *inode, int name_index, const char *name,
-	       const void *value, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int
-ext3_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
-	       const char *name, const void *value, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline void
-ext3_xattr_delete_inode(handle_t *handle, struct inode *inode)
-{
-}
-
-static inline void
-ext3_xattr_put_super(struct super_block *sb)
-{
-}
-
-static inline int
-init_ext3_xattr(void)
-{
-	return 0;
-}
-
-static inline void
-exit_ext3_xattr(void)
-{
-}
-
-#define ext3_xattr_handlers	NULL
-
-# endif  /* CONFIG_EXT3_FS_XATTR */
-
-#ifdef CONFIG_EXT3_FS_SECURITY
-extern int ext3_init_security(handle_t *handle, struct inode *inode,
-			      struct inode *dir, const struct qstr *qstr);
-#else
-static inline int ext3_init_security(handle_t *handle, struct inode *inode,
-				     struct inode *dir, const struct qstr *qstr)
-{
-	return 0;
-}
-#endif
--- a/fs/ext3/xattr_security.c
+++ b/fs/ext3/xattr_security.c
@ -1,78 +0,0 @@
-/*
- * linux/fs/ext3/xattr_security.c
- * Handler for storing security labels as extended attributes.
- */
-
-#include <linux/security.h>
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
-			 const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_SECURITY_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_SECURITY_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_security_get(struct dentry *dentry, const char *name,
-		void *buffer, size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_security_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
-			      name, value, size, flags);
-}
-
-static int ext3_initxattrs(struct inode *inode,
-			   const struct xattr *xattr_array,
-			   void *fs_info)
-{
-	const struct xattr *xattr;
-	handle_t *handle = fs_info;
-	int err = 0;
-
-	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		err = ext3_xattr_set_handle(handle, inode,
-					    EXT3_XATTR_INDEX_SECURITY,
-					    xattr->name, xattr->value,
-					    xattr->value_len, 0);
-		if (err < 0)
-			break;
-	}
-	return err;
-}
-
-int
-ext3_init_security(handle_t *handle, struct inode *inode, struct inode *dir,
-		   const struct qstr *qstr)
-{
-	return security_inode_init_security(inode, dir, qstr,
-					    &ext3_initxattrs, handle);
-}
-
-const struct xattr_handler ext3_xattr_security_handler = {
-	.prefix	= XATTR_SECURITY_PREFIX,
-	.list	= ext3_xattr_security_list,
-	.get	= ext3_xattr_security_get,
-	.set	= ext3_xattr_security_set,
-};
--- a/fs/ext3/xattr_trusted.c
+++ b/fs/ext3/xattr_trusted.c
@ -1,54 +0,0 @@
-/*
- * linux/fs/ext3/xattr_trusted.c
- * Handler for trusted extended attributes.
- *
- * Copyright (C) 2003 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_trusted_list(struct dentry *dentry, char *list, size_t list_size,
-		const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_TRUSTED_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return 0;
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_TRUSTED_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_trusted_get(struct dentry *dentry, const char *name,
-		       void *buffer, size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_trusted_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED, name,
-			      value, size, flags);
-}
-
-const struct xattr_handler ext3_xattr_trusted_handler = {
-	.prefix	= XATTR_TRUSTED_PREFIX,
-	.list	= ext3_xattr_trusted_list,
-	.get	= ext3_xattr_trusted_get,
-	.set	= ext3_xattr_trusted_set,
-};
--- a/fs/ext3/xattr_user.c
+++ b/fs/ext3/xattr_user.c
@ -1,58 +0,0 @@
-/*
- * linux/fs/ext3/xattr_user.c
- * Handler for extended user attributes.
- *
- * Copyright (C) 2001 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_user_list(struct dentry *dentry, char *list, size_t list_size,
-		const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_USER_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return 0;
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_USER_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_user_get(struct dentry *dentry, const char *name, void *buffer,
-		size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return -EOPNOTSUPP;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_USER,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_user_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return -EOPNOTSUPP;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_USER,
-			      name, value, size, flags);
-}
-
-const struct xattr_handler ext3_xattr_user_handler = {
-	.prefix	= XATTR_USER_PREFIX,
-	.list	= ext3_xattr_user_list,
-	.get	= ext3_xattr_user_get,
-	.set	= ext3_xattr_user_set,
-};
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@ -1,5 +1,38 @@
+# Ext3 configs are here for backward compatibility with old configs which may
+# have EXT3_FS set but not EXT4_FS set and thus would result in non-bootable
+# kernels after the removal of ext3 driver.
+config EXT3_FS
+	tristate "The Extended 3 (ext3) filesystem"
+	# These must match EXT4_FS selects...
+	select EXT4_FS
+	select JBD2
+	select CRC16
+	select CRYPTO
+	select CRYPTO_CRC32C
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
+config EXT3_FS_POSIX_ACL
+	bool "Ext3 POSIX Access Control Lists"
+	depends on EXT3_FS
+	select EXT4_FS_POSIX_ACL
+	select FS_POSIX_ACL
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
+config EXT3_FS_SECURITY
+	bool "Ext3 Security Labels"
+	depends on EXT3_FS
+	select EXT4_FS_SECURITY
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
 config EXT4_FS
 	tristate "The Extended 4 (ext4) filesystem"
+	# Please update EXT3_FS selects when changing these
 	select JBD2
 	select CRC16
 	select CRYPTO
@ -16,26 +49,27 @@ config EXT4_FS
 	  up fsck time.  For more information, please see the web pages at
 	  http://ext4.wiki.kernel.org.

-	  The ext4 filesystem will support mounting an ext3
-	  filesystem; while there will be some performance gains from
-	  the delayed allocation and inode table readahead, the best
-	  performance gains will require enabling ext4 features in the
-	  filesystem, or formatting a new filesystem as an ext4
-	  filesystem initially.
+	  The ext4 filesystem supports mounting an ext3 filesystem; while there
+	  are some performance gains from the delayed allocation and inode
+	  table readahead, the best performance gains require enabling ext4
+	  features in the filesystem using tune2fs, or formatting a new
+	  filesystem as an ext4 filesystem initially. Without explicit enabling
+	  of ext4 features, the on disk filesystem format stays fully backward
+	  compatible.

 	  To compile this file system support as a module, choose M here. The
 	  module will be called ext4.

 	  If unsure, say N.

-config EXT4_USE_FOR_EXT23
+config EXT4_USE_FOR_EXT2
 	bool "Use ext4 for ext2/ext3 file systems"
 	depends on EXT4_FS
-	depends on EXT3_FS=n || EXT2_FS=n
+	depends on EXT2_FS=n
 	default y
 	help
-	  Allow the ext4 file system driver code to be used for ext2 or
-	  ext3 file system mounts.  This allows users to reduce their
+	  Allow the ext4 file system driver code to be used for ext2
+	  file system mounts.  This allows users to reduce their
 	  compiled kernel size by using one file system driver for
 	  ext2, ext3, and ext4 file systems.

--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@ -721,7 +721,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 	struct ext4_group_desc *gdp = NULL;
 	struct ext4_inode_info *ei;
 	struct ext4_sb_info *sbi;
-	int ret2, err = 0;
+	int ret2, err;
 	struct inode *ret;
 	ext4_group_t i;
 	ext4_group_t flex_group;
@ -769,7 +769,9 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		inode->i_gid = dir->i_gid;
 	} else
 		inode_init_owner(inode, dir, mode);
-	dquot_initialize(inode);
+	err = dquot_initialize(inode);
+	if (err)
+		goto out;

 	if (!goal)
 		goal = sbi->s_inode_goal;
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@ -4661,8 +4661,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 	if (error)
 		return error;

-	if (is_quota_modification(inode, attr))
-		dquot_initialize(inode);
+	if (is_quota_modification(inode, attr)) {
+		error = dquot_initialize(inode);
+		if (error)
+			return error;
+	}
 	if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
 	    (ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
 		handle_t *handle;
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@ -2436,7 +2436,9 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 	struct inode *inode;
 	int err, credits, retries = 0;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2470,7 +2472,9 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
 	if (!new_valid_dev(rdev))
 		return -EINVAL;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2499,7 +2503,9 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
 	struct inode *inode;
 	int err, retries = 0;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 retry:
 	inode = ext4_new_inode_start_handle(dir, mode,
@ -2612,7 +2618,9 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	if (EXT4_DIR_LINK_MAX(dir))
 		return -EMLINK;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 	credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
 		   EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
@ -2910,8 +2918,12 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)

 	/* Initialize quotas before so that eventual writes go in
 	 * separate transaction */
-	dquot_initialize(dir);
-	dquot_initialize(d_inode(dentry));
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(d_inode(dentry));
+	if (retval)
+		return retval;

 	retval = -ENOENT;
 	bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
@ -2980,8 +2992,12 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
 	trace_ext4_unlink_enter(dir, dentry);
 	/* Initialize quotas before so that eventual writes go
 	 * in separate transaction */
-	dquot_initialize(dir);
-	dquot_initialize(d_inode(dentry));
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(d_inode(dentry));
+	if (retval)
+		return retval;

 	retval = -ENOENT;
 	bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
@ -3066,7 +3082,9 @@ static int ext4_symlink(struct inode *dir,
 		goto err_free_sd;
 	}

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		goto err_free_sd;

 	if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
 		/*
@ -3197,7 +3215,9 @@ static int ext4_link(struct dentry *old_dentry,
 	if (ext4_encrypted_inode(dir) &&
 	    !ext4_is_child_context_consistent_with_parent(dir, inode))
 		return -EPERM;
-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err)
+		return err;

 retry:
 	handle = ext4_journal_start(dir, EXT4_HT_DIR,
@ -3476,13 +3496,20 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	int credits;
 	u8 old_file_type;

-	dquot_initialize(old.dir);
-	dquot_initialize(new.dir);
+	retval = dquot_initialize(old.dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(new.dir);
+	if (retval)
+		return retval;

 	/* Initialize quotas before so that eventual writes go
 	 * in separate transaction */
-	if (new.inode)
-		dquot_initialize(new.inode);
+	if (new.inode) {
+		retval = dquot_initialize(new.inode);
+		if (retval)
+			return retval;
+	}

 	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
 	if (IS_ERR(old.bh))
@ -3678,8 +3705,12 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 							   new.inode)))
 		return -EPERM;

-	dquot_initialize(old.dir);
-	dquot_initialize(new.dir);
+	retval = dquot_initialize(old.dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(new.dir);
+	if (retval)
+		return retval;

 	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
 				 &old.de, &old.inlined);
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@ -84,7 +84,7 @@ static void ext4_unregister_li_request(struct super_block *sb);
 static void ext4_clear_request_list(void);
 static int ext4_reserve_clusters(struct ext4_sb_info *, ext4_fsblk_t);

-#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
+#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
 static struct file_system_type ext2_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "ext2",
@ -100,7 +100,6 @@ MODULE_ALIAS("ext2");
 #endif


-#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
 static struct file_system_type ext3_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "ext3",
@ -111,9 +110,6 @@ static struct file_system_type ext3_fs_type = {
 MODULE_ALIAS_FS("ext3");
 MODULE_ALIAS("ext3");
 #define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
-#else
-#define IS_EXT3_SB(sb) (0)
-#endif

 static int ext4_verify_csum_type(struct super_block *sb,
 				 struct ext4_super_block *es)
@ -5500,7 +5496,7 @@ static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
 	return mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);
 }

-#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
+#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
 static inline void register_as_ext2(void)
 {
 	int err = register_filesystem(&ext2_fs_type);
@ -5530,7 +5526,6 @@ static inline void unregister_as_ext2(void) { }
 static inline int ext2_feature_set_ok(struct super_block *sb) { return 0; }
 #endif

-#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
 static inline void register_as_ext3(void)
 {
 	int err = register_filesystem(&ext3_fs_type);
@ -5556,11 +5551,6 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
 		return 0;
 	return 1;
 }
-#else
-static inline void register_as_ext3(void) { }
-static inline void unregister_as_ext3(void) { }
-static inline int ext3_feature_set_ok(struct super_block *sb) { return 0; }
-#endif

 static struct file_system_type ext4_fs_type = {
 	.owner		= THIS_MODULE,
--- a/fs/jbd/Kconfig
+++ b/fs/jbd/Kconfig
@ -1,30 +0,0 @@
-config JBD
-	tristate
-	help
-	  This is a generic journalling layer for block devices.  It is
-	  currently used by the ext3 file system, but it could also be
-	  used to add journal support to other file systems or block
-	  devices such as RAID or LVM.
-
-	  If you are using the ext3 file system, you need to say Y here.
-	  If you are not using ext3 then you will probably want to say N.
-
-	  To compile this device as a module, choose M here: the module will be
-	  called jbd.  If you are compiling ext3 into the kernel, you
-	  cannot compile this code as a module.
-
-config JBD_DEBUG
-	bool "JBD (ext3) debugging support"
-	depends on JBD && DEBUG_FS
-	help
-	  If you are using the ext3 journaled file system (or potentially any
-	  other file system/device using JBD), this option allows you to
-	  enable debugging output while the system is running, in order to
-	  help track down any problems you are having.  By default the
-	  debugging output will be turned off.
-
-	  If you select Y here, then you will be able to turn on debugging
-	  with "echo N > /sys/kernel/debug/jbd/jbd-debug", where N is a
-	  number between 1 and 5, the higher the number, the more debugging
-	  output is generated.  To turn debugging off again, do
-	  "echo 0 > /sys/kernel/debug/jbd/jbd-debug".
--- a/fs/jbd/Makefile
+++ b/fs/jbd/Makefile
@ -1,7 +0,0 @@
-#
-# Makefile for the linux journaling routines.
-#
-
-obj-$(CONFIG_JBD) += jbd.o
-
-jbd-objs := transaction.o commit.o recovery.o checkpoint.o revoke.o journal.o
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@ -1,782 +0,0 @@
-/*
- * linux/fs/jbd/checkpoint.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1999
- *
- * Copyright 1999 Red Hat Software --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Checkpoint routines for the generic filesystem journaling code.
- * Part of the ext2fs journaling system.
- *
- * Checkpointing is the process of ensuring that a section of the log is
- * committed fully to disk, so that that portion of the log can be
- * reused.
- */
-
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/blkdev.h>
-#include <trace/events/jbd.h>
-
-/*
- * Unlink a buffer from a transaction checkpoint list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink_first(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	jh->b_cpnext->b_cpprev = jh->b_cpprev;
-	jh->b_cpprev->b_cpnext = jh->b_cpnext;
-	if (transaction->t_checkpoint_list == jh) {
-		transaction->t_checkpoint_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_list == jh)
-			transaction->t_checkpoint_list = NULL;
-	}
-}
-
-/*
- * Unlink a buffer from a transaction checkpoint(io) list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-	if (transaction->t_checkpoint_io_list == jh) {
-		transaction->t_checkpoint_io_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_io_list == jh)
-			transaction->t_checkpoint_io_list = NULL;
-	}
-}
-
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-
-	if (!transaction->t_checkpoint_io_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_io_list;
-		jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_io_list = jh;
-}
-
-/*
- * Try to release a checkpointed buffer from its transaction.
- * Returns 1 if we released it and 2 if we also released the
- * whole transaction.
- *
- * Requires j_list_lock
- * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
- */
-static int __try_to_free_cp_buf(struct journal_head *jh)
-{
-	int ret = 0;
-	struct buffer_head *bh = jh2bh(jh);
-
-	if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
-	    !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
-		/*
-		 * Get our reference so that bh cannot be freed before
-		 * we unlock it
-		 */
-		get_bh(bh);
-		JBUFFER_TRACE(jh, "remove from checkpoint list");
-		ret = __journal_remove_checkpoint(jh) + 1;
-		jbd_unlock_bh_state(bh);
-		BUFFER_TRACE(bh, "release");
-		__brelse(bh);
-	} else {
-		jbd_unlock_bh_state(bh);
-	}
-	return ret;
-}
-
-/*
- * __log_wait_for_space: wait until there is space in the journal.
- *
- * Called under j-state_lock *only*.  It will be unlocked if we have to wait
- * for a checkpoint to free up some space in the log.
- */
-void __log_wait_for_space(journal_t *journal)
-{
-	int nblocks, space_left;
-	assert_spin_locked(&journal->j_state_lock);
-
-	nblocks = jbd_space_needed(journal);
-	while (__log_space_left(journal) < nblocks) {
-		if (journal->j_flags & JFS_ABORT)
-			return;
-		spin_unlock(&journal->j_state_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
-
-		/*
-		 * Test again, another process may have checkpointed while we
-		 * were waiting for the checkpoint lock. If there are no
-		 * transactions ready to be checkpointed, try to recover
-		 * journal space by calling cleanup_journal_tail(), and if
-		 * that doesn't work, by waiting for the currently committing
-		 * transaction to complete.  If there is absolutely no way
-		 * to make progress, this is either a BUG or corrupted
-		 * filesystem, so abort the journal and leave a stack
-		 * trace for forensic evidence.
-		 */
-		spin_lock(&journal->j_state_lock);
-		spin_lock(&journal->j_list_lock);
-		nblocks = jbd_space_needed(journal);
-		space_left = __log_space_left(journal);
-		if (space_left < nblocks) {
-			int chkpt = journal->j_checkpoint_transactions != NULL;
-			tid_t tid = 0;
-
-			if (journal->j_committing_transaction)
-				tid = journal->j_committing_transaction->t_tid;
-			spin_unlock(&journal->j_list_lock);
-			spin_unlock(&journal->j_state_lock);
-			if (chkpt) {
-				log_do_checkpoint(journal);
-			} else if (cleanup_journal_tail(journal) == 0) {
-				/* We were able to recover space; yay! */
-				;
-			} else if (tid) {
-				log_wait_commit(journal, tid);
-			} else {
-				printk(KERN_ERR "%s: needed %d blocks and "
-				       "only had %d space available\n",
-				       __func__, nblocks, space_left);
-				printk(KERN_ERR "%s: no way to get more "
-				       "journal space\n", __func__);
-				WARN_ON(1);
-				journal_abort(journal, 0);
-			}
-			spin_lock(&journal->j_state_lock);
-		} else {
-			spin_unlock(&journal->j_list_lock);
-		}
-		mutex_unlock(&journal->j_checkpoint_mutex);
-	}
-}
-
-/*
- * We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
- * The caller must restart a list walk.  Wait for someone else to run
- * jbd_unlock_bh_state().
- */
-static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
-	__releases(journal->j_list_lock)
-{
-	get_bh(bh);
-	spin_unlock(&journal->j_list_lock);
-	jbd_lock_bh_state(bh);
-	jbd_unlock_bh_state(bh);
-	put_bh(bh);
-}
-
-/*
- * Clean up transaction's list of buffers submitted for io.
- * We wait for any pending IO to complete and remove any clean
- * buffers. Note that we take the buffers in the opposite ordering
- * from the one in which they were submitted for IO.
- *
- * Return 0 on success, and return <0 if some buffers have failed
- * to be written out.
- *
- * Called with j_list_lock held.
- */
-static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
-{
-	struct journal_head *jh;
-	struct buffer_head *bh;
-	tid_t this_tid;
-	int released = 0;
-	int ret = 0;
-
-	this_tid = transaction->t_tid;
-restart:
-	/* Did somebody clean up the transaction in the meanwhile? */
-	if (journal->j_checkpoint_transactions != transaction ||
-			transaction->t_tid != this_tid)
-		return ret;
-	while (!released && transaction->t_checkpoint_io_list) {
-		jh = transaction->t_checkpoint_io_list;
-		bh = jh2bh(jh);
-		if (!jbd_trylock_bh_state(bh)) {
-			jbd_sync_bh(journal, bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		get_bh(bh);
-		if (buffer_locked(bh)) {
-			spin_unlock(&journal->j_list_lock);
-			jbd_unlock_bh_state(bh);
-			wait_on_buffer(bh);
-			/* the journal_head may have gone by now */
-			BUFFER_TRACE(bh, "brelse");
-			__brelse(bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		if (unlikely(buffer_write_io_error(bh)))
-			ret = -EIO;
-
-		/*
-		 * Now in whatever state the buffer currently is, we know that
-		 * it has been written out and so we can drop it from the list
-		 */
-		released = __journal_remove_checkpoint(jh);
-		jbd_unlock_bh_state(bh);
-		__brelse(bh);
-	}
-
-	return ret;
-}
-
-#define NR_BATCH	64
-
-static void
-__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
-{
-	int i;
-	struct blk_plug plug;
-
-	blk_start_plug(&plug);
-	for (i = 0; i < *batch_count; i++)
-		write_dirty_buffer(bhs[i], WRITE_SYNC);
-	blk_finish_plug(&plug);
-
-	for (i = 0; i < *batch_count; i++) {
-		struct buffer_head *bh = bhs[i];
-		clear_buffer_jwrite(bh);
-		BUFFER_TRACE(bh, "brelse");
-		__brelse(bh);
-	}
-	*batch_count = 0;
-}
-
-/*
- * Try to flush one buffer from the checkpoint list to disk.
- *
- * Return 1 if something happened which requires us to abort the current
- * scan of the checkpoint list.  Return <0 if the buffer has failed to
- * be written out.
- *
- * Called with j_list_lock held and drops it if 1 is returned
- * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
- */
-static int __process_buffer(journal_t *journal, struct journal_head *jh,
-			struct buffer_head **bhs, int *batch_count)
-{
-	struct buffer_head *bh = jh2bh(jh);
-	int ret = 0;
-
-	if (buffer_locked(bh)) {
-		get_bh(bh);
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		wait_on_buffer(bh);
-		/* the journal_head may have gone by now */
-		BUFFER_TRACE(bh, "brelse");
-		__brelse(bh);
-		ret = 1;
-	} else if (jh->b_transaction != NULL) {
-		transaction_t *t = jh->b_transaction;
-		tid_t tid = t->t_tid;
-
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		log_start_commit(journal, tid);
-		log_wait_commit(journal, tid);
-		ret = 1;
-	} else if (!buffer_dirty(bh)) {
-		ret = 1;
-		if (unlikely(buffer_write_io_error(bh)))
-			ret = -EIO;
-		get_bh(bh);
-		J_ASSERT_JH(jh, !buffer_jbddirty(bh));
-		BUFFER_TRACE(bh, "remove from checkpoint");
-		__journal_remove_checkpoint(jh);
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		__brelse(bh);
-	} else {
-		/*
-		 * Important: we are about to write the buffer, and
-		 * possibly block, while still holding the journal lock.
-		 * We cannot afford to let the transaction logic start
-		 * messing around with this buffer before we write it to
-		 * disk, as that would break recoverability.
-		 */
-		BUFFER_TRACE(bh, "queue");
-		get_bh(bh);
-		J_ASSERT_BH(bh, !buffer_jwrite(bh));
-		set_buffer_jwrite(bh);
-		bhs[*batch_count] = bh;
-		__buffer_relink_io(jh);
-		jbd_unlock_bh_state(bh);
-		(*batch_count)++;
-		if (*batch_count == NR_BATCH) {
-			spin_unlock(&journal->j_list_lock);
-			__flush_batch(journal, bhs, batch_count);
-			ret = 1;
-		}
-	}
-	return ret;
-}
-
-/*
- * Perform an actual checkpoint. We take the first transaction on the
- * list of transactions to be checkpointed and send all its buffers
- * to disk. We submit larger chunks of data at once.
- *
- * The journal should be locked before calling this function.
- * Called with j_checkpoint_mutex held.
- */
-int log_do_checkpoint(journal_t *journal)
-{
-	transaction_t *transaction;
-	tid_t this_tid;
-	int result;
-
-	jbd_debug(1, "Start checkpoint\n");
-
-	/*
-	 * First thing: if there are any transactions in the log which
-	 * don't need checkpointing, just eliminate them from the
-	 * journal straight away.
-	 */
-	result = cleanup_journal_tail(journal);
-	trace_jbd_checkpoint(journal, result);
-	jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
-	if (result <= 0)
-		return result;
-
-	/*
-	 * OK, we need to start writing disk blocks.  Take one transaction
-	 * and write it.
-	 */
-	result = 0;
-	spin_lock(&journal->j_list_lock);
-	if (!journal->j_checkpoint_transactions)
-		goto out;
-	transaction = journal->j_checkpoint_transactions;
-	this_tid = transaction->t_tid;
-restart:
-	/*
-	 * If someone cleaned up this transaction while we slept, we're
-	 * done (maybe it's a new transaction, but it fell at the same
-	 * address).
-	 */
-	if (journal->j_checkpoint_transactions == transaction &&
-			transaction->t_tid == this_tid) {
-		int batch_count = 0;
-		struct buffer_head *bhs[NR_BATCH];
-		struct journal_head *jh;
-		int retry = 0, err;
-
-		while (!retry && transaction->t_checkpoint_list) {
-			struct buffer_head *bh;
-
-			jh = transaction->t_checkpoint_list;
-			bh = jh2bh(jh);
-			if (!jbd_trylock_bh_state(bh)) {
-				jbd_sync_bh(journal, bh);
-				retry = 1;
-				break;
-			}
-			retry = __process_buffer(journal, jh, bhs,&batch_count);
-			if (retry < 0 && !result)
-				result = retry;
-			if (!retry && (need_resched() ||
-				spin_needbreak(&journal->j_list_lock))) {
-				spin_unlock(&journal->j_list_lock);
-				retry = 1;
-				break;
-			}
-		}
-
-		if (batch_count) {
-			if (!retry) {
-				spin_unlock(&journal->j_list_lock);
-				retry = 1;
-			}
-			__flush_batch(journal, bhs, &batch_count);
-		}
-
-		if (retry) {
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		/*
-		 * Now we have cleaned up the first transaction's checkpoint
-		 * list. Let's clean up the second one
-		 */
-		err = __wait_cp_io(journal, transaction);
-		if (!result)
-			result = err;
-	}
-out:
-	spin_unlock(&journal->j_list_lock);
-	if (result < 0)
-		journal_abort(journal, result);
-	else
-		result = cleanup_journal_tail(journal);
-
-	return (result < 0) ? result : 0;
-}
-
-/*
- * Check the list of checkpoint transactions for the journal to see if
- * we have already got rid of any since the last update of the log tail
- * in the journal superblock.  If so, we can instantly roll the
- * superblock forward to remove those transactions from the log.
- *
- * Return <0 on error, 0 on success, 1 if there was nothing to clean up.
- *
- * This is the only part of the journaling code which really needs to be
- * aware of transaction aborts.  Checkpointing involves writing to the
- * main filesystem area rather than to the journal, so it can proceed
- * even in abort state, but we must not update the super block if
- * checkpointing may have failed.  Otherwise, we would lose some metadata
- * buffers which should be written-back to the filesystem.
- */
-
-int cleanup_journal_tail(journal_t *journal)
-{
-	transaction_t * transaction;
-	tid_t		first_tid;
-	unsigned int	blocknr, freed;
-
-	if (is_journal_aborted(journal))
-		return 1;
-
-	/*
-	 * OK, work out the oldest transaction remaining in the log, and
-	 * the log block it starts at.
-	 *
-	 * If the log is now empty, we need to work out which is the
-	 * next transaction ID we will write, and where it will
-	 * start.
-	 */
-	spin_lock(&journal->j_state_lock);
-	spin_lock(&journal->j_list_lock);
-	transaction = journal->j_checkpoint_transactions;
-	if (transaction) {
-		first_tid = transaction->t_tid;
-		blocknr = transaction->t_log_start;
-	} else if ((transaction = journal->j_committing_transaction) != NULL) {
-		first_tid = transaction->t_tid;
-		blocknr = transaction->t_log_start;
-	} else if ((transaction = journal->j_running_transaction) != NULL) {
-		first_tid = transaction->t_tid;
-		blocknr = journal->j_head;
-	} else {
-		first_tid = journal->j_transaction_sequence;
-		blocknr = journal->j_head;
-	}
-	spin_unlock(&journal->j_list_lock);
-	J_ASSERT(blocknr != 0);
-
-	/* If the oldest pinned transaction is at the tail of the log
-           already then there's not much we can do right now. */
-	if (journal->j_tail_sequence == first_tid) {
-		spin_unlock(&journal->j_state_lock);
-		return 1;
-	}
-	spin_unlock(&journal->j_state_lock);
-
-	/*
-	 * We need to make sure that any blocks that were recently written out
-	 * --- perhaps by log_do_checkpoint() --- are flushed out before we
-	 * drop the transactions from the journal. Similarly we need to be sure
-	 * superblock makes it to disk before next transaction starts reusing
-	 * freed space (otherwise we could replay some blocks of the new
-	 * transaction thinking they belong to the old one). So we use
-	 * WRITE_FLUSH_FUA. It's unlikely this will be necessary, especially
-	 * with an appropriately sized journal, but we need this to guarantee
-	 * correctness.  Fortunately cleanup_journal_tail() doesn't get called
-	 * all that often.
-	 */
-	journal_update_sb_log_tail(journal, first_tid, blocknr,
-				   WRITE_FLUSH_FUA);
-
-	spin_lock(&journal->j_state_lock);
-	/* OK, update the superblock to recover the freed space.
-	 * Physical blocks come first: have we wrapped beyond the end of
-	 * the log?  */
-	freed = blocknr - journal->j_tail;
-	if (blocknr < journal->j_tail)
-		freed = freed + journal->j_last - journal->j_first;
-
-	trace_jbd_cleanup_journal_tail(journal, first_tid, blocknr, freed);
-	jbd_debug(1,
-		  "Cleaning journal tail from %d to %d (offset %u), "
-		  "freeing %u\n",
-		  journal->j_tail_sequence, first_tid, blocknr, freed);
-
-	journal->j_free += freed;
-	journal->j_tail_sequence = first_tid;
-	journal->j_tail = blocknr;
-	spin_unlock(&journal->j_state_lock);
-	return 0;
-}
-
-
-/* Checkpoint list management */
-
-/*
- * journal_clean_one_cp_list
- *
- * Find all the written-back checkpoint buffers in the given list and release
- * them.
- *
- * Called with j_list_lock held.
- * Returns number of buffers reaped (for debug)
- */
-
-static int journal_clean_one_cp_list(struct journal_head *jh, int *released)
-{
-	struct journal_head *last_jh;
-	struct journal_head *next_jh = jh;
-	int ret, freed = 0;
-
-	*released = 0;
-	if (!jh)
-		return 0;
-
-	last_jh = jh->b_cpprev;
-	do {
-		jh = next_jh;
-		next_jh = jh->b_cpnext;
-		/* Use trylock because of the ranking */
-		if (jbd_trylock_bh_state(jh2bh(jh))) {
-			ret = __try_to_free_cp_buf(jh);
-			if (ret) {
-				freed++;
-				if (ret == 2) {
-					*released = 1;
-					return freed;
-				}
-			}
-		}
-		/*
-		 * This function only frees up some memory
-		 * if possible so we dont have an obligation
-		 * to finish processing. Bail out if preemption
-		 * requested:
-		 */
-		if (need_resched())
-			return freed;
-	} while (jh != last_jh);
-
-	return freed;
-}
-
-/*
- * journal_clean_checkpoint_list
- *
- * Find all the written-back checkpoint buffers in the journal and release them.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- * Returns number of buffers reaped (for debug)
- */
-
-int __journal_clean_checkpoint_list(journal_t *journal)
-{
-	transaction_t *transaction, *last_transaction, *next_transaction;
-	int ret = 0;
-	int released;
-
-	transaction = journal->j_checkpoint_transactions;
-	if (!transaction)
-		goto out;
-
-	last_transaction = transaction->t_cpprev;
-	next_transaction = transaction;
-	do {
-		transaction = next_transaction;
-		next_transaction = transaction->t_cpnext;
-		ret += journal_clean_one_cp_list(transaction->
-				t_checkpoint_list, &released);
-		/*
-		 * This function only frees up some memory if possible so we
-		 * dont have an obligation to finish processing. Bail out if
-		 * preemption requested:
-		 */
-		if (need_resched())
-			goto out;
-		if (released)
-			continue;
-		/*
-		 * It is essential that we are as careful as in the case of
-		 * t_checkpoint_list with removing the buffer from the list as
-		 * we can possibly see not yet submitted buffers on io_list
-		 */
-		ret += journal_clean_one_cp_list(transaction->
-				t_checkpoint_io_list, &released);
-		if (need_resched())
-			goto out;
-	} while (transaction != last_transaction);
-out:
-	return ret;
-}
-
-/*
- * journal_remove_checkpoint: called after a buffer has been committed
- * to disk (either by being write-back flushed to disk, or being
- * committed to the log).
- *
- * We cannot safely clean a transaction out of the log until all of the
- * buffer updates committed in that transaction have safely been stored
- * elsewhere on disk.  To achieve this, all of the buffers in a
- * transaction need to be maintained on the transaction's checkpoint
- * lists until they have been rewritten, at which point this function is
- * called to remove the buffer from the existing transaction's
- * checkpoint lists.
- *
- * The function returns 1 if it frees the transaction, 0 otherwise.
- * The function can free jh and bh.
- *
- * This function is called with j_list_lock held.
- * This function is called with jbd_lock_bh_state(jh2bh(jh))
- */
-
-int __journal_remove_checkpoint(struct journal_head *jh)
-{
-	transaction_t *transaction;
-	journal_t *journal;
-	int ret = 0;
-
-	JBUFFER_TRACE(jh, "entry");
-
-	if ((transaction = jh->b_cp_transaction) == NULL) {
-		JBUFFER_TRACE(jh, "not on transaction");
-		goto out;
-	}
-	journal = transaction->t_journal;
-
-	JBUFFER_TRACE(jh, "removing from transaction");
-	__buffer_unlink(jh);
-	jh->b_cp_transaction = NULL;
-	journal_put_journal_head(jh);
-
-	if (transaction->t_checkpoint_list != NULL ||
-	    transaction->t_checkpoint_io_list != NULL)
-		goto out;
-
-	/*
-	 * There is one special case to worry about: if we have just pulled the
-	 * buffer off a running or committing transaction's checkpoing list,
-	 * then even if the checkpoint list is empty, the transaction obviously
-	 * cannot be dropped!
-	 *
-	 * The locking here around t_state is a bit sleazy.
-	 * See the comment at the end of journal_commit_transaction().
-	 */
-	if (transaction->t_state != T_FINISHED)
-		goto out;
-
-	/* OK, that was the last buffer for the transaction: we can now
-	   safely remove this transaction from the log */
-
-	__journal_drop_transaction(journal, transaction);
-
-	/* Just in case anybody was waiting for more transactions to be
-           checkpointed... */
-	wake_up(&journal->j_wait_logspace);
-	ret = 1;
-out:
-	return ret;
-}
-
-/*
- * journal_insert_checkpoint: put a committed buffer onto a checkpoint
- * list so that we know when it is safe to clean the transaction out of
- * the log.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- */
-void __journal_insert_checkpoint(struct journal_head *jh,
-			       transaction_t *transaction)
-{
-	JBUFFER_TRACE(jh, "entry");
-	J_ASSERT_JH(jh, buffer_dirty(jh2bh(jh)) || buffer_jbddirty(jh2bh(jh)));
-	J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
-
-	/* Get reference for checkpointing transaction */
-	journal_grab_journal_head(jh2bh(jh));
-	jh->b_cp_transaction = transaction;
-
-	if (!transaction->t_checkpoint_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_list;
-		jh->b_cpprev = transaction->t_checkpoint_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_list = jh;
-}
-
-/*
- * We've finished with this transaction structure: adios...
- *
- * The transaction must have no links except for the checkpoint by this
- * point.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- */
-
-void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
-{
-	assert_spin_locked(&journal->j_list_lock);
-	if (transaction->t_cpnext) {
-		transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
-		transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions =
-				transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions = NULL;
-	}
-
-	J_ASSERT(transaction->t_state == T_FINISHED);
-	J_ASSERT(transaction->t_buffers == NULL);
-	J_ASSERT(transaction->t_sync_datalist == NULL);
-	J_ASSERT(transaction->t_forget == NULL);
-	J_ASSERT(transaction->t_iobuf_list == NULL);
-	J_ASSERT(transaction->t_shadow_list == NULL);
-	J_ASSERT(transaction->t_log_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
-	J_ASSERT(transaction->t_updates == 0);
-	J_ASSERT(journal->j_committing_transaction != transaction);
-	J_ASSERT(journal->j_running_transaction != transaction);
-
-	trace_jbd_drop_transaction(journal, transaction);
-	jbd_debug(1, "Dropping transaction %d, all done\n", transaction->t_tid);
-	kfree(transaction);
-}
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
--- a/fs/jbd/recovery.c
+++ b/fs/jbd/recovery.c
@ -1,594 +0,0 @@
-/*
- * linux/fs/jbd/recovery.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1999
- *
- * Copyright 1999-2000 Red Hat Software --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Journal recovery routines for the generic filesystem journaling code;
- * part of the ext2fs journaling system.
- */
-
-#ifndef __KERNEL__
-#include "jfs_user.h"
-#else
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/blkdev.h>
-#endif
-
-/*
- * Maintain information about the progress of the recovery job, so that
- * the different passes can carry information between them.
- */
-struct recovery_info
-{
-	tid_t		start_transaction;
-	tid_t		end_transaction;
-
-	int		nr_replays;
-	int		nr_revokes;
-	int		nr_revoke_hits;
-};
-
-enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
-static int do_one_pass(journal_t *journal,
-				struct recovery_info *info, enum passtype pass);
-static int scan_revoke_records(journal_t *, struct buffer_head *,
-				tid_t, struct recovery_info *);
-
-#ifdef __KERNEL__
-
-/* Release readahead buffers after use */
-static void journal_brelse_array(struct buffer_head *b[], int n)
-{
-	while (--n >= 0)
-		brelse (b[n]);
-}
-
-
-/*
- * When reading from the journal, we are going through the block device
- * layer directly and so there is no readahead being done for us.  We
- * need to implement any readahead ourselves if we want it to happen at
- * all.  Recovery is basically one long sequential read, so make sure we
- * do the IO in reasonably large chunks.
- *
- * This is not so critical that we need to be enormously clever about
- * the readahead size, though.  128K is a purely arbitrary, good-enough
- * fixed value.
- */
-
-#define MAXBUF 8
-static int do_readahead(journal_t *journal, unsigned int start)
-{
-	int err;
-	unsigned int max, nbufs, next;
-	unsigned int blocknr;
-	struct buffer_head *bh;
-
-	struct buffer_head * bufs[MAXBUF];
-
-	/* Do up to 128K of readahead */
-	max = start + (128 * 1024 / journal->j_blocksize);
-	if (max > journal->j_maxlen)
-		max = journal->j_maxlen;
-
-	/* Do the readahead itself.  We'll submit MAXBUF buffer_heads at
-	 * a time to the block device IO layer. */
-
-	nbufs = 0;
-
-	for (next = start; next < max; next++) {
-		err = journal_bmap(journal, next, &blocknr);
-
-		if (err) {
-			printk (KERN_ERR "JBD: bad block at offset %u\n",
-				next);
-			goto failed;
-		}
-
-		bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-		if (!bh) {
-			err = -ENOMEM;
-			goto failed;
-		}
-
-		if (!buffer_uptodate(bh) && !buffer_locked(bh)) {
-			bufs[nbufs++] = bh;
-			if (nbufs == MAXBUF) {
-				ll_rw_block(READ, nbufs, bufs);
-				journal_brelse_array(bufs, nbufs);
-				nbufs = 0;
-			}
-		} else
-			brelse(bh);
-	}
-
-	if (nbufs)
-		ll_rw_block(READ, nbufs, bufs);
-	err = 0;
-
-failed:
-	if (nbufs)
-		journal_brelse_array(bufs, nbufs);
-	return err;
-}
-
-#endif /* __KERNEL__ */
-
-
-/*
- * Read a block from the journal
- */
-
-static int jread(struct buffer_head **bhp, journal_t *journal,
-		 unsigned int offset)
-{
-	int err;
-	unsigned int blocknr;
-	struct buffer_head *bh;
-
-	*bhp = NULL;
-
-	if (offset >= journal->j_maxlen) {
-		printk(KERN_ERR "JBD: corrupted journal superblock\n");
-		return -EIO;
-	}
-
-	err = journal_bmap(journal, offset, &blocknr);
-
-	if (err) {
-		printk (KERN_ERR "JBD: bad block at offset %u\n",
-			offset);
-		return err;
-	}
-
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-	if (!bh)
-		return -ENOMEM;
-
-	if (!buffer_uptodate(bh)) {
-		/* If this is a brand new buffer, start readahead.
-                   Otherwise, we assume we are already reading it.  */
-		if (!buffer_req(bh))
-			do_readahead(journal, offset);
-		wait_on_buffer(bh);
-	}
-
-	if (!buffer_uptodate(bh)) {
-		printk (KERN_ERR "JBD: Failed to read block at offset %u\n",
-			offset);
-		brelse(bh);
-		return -EIO;
-	}
-
-	*bhp = bh;
-	return 0;
-}
-
-
-/*
- * Count the number of in-use tags in a journal descriptor block.
- */
-
-static int count_tags(struct buffer_head *bh, int size)
-{
-	char *			tagp;
-	journal_block_tag_t *	tag;
-	int			nr = 0;
-
-	tagp = &bh->b_data[sizeof(journal_header_t)];
-
-	while ((tagp - bh->b_data + sizeof(journal_block_tag_t)) <= size) {
-		tag = (journal_block_tag_t *) tagp;
-
-		nr++;
-		tagp += sizeof(journal_block_tag_t);
-		if (!(tag->t_flags & cpu_to_be32(JFS_FLAG_SAME_UUID)))
-			tagp += 16;
-
-		if (tag->t_flags & cpu_to_be32(JFS_FLAG_LAST_TAG))
-			break;
-	}
-
-	return nr;
-}
-
-
-/* Make sure we wrap around the log correctly! */
-#define wrap(journal, var)						\
-do {									\
-	if (var >= (journal)->j_last)					\
-		var -= ((journal)->j_last - (journal)->j_first);	\
-} while (0)
-
-/**
- * journal_recover - recovers a on-disk journal
- * @journal: the journal to recover
- *
- * The primary function for recovering the log contents when mounting a
- * journaled device.
- *
- * Recovery is done in three passes.  In the first pass, we look for the
- * end of the log.  In the second, we assemble the list of revoke
- * blocks.  In the third and final pass, we replay any un-revoked blocks
- * in the log.
- */
-int journal_recover(journal_t *journal)
-{
-	int			err, err2;
-	journal_superblock_t *	sb;
-
-	struct recovery_info	info;
-
-	memset(&info, 0, sizeof(info));
-	sb = journal->j_superblock;
-
-	/*
-	 * The journal superblock's s_start field (the current log head)
-	 * is always zero if, and only if, the journal was cleanly
-	 * unmounted.
-	 */
-
-	if (!sb->s_start) {
-		jbd_debug(1, "No recovery required, last transaction %d\n",
-			  be32_to_cpu(sb->s_sequence));
-		journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
-		return 0;
-	}
-
-	err = do_one_pass(journal, &info, PASS_SCAN);
-	if (!err)
-		err = do_one_pass(journal, &info, PASS_REVOKE);
-	if (!err)
-		err = do_one_pass(journal, &info, PASS_REPLAY);
-
-	jbd_debug(1, "JBD: recovery, exit status %d, "
-		  "recovered transactions %u to %u\n",
-		  err, info.start_transaction, info.end_transaction);
-	jbd_debug(1, "JBD: Replayed %d and revoked %d/%d blocks\n",
-		  info.nr_replays, info.nr_revoke_hits, info.nr_revokes);
-
-	/* Restart the log at the next transaction ID, thus invalidating
-	 * any existing commit records in the log. */
-	journal->j_transaction_sequence = ++info.end_transaction;
-
-	journal_clear_revoke(journal);
-	err2 = sync_blockdev(journal->j_fs_dev);
-	if (!err)
-		err = err2;
-	/* Flush disk caches to get replayed data on the permanent storage */
-	if (journal->j_flags & JFS_BARRIER) {
-		err2 = blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
-		if (!err)
-			err = err2;
-	}
-
-	return err;
-}
-
-/**
- * journal_skip_recovery - Start journal and wipe exiting records
- * @journal: journal to startup
- *
- * Locate any valid recovery information from the journal and set up the
- * journal structures in memory to ignore it (presumably because the
- * caller has evidence that it is out of date).
- * This function does'nt appear to be exorted..
- *
- * We perform one pass over the journal to allow us to tell the user how
- * much recovery information is being erased, and to let us initialise
- * the journal transaction sequence numbers to the next unused ID.
- */
-int journal_skip_recovery(journal_t *journal)
-{
-	int			err;
-	struct recovery_info	info;
-
-	memset (&info, 0, sizeof(info));
-
-	err = do_one_pass(journal, &info, PASS_SCAN);
-
-	if (err) {
-		printk(KERN_ERR "JBD: error %d scanning journal\n", err);
-		++journal->j_transaction_sequence;
-	} else {
-#ifdef CONFIG_JBD_DEBUG
-		int dropped = info.end_transaction -
-			      be32_to_cpu(journal->j_superblock->s_sequence);
-		jbd_debug(1,
-			  "JBD: ignoring %d transaction%s from the journal.\n",
-			  dropped, (dropped == 1) ? "" : "s");
-#endif
-		journal->j_transaction_sequence = ++info.end_transaction;
-	}
-
-	journal->j_tail = 0;
-	return err;
-}
-
-static int do_one_pass(journal_t *journal,
-			struct recovery_info *info, enum passtype pass)
-{
-	unsigned int		first_commit_ID, next_commit_ID;
-	unsigned int		next_log_block;
-	int			err, success = 0;
-	journal_superblock_t *	sb;
-	journal_header_t *	tmp;
-	struct buffer_head *	bh;
-	unsigned int		sequence;
-	int			blocktype;
-
-	/*
-	 * First thing is to establish what we expect to find in the log
-	 * (in terms of transaction IDs), and where (in terms of log
-	 * block offsets): query the superblock.
-	 */
-
-	sb = journal->j_superblock;
-	next_commit_ID = be32_to_cpu(sb->s_sequence);
-	next_log_block = be32_to_cpu(sb->s_start);
-
-	first_commit_ID = next_commit_ID;
-	if (pass == PASS_SCAN)
-		info->start_transaction = first_commit_ID;
-
-	jbd_debug(1, "Starting recovery pass %d\n", pass);
-
-	/*
-	 * Now we walk through the log, transaction by transaction,
-	 * making sure that each transaction has a commit block in the
-	 * expected place.  Each complete transaction gets replayed back
-	 * into the main filesystem.
-	 */
-
-	while (1) {
-		int			flags;
-		char *			tagp;
-		journal_block_tag_t *	tag;
-		struct buffer_head *	obh;
-		struct buffer_head *	nbh;
-
-		cond_resched();
-
-		/* If we already know where to stop the log traversal,
-		 * check right now that we haven't gone past the end of
-		 * the log. */
-
-		if (pass != PASS_SCAN)
-			if (tid_geq(next_commit_ID, info->end_transaction))
-				break;
-
-		jbd_debug(2, "Scanning for sequence ID %u at %u/%u\n",
-			  next_commit_ID, next_log_block, journal->j_last);
-
-		/* Skip over each chunk of the transaction looking
-		 * either the next descriptor block or the final commit
-		 * record. */
-
-		jbd_debug(3, "JBD: checking block %u\n", next_log_block);
-		err = jread(&bh, journal, next_log_block);
-		if (err)
-			goto failed;
-
-		next_log_block++;
-		wrap(journal, next_log_block);
-
-		/* What kind of buffer is it?
-		 *
-		 * If it is a descriptor block, check that it has the
-		 * expected sequence number.  Otherwise, we're all done
-		 * here. */
-
-		tmp = (journal_header_t *)bh->b_data;
-
-		if (tmp->h_magic != cpu_to_be32(JFS_MAGIC_NUMBER)) {
-			brelse(bh);
-			break;
-		}
-
-		blocktype = be32_to_cpu(tmp->h_blocktype);
-		sequence = be32_to_cpu(tmp->h_sequence);
-		jbd_debug(3, "Found magic %d, sequence %d\n",
-			  blocktype, sequence);
-
-		if (sequence != next_commit_ID) {
-			brelse(bh);
-			break;
-		}
-
-		/* OK, we have a valid descriptor block which matches
-		 * all of the sequence number checks.  What are we going
-		 * to do with it?  That depends on the pass... */
-
-		switch(blocktype) {
-		case JFS_DESCRIPTOR_BLOCK:
-			/* If it is a valid descriptor block, replay it
-			 * in pass REPLAY; otherwise, just skip over the
-			 * blocks it describes. */
-			if (pass != PASS_REPLAY) {
-				next_log_block +=
-					count_tags(bh, journal->j_blocksize);
-				wrap(journal, next_log_block);
-				brelse(bh);
-				continue;
-			}
-
-			/* A descriptor block: we can now write all of
-			 * the data blocks.  Yay, useful work is finally
-			 * getting done here! */
-
-			tagp = &bh->b_data[sizeof(journal_header_t)];
-			while ((tagp - bh->b_data +sizeof(journal_block_tag_t))
-			       <= journal->j_blocksize) {
-				unsigned int io_block;
-
-				tag = (journal_block_tag_t *) tagp;
-				flags = be32_to_cpu(tag->t_flags);
-
-				io_block = next_log_block++;
-				wrap(journal, next_log_block);
-				err = jread(&obh, journal, io_block);
-				if (err) {
-					/* Recover what we can, but
-					 * report failure at the end. */
-					success = err;
-					printk (KERN_ERR
-						"JBD: IO error %d recovering "
-						"block %u in log\n",
-						err, io_block);
-				} else {
-					unsigned int blocknr;
-
-					J_ASSERT(obh != NULL);
-					blocknr = be32_to_cpu(tag->t_blocknr);
-
-					/* If the block has been
-					 * revoked, then we're all done
-					 * here. */
-					if (journal_test_revoke
-					    (journal, blocknr,
-					     next_commit_ID)) {
-						brelse(obh);
-						++info->nr_revoke_hits;
-						goto skip_write;
-					}
-
-					/* Find a buffer for the new
-					 * data being restored */
-					nbh = __getblk(journal->j_fs_dev,
-							blocknr,
-							journal->j_blocksize);
-					if (nbh == NULL) {
-						printk(KERN_ERR
-						       "JBD: Out of memory "
-						       "during recovery.\n");
-						err = -ENOMEM;
-						brelse(bh);
-						brelse(obh);
-						goto failed;
-					}
-
-					lock_buffer(nbh);
-					memcpy(nbh->b_data, obh->b_data,
-							journal->j_blocksize);
-					if (flags & JFS_FLAG_ESCAPE) {
-						*((__be32 *)nbh->b_data) =
-						cpu_to_be32(JFS_MAGIC_NUMBER);
-					}
-
-					BUFFER_TRACE(nbh, "marking dirty");
-					set_buffer_uptodate(nbh);
-					mark_buffer_dirty(nbh);
-					BUFFER_TRACE(nbh, "marking uptodate");
-					++info->nr_replays;
-					/* ll_rw_block(WRITE, 1, &nbh); */
-					unlock_buffer(nbh);
-					brelse(obh);
-					brelse(nbh);
-				}
-
-			skip_write:
-				tagp += sizeof(journal_block_tag_t);
-				if (!(flags & JFS_FLAG_SAME_UUID))
-					tagp += 16;
-
-				if (flags & JFS_FLAG_LAST_TAG)
-					break;
-			}
-
-			brelse(bh);
-			continue;
-
-		case JFS_COMMIT_BLOCK:
-			/* Found an expected commit block: not much to
-			 * do other than move on to the next sequence
-			 * number. */
-			brelse(bh);
-			next_commit_ID++;
-			continue;
-
-		case JFS_REVOKE_BLOCK:
-			/* If we aren't in the REVOKE pass, then we can
-			 * just skip over this block. */
-			if (pass != PASS_REVOKE) {
-				brelse(bh);
-				continue;
-			}
-
-			err = scan_revoke_records(journal, bh,
-						  next_commit_ID, info);
-			brelse(bh);
-			if (err)
-				goto failed;
-			continue;
-
-		default:
-			jbd_debug(3, "Unrecognised magic %d, end of scan.\n",
-				  blocktype);
-			brelse(bh);
-			goto done;
-		}
-	}
-
- done:
-	/*
-	 * We broke out of the log scan loop: either we came to the
-	 * known end of the log or we found an unexpected block in the
-	 * log.  If the latter happened, then we know that the "current"
-	 * transaction marks the end of the valid log.
-	 */
-
-	if (pass == PASS_SCAN)
-		info->end_transaction = next_commit_ID;
-	else {
-		/* It's really bad news if different passes end up at
-		 * different places (but possible due to IO errors). */
-		if (info->end_transaction != next_commit_ID) {
-			printk (KERN_ERR "JBD: recovery pass %d ended at "
-				"transaction %u, expected %u\n",
-				pass, next_commit_ID, info->end_transaction);
-			if (!success)
-				success = -EIO;
-		}
-	}
-
-	return success;
-
- failed:
-	return err;
-}
-
-
-/* Scan a revoke record, marking all blocks mentioned as revoked. */
-
-static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
-			       tid_t sequence, struct recovery_info *info)
-{
-	journal_revoke_header_t *header;
-	int offset, max;
-
-	header = (journal_revoke_header_t *) bh->b_data;
-	offset = sizeof(journal_revoke_header_t);
-	max = be32_to_cpu(header->r_count);
-
-	while (offset < max) {
-		unsigned int blocknr;
-		int err;
-
-		blocknr = be32_to_cpu(* ((__be32 *) (bh->b_data+offset)));
-		offset += 4;
-		err = journal_set_revoke(journal, blocknr, sequence);
-		if (err)
-			return err;
-		++info->nr_revokes;
-	}
-	return 0;
-}
--- a/fs/jbd/revoke.c
+++ b/fs/jbd/revoke.c
@ -1,733 +0,0 @@
-/*
- * linux/fs/jbd/revoke.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 2000
- *
- * Copyright 2000 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Journal revoke routines for the generic filesystem journaling code;
- * part of the ext2fs journaling system.
- *
- * Revoke is the mechanism used to prevent old log records for deleted
- * metadata from being replayed on top of newer data using the same
- * blocks.  The revoke mechanism is used in two separate places:
- *
- * + Commit: during commit we write the entire list of the current
- *   transaction's revoked blocks to the journal
- *
- * + Recovery: during recovery we record the transaction ID of all
- *   revoked blocks.  If there are multiple revoke records in the log
- *   for a single block, only the last one counts, and if there is a log
- *   entry for a block beyond the last revoke, then that log entry still
- *   gets replayed.
- *
- * We can get interactions between revokes and new log data within a
- * single transaction:
- *
- * Block is revoked and then journaled:
- *   The desired end result is the journaling of the new block, so we
- *   cancel the revoke before the transaction commits.
- *
- * Block is journaled and then revoked:
- *   The revoke must take precedence over the write of the block, so we
- *   need either to cancel the journal entry or to write the revoke
- *   later in the log than the log block.  In this case, we choose the
- *   latter: journaling a block cancels any revoke record for that block
- *   in the current transaction, so any revoke for that block in the
- *   transaction must have happened after the block was journaled and so
- *   the revoke must take precedence.
- *
- * Block is revoked and then written as data:
- *   The data write is allowed to succeed, but the revoke is _not_
- *   cancelled.  We still need to prevent old log records from
- *   overwriting the new data.  We don't even need to clear the revoke
- *   bit here.
- *
- * We cache revoke status of a buffer in the current transaction in b_states
- * bits.  As the name says, revokevalid flag indicates that the cached revoke
- * status of a buffer is valid and we can rely on the cached status.
- *
- * Revoke information on buffers is a tri-state value:
- *
- * RevokeValid clear:	no cached revoke status, need to look it up
- * RevokeValid set, Revoked clear:
- *			buffer has not been revoked, and cancel_revoke
- *			need do nothing.
- * RevokeValid set, Revoked set:
- *			buffer has been revoked.
- *
- * Locking rules:
- * We keep two hash tables of revoke records. One hashtable belongs to the
- * running transaction (is pointed to by journal->j_revoke), the other one
- * belongs to the committing transaction. Accesses to the second hash table
- * happen only from the kjournald and no other thread touches this table.  Also
- * journal_switch_revoke_table() which switches which hashtable belongs to the
- * running and which to the committing transaction is called only from
- * kjournald. Therefore we need no locks when accessing the hashtable belonging
- * to the committing transaction.
- *
- * All users operating on the hash table belonging to the running transaction
- * have a handle to the transaction. Therefore they are safe from kjournald
- * switching hash tables under them. For operations on the lists of entries in
- * the hash table j_revoke_lock is used.
- *
- * Finally, also replay code uses the hash tables but at this moment no one else
- * can touch them (filesystem isn't mounted yet) and hence no locking is
- * needed.
- */
-
-#ifndef __KERNEL__
-#include "jfs_user.h"
-#else
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/list.h>
-#include <linux/init.h>
-#include <linux/bio.h>
-#endif
-#include <linux/log2.h>
-#include <linux/hash.h>
-
-static struct kmem_cache *revoke_record_cache;
-static struct kmem_cache *revoke_table_cache;
-
-/* Each revoke record represents one single revoked block.  During
-   journal replay, this involves recording the transaction ID of the
-   last transaction to revoke this block. */
-
-struct jbd_revoke_record_s
-{
-	struct list_head  hash;
-	tid_t		  sequence;	/* Used for recovery only */
-	unsigned int	  blocknr;
-};
-
-
-/* The revoke table is just a simple hash table of revoke records. */
-struct jbd_revoke_table_s
-{
-	/* It is conceivable that we might want a larger hash table
-	 * for recovery.  Must be a power of two. */
-	int		  hash_size;
-	int		  hash_shift;
-	struct list_head *hash_table;
-};
-
-
-#ifdef __KERNEL__
-static void write_one_revoke_record(journal_t *, transaction_t *,
-				    struct journal_head **, int *,
-				    struct jbd_revoke_record_s *, int);
-static void flush_descriptor(journal_t *, struct journal_head *, int, int);
-#endif
-
-/* Utility functions to maintain the revoke table */
-
-static inline int hash(journal_t *journal, unsigned int block)
-{
-	struct jbd_revoke_table_s *table = journal->j_revoke;
-
-	return hash_32(block, table->hash_shift);
-}
-
-static int insert_revoke_hash(journal_t *journal, unsigned int blocknr,
-			      tid_t seq)
-{
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-
-repeat:
-	record = kmem_cache_alloc(revoke_record_cache, GFP_NOFS);
-	if (!record)
-		goto oom;
-
-	record->sequence = seq;
-	record->blocknr = blocknr;
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
-	spin_lock(&journal->j_revoke_lock);
-	list_add(&record->hash, hash_list);
-	spin_unlock(&journal->j_revoke_lock);
-	return 0;
-
-oom:
-	if (!journal_oom_retry)
-		return -ENOMEM;
-	jbd_debug(1, "ENOMEM in %s, retrying\n", __func__);
-	yield();
-	goto repeat;
-}
-
-/* Find a revoke record in the journal's hash table. */
-
-static struct jbd_revoke_record_s *find_revoke_record(journal_t *journal,
-						      unsigned int blocknr)
-{
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
-
-	spin_lock(&journal->j_revoke_lock);
-	record = (struct jbd_revoke_record_s *) hash_list->next;
-	while (&(record->hash) != hash_list) {
-		if (record->blocknr == blocknr) {
-			spin_unlock(&journal->j_revoke_lock);
-			return record;
-		}
-		record = (struct jbd_revoke_record_s *) record->hash.next;
-	}
-	spin_unlock(&journal->j_revoke_lock);
-	return NULL;
-}
-
-void journal_destroy_revoke_caches(void)
-{
-	if (revoke_record_cache) {
-		kmem_cache_destroy(revoke_record_cache);
-		revoke_record_cache = NULL;
-	}
-	if (revoke_table_cache) {
-		kmem_cache_destroy(revoke_table_cache);
-		revoke_table_cache = NULL;
-	}
-}
-
-int __init journal_init_revoke_caches(void)
-{
-	J_ASSERT(!revoke_record_cache);
-	J_ASSERT(!revoke_table_cache);
-
-	revoke_record_cache = kmem_cache_create("revoke_record",
-					   sizeof(struct jbd_revoke_record_s),
-					   0,
-					   SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
-					   NULL);
-	if (!revoke_record_cache)
-		goto record_cache_failure;
-
-	revoke_table_cache = kmem_cache_create("revoke_table",
-					   sizeof(struct jbd_revoke_table_s),
-					   0, SLAB_TEMPORARY, NULL);
-	if (!revoke_table_cache)
-		goto table_cache_failure;
-
-	return 0;
-
-table_cache_failure:
-	journal_destroy_revoke_caches();
-record_cache_failure:
-	return -ENOMEM;
-}
-
-static struct jbd_revoke_table_s *journal_init_revoke_table(int hash_size)
-{
-	int i;
-	struct jbd_revoke_table_s *table;
-
-	table = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
-	if (!table)
-		goto out;
-
-	table->hash_size = hash_size;
-	table->hash_shift = ilog2(hash_size);
-	table->hash_table =
-		kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
-	if (!table->hash_table) {
-		kmem_cache_free(revoke_table_cache, table);
-		table = NULL;
-		goto out;
-	}
-
-	for (i = 0; i < hash_size; i++)
-		INIT_LIST_HEAD(&table->hash_table[i]);
-
-out:
-	return table;
-}
-
-static void journal_destroy_revoke_table(struct jbd_revoke_table_s *table)
-{
-	int i;
-	struct list_head *hash_list;
-
-	for (i = 0; i < table->hash_size; i++) {
-		hash_list = &table->hash_table[i];
-		J_ASSERT(list_empty(hash_list));
-	}
-
-	kfree(table->hash_table);
-	kmem_cache_free(revoke_table_cache, table);
-}
-
-/* Initialise the revoke table for a given journal to a given size. */
-int journal_init_revoke(journal_t *journal, int hash_size)
-{
-	J_ASSERT(journal->j_revoke_table[0] == NULL);
-	J_ASSERT(is_power_of_2(hash_size));
-
-	journal->j_revoke_table[0] = journal_init_revoke_table(hash_size);
-	if (!journal->j_revoke_table[0])
-		goto fail0;
-
-	journal->j_revoke_table[1] = journal_init_revoke_table(hash_size);
-	if (!journal->j_revoke_table[1])
-		goto fail1;
-
-	journal->j_revoke = journal->j_revoke_table[1];
-
-	spin_lock_init(&journal->j_revoke_lock);
-
-	return 0;
-
-fail1:
-	journal_destroy_revoke_table(journal->j_revoke_table[0]);
-fail0:
-	return -ENOMEM;
-}
-
-/* Destroy a journal's revoke table.  The table must already be empty! */
-void journal_destroy_revoke(journal_t *journal)
-{
-	journal->j_revoke = NULL;
-	if (journal->j_revoke_table[0])
-		journal_destroy_revoke_table(journal->j_revoke_table[0]);
-	if (journal->j_revoke_table[1])
-		journal_destroy_revoke_table(journal->j_revoke_table[1]);
-}
-
-
-#ifdef __KERNEL__
-
-/*
- * journal_revoke: revoke a given buffer_head from the journal.  This
- * prevents the block from being replayed during recovery if we take a
- * crash after this current transaction commits.  Any subsequent
- * metadata writes of the buffer in this transaction cancel the
- * revoke.
- *
- * Note that this call may block --- it is up to the caller to make
- * sure that there are no further calls to journal_write_metadata
- * before the revoke is complete.  In ext3, this implies calling the
- * revoke before clearing the block bitmap when we are deleting
- * metadata.
- *
- * Revoke performs a journal_forget on any buffer_head passed in as a
- * parameter, but does _not_ forget the buffer_head if the bh was only
- * found implicitly.
- *
- * bh_in may not be a journalled buffer - it may have come off
- * the hash tables without an attached journal_head.
- *
- * If bh_in is non-zero, journal_revoke() will decrement its b_count
- * by one.
- */
-
-int journal_revoke(handle_t *handle, unsigned int blocknr,
-		   struct buffer_head *bh_in)
-{
-	struct buffer_head *bh = NULL;
-	journal_t *journal;
-	struct block_device *bdev;
-	int err;
-
-	might_sleep();
-	if (bh_in)
-		BUFFER_TRACE(bh_in, "enter");
-
-	journal = handle->h_transaction->t_journal;
-	if (!journal_set_features(journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)){
-		J_ASSERT (!"Cannot set revoke feature!");
-		return -EINVAL;
-	}
-
-	bdev = journal->j_fs_dev;
-	bh = bh_in;
-
-	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
-		if (bh)
-			BUFFER_TRACE(bh, "found on hash");
-	}
-#ifdef JBD_EXPENSIVE_CHECKING
-	else {
-		struct buffer_head *bh2;
-
-		/* If there is a different buffer_head lying around in
-		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
-		if (bh2) {
-			/* ... and it has RevokeValid status... */
-			if (bh2 != bh && buffer_revokevalid(bh2))
-				/* ...then it better be revoked too,
-				 * since it's illegal to create a revoke
-				 * record against a buffer_head which is
-				 * not marked revoked --- that would
-				 * risk missing a subsequent revoke
-				 * cancel. */
-				J_ASSERT_BH(bh2, buffer_revoked(bh2));
-			put_bh(bh2);
-		}
-	}
-#endif
-
-	/* We really ought not ever to revoke twice in a row without
-           first having the revoke cancelled: it's illegal to free a
-           block twice without allocating it in between! */
-	if (bh) {
-		if (!J_EXPECT_BH(bh, !buffer_revoked(bh),
-				 "inconsistent data on disk")) {
-			if (!bh_in)
-				brelse(bh);
-			return -EIO;
-		}
-		set_buffer_revoked(bh);
-		set_buffer_revokevalid(bh);
-		if (bh_in) {
-			BUFFER_TRACE(bh_in, "call journal_forget");
-			journal_forget(handle, bh_in);
-		} else {
-			BUFFER_TRACE(bh, "call brelse");
-			__brelse(bh);
-		}
-	}
-
-	jbd_debug(2, "insert revoke for block %u, bh_in=%p\n", blocknr, bh_in);
-	err = insert_revoke_hash(journal, blocknr,
-				handle->h_transaction->t_tid);
-	BUFFER_TRACE(bh_in, "exit");
-	return err;
-}
-
-/*
- * Cancel an outstanding revoke.  For use only internally by the
- * journaling code (called from journal_get_write_access).
- *
- * We trust buffer_revoked() on the buffer if the buffer is already
- * being journaled: if there is no revoke pending on the buffer, then we
- * don't do anything here.
- *
- * This would break if it were possible for a buffer to be revoked and
- * discarded, and then reallocated within the same transaction.  In such
- * a case we would have lost the revoked bit, but when we arrived here
- * the second time we would still have a pending revoke to cancel.  So,
- * do not trust the Revoked bit on buffers unless RevokeValid is also
- * set.
- */
-int journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
-{
-	struct jbd_revoke_record_s *record;
-	journal_t *journal = handle->h_transaction->t_journal;
-	int need_cancel;
-	int did_revoke = 0;	/* akpm: debug */
-	struct buffer_head *bh = jh2bh(jh);
-
-	jbd_debug(4, "journal_head %p, cancelling revoke\n", jh);
-
-	/* Is the existing Revoke bit valid?  If so, we trust it, and
-	 * only perform the full cancel if the revoke bit is set.  If
-	 * not, we can't trust the revoke bit, and we need to do the
-	 * full search for a revoke record. */
-	if (test_set_buffer_revokevalid(bh)) {
-		need_cancel = test_clear_buffer_revoked(bh);
-	} else {
-		need_cancel = 1;
-		clear_buffer_revoked(bh);
-	}
-
-	if (need_cancel) {
-		record = find_revoke_record(journal, bh->b_blocknr);
-		if (record) {
-			jbd_debug(4, "cancelled existing revoke on "
-				  "blocknr %llu\n", (unsigned long long)bh->b_blocknr);
-			spin_lock(&journal->j_revoke_lock);
-			list_del(&record->hash);
-			spin_unlock(&journal->j_revoke_lock);
-			kmem_cache_free(revoke_record_cache, record);
-			did_revoke = 1;
-		}
-	}
-
-#ifdef JBD_EXPENSIVE_CHECKING
-	/* There better not be one left behind by now! */
-	record = find_revoke_record(journal, bh->b_blocknr);
-	J_ASSERT_JH(jh, record == NULL);
-#endif
-
-	/* Finally, have we just cleared revoke on an unhashed
-	 * buffer_head?  If so, we'd better make sure we clear the
-	 * revoked status on any hashed alias too, otherwise the revoke
-	 * state machine will get very upset later on. */
-	if (need_cancel) {
-		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
-		if (bh2) {
-			if (bh2 != bh)
-				clear_buffer_revoked(bh2);
-			__brelse(bh2);
-		}
-	}
-	return did_revoke;
-}
-
-/*
- * journal_clear_revoked_flags clears revoked flag of buffers in
- * revoke table to reflect there is no revoked buffer in the next
- * transaction which is going to be started.
- */
-void journal_clear_buffer_revoked_flags(journal_t *journal)
-{
-	struct jbd_revoke_table_s *revoke = journal->j_revoke;
-	int i = 0;
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		struct list_head *hash_list;
-		struct list_head *list_entry;
-		hash_list = &revoke->hash_table[i];
-
-		list_for_each(list_entry, hash_list) {
-			struct jbd_revoke_record_s *record;
-			struct buffer_head *bh;
-			record = (struct jbd_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
-					      record->blocknr,
-					      journal->j_blocksize);
-			if (bh) {
-				clear_buffer_revoked(bh);
-				__brelse(bh);
-			}
-		}
-	}
-}
-
-/* journal_switch_revoke table select j_revoke for next transaction
- * we do not want to suspend any processing until all revokes are
- * written -bzzz
- */
-void journal_switch_revoke_table(journal_t *journal)
-{
-	int i;
-
-	if (journal->j_revoke == journal->j_revoke_table[0])
-		journal->j_revoke = journal->j_revoke_table[1];
-	else
-		journal->j_revoke = journal->j_revoke_table[0];
-
-	for (i = 0; i < journal->j_revoke->hash_size; i++)
-		INIT_LIST_HEAD(&journal->j_revoke->hash_table[i]);
-}
-
-/*
- * Write revoke records to the journal for all entries in the current
- * revoke hash, deleting the entries as we go.
- */
-void journal_write_revoke_records(journal_t *journal,
-				  transaction_t *transaction, int write_op)
-{
-	struct journal_head *descriptor;
-	struct jbd_revoke_record_s *record;
-	struct jbd_revoke_table_s *revoke;
-	struct list_head *hash_list;
-	int i, offset, count;
-
-	descriptor = NULL;
-	offset = 0;
-	count = 0;
-
-	/* select revoke table for committing transaction */
-	revoke = journal->j_revoke == journal->j_revoke_table[0] ?
-		journal->j_revoke_table[1] : journal->j_revoke_table[0];
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		hash_list = &revoke->hash_table[i];
-
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s *)
-				hash_list->next;
-			write_one_revoke_record(journal, transaction,
-						&descriptor, &offset,
-						record, write_op);
-			count++;
-			list_del(&record->hash);
-			kmem_cache_free(revoke_record_cache, record);
-		}
-	}
-	if (descriptor)
-		flush_descriptor(journal, descriptor, offset, write_op);
-	jbd_debug(1, "Wrote %d revoke records\n", count);
-}
-
-/*
- * Write out one revoke record.  We need to create a new descriptor
- * block if the old one is full or if we have not already created one.
- */
-
-static void write_one_revoke_record(journal_t *journal,
-				    transaction_t *transaction,
-				    struct journal_head **descriptorp,
-				    int *offsetp,
-				    struct jbd_revoke_record_s *record,
-				    int write_op)
-{
-	struct journal_head *descriptor;
-	int offset;
-	journal_header_t *header;
-
-	/* If we are already aborting, this all becomes a noop.  We
-           still need to go round the loop in
-           journal_write_revoke_records in order to free all of the
-           revoke records: only the IO to the journal is omitted. */
-	if (is_journal_aborted(journal))
-		return;
-
-	descriptor = *descriptorp;
-	offset = *offsetp;
-
-	/* Make sure we have a descriptor with space left for the record */
-	if (descriptor) {
-		if (offset == journal->j_blocksize) {
-			flush_descriptor(journal, descriptor, offset, write_op);
-			descriptor = NULL;
-		}
-	}
-
-	if (!descriptor) {
-		descriptor = journal_get_descriptor_buffer(journal);
-		if (!descriptor)
-			return;
-		header = (journal_header_t *) &jh2bh(descriptor)->b_data[0];
-		header->h_magic     = cpu_to_be32(JFS_MAGIC_NUMBER);
-		header->h_blocktype = cpu_to_be32(JFS_REVOKE_BLOCK);
-		header->h_sequence  = cpu_to_be32(transaction->t_tid);
-
-		/* Record it so that we can wait for IO completion later */
-		JBUFFER_TRACE(descriptor, "file as BJ_LogCtl");
-		journal_file_buffer(descriptor, transaction, BJ_LogCtl);
-
-		offset = sizeof(journal_revoke_header_t);
-		*descriptorp = descriptor;
-	}
-
-	* ((__be32 *)(&jh2bh(descriptor)->b_data[offset])) =
-		cpu_to_be32(record->blocknr);
-	offset += 4;
-	*offsetp = offset;
-}
-
-/*
- * Flush a revoke descriptor out to the journal.  If we are aborting,
- * this is a noop; otherwise we are generating a buffer which needs to
- * be waited for during commit, so it has to go onto the appropriate
- * journal buffer list.
- */
-
-static void flush_descriptor(journal_t *journal,
-			     struct journal_head *descriptor,
-			     int offset, int write_op)
-{
-	journal_revoke_header_t *header;
-	struct buffer_head *bh = jh2bh(descriptor);
-
-	if (is_journal_aborted(journal)) {
-		put_bh(bh);
-		return;
-	}
-
-	header = (journal_revoke_header_t *) jh2bh(descriptor)->b_data;
-	header->r_count = cpu_to_be32(offset);
-	set_buffer_jwrite(bh);
-	BUFFER_TRACE(bh, "write");
-	set_buffer_dirty(bh);
-	write_dirty_buffer(bh, write_op);
-}
-#endif
-
-/*
- * Revoke support for recovery.
- *
- * Recovery needs to be able to:
- *
- *  record all revoke records, including the tid of the latest instance
- *  of each revoke in the journal
- *
- *  check whether a given block in a given transaction should be replayed
- *  (ie. has not been revoked by a revoke record in that or a subsequent
- *  transaction)
- *
- *  empty the revoke table after recovery.
- */
-
-/*
- * First, setting revoke records.  We create a new revoke record for
- * every block ever revoked in the log as we scan it for recovery, and
- * we update the existing records if we find multiple revokes for a
- * single block.
- */
-
-int journal_set_revoke(journal_t *journal,
-		       unsigned int blocknr,
-		       tid_t sequence)
-{
-	struct jbd_revoke_record_s *record;
-
-	record = find_revoke_record(journal, blocknr);
-	if (record) {
-		/* If we have multiple occurrences, only record the
-		 * latest sequence number in the hashed record */
-		if (tid_gt(sequence, record->sequence))
-			record->sequence = sequence;
-		return 0;
-	}
-	return insert_revoke_hash(journal, blocknr, sequence);
-}
-
-/*
- * Test revoke records.  For a given block referenced in the log, has
- * that block been revoked?  A revoke record with a given transaction
- * sequence number revokes all blocks in that transaction and earlier
- * ones, but later transactions still need replayed.
- */
-
-int journal_test_revoke(journal_t *journal,
-			unsigned int blocknr,
-			tid_t sequence)
-{
-	struct jbd_revoke_record_s *record;
-
-	record = find_revoke_record(journal, blocknr);
-	if (!record)
-		return 0;
-	if (tid_gt(sequence, record->sequence))
-		return 0;
-	return 1;
-}
-
-/*
- * Finally, once recovery is over, we need to clear the revoke table so
- * that it can be reused by the running filesystem.
- */
-
-void journal_clear_revoke(journal_t *journal)
-{
-	int i;
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-	struct jbd_revoke_table_s *revoke;
-
-	revoke = journal->j_revoke;
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		hash_list = &revoke->hash_table[i];
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s*) hash_list->next;
-			list_del(&record->hash);
-			kmem_cache_free(revoke_record_cache, record);
-		}
-	}
-}
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@ -107,8 +107,11 @@ int jfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (rc)
 		return rc;

-	if (is_quota_modification(inode, iattr))
-		dquot_initialize(inode);
+	if (is_quota_modification(inode, iattr)) {
+		rc = dquot_initialize(inode);
+		if (rc)
+			return rc;
+	}
 	if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)) ||
 	    (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))) {
 		rc = dquot_transfer(inode, iattr);
--- a/fs/jfs/jfs_inode.c
+++ b/fs/jfs/jfs_inode.c
@ -109,7 +109,9 @@ struct inode *ialloc(struct inode *parent, umode_t mode)
 	/*
 	 * Allocate inode to quota.
 	 */
-	dquot_initialize(inode);
+	rc = dquot_initialize(inode);
+	if (rc)
+		goto fail_drop;
 	rc = dquot_alloc_inode(inode);
 	if (rc)
 		goto fail_drop;
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@ -86,7 +86,9 @@ static int jfs_create(struct inode *dip, struct dentry *dentry, umode_t mode,

 	jfs_info("jfs_create: dip:0x%p name:%pd", dip, dentry);

-	dquot_initialize(dip);
+	rc = dquot_initialize(dip);
+	if (rc)
+		goto out1;

 	/*
 	 * search parent directory for entry/freespace
@ -218,7 +220,9 @@ static int jfs_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)

 	jfs_info("jfs_mkdir: dip:0x%p name:%pd", dip, dentry);

-	dquot_initialize(dip);
+	rc = dquot_initialize(dip);
+	if (rc)
+		goto out1;

 	/*
 	 * search parent directory for entry/freespace
@ -355,8 +359,12 @@ static int jfs_rmdir(struct inode *dip, struct dentry *dentry)
 	jfs_info("jfs_rmdir: dip:0x%p name:%pd", dip, dentry);

 	/* Init inode for quota operations. */
-	dquot_initialize(dip);
-	dquot_initialize(ip);
+	rc = dquot_initialize(dip);
+	if (rc)
+		goto out;
+	rc = dquot_initialize(ip);
+	if (rc)
+		goto out;

 	/* directory must be empty to be removed */
 	if (!dtEmpty(ip)) {
@ -483,8 +491,12 @@ static int jfs_unlink(struct inode *dip, struct dentry *dentry)
 	jfs_info("jfs_unlink: dip:0x%p name:%pd", dip, dentry);

 	/* Init inode for quota operations. */
-	dquot_initialize(dip);
-	dquot_initialize(ip);
+	rc = dquot_initialize(dip);
+	if (rc)
+		goto out;
+	rc = dquot_initialize(ip);
+	if (rc)
+		goto out;

 	if ((rc = get_UCSname(&dname, dentry)))
 		goto out;
@ -799,7 +811,9 @@ static int jfs_link(struct dentry *old_dentry,

 	jfs_info("jfs_link: %pd %pd", old_dentry, dentry);

-	dquot_initialize(dir);
+	rc = dquot_initialize(dir);
+	if (rc)
+		goto out;

 	tid = txBegin(ip->i_sb, 0);

@ -810,7 +824,7 @@ static int jfs_link(struct dentry *old_dentry,
 	 * scan parent directory for entry/freespace
 	 */
 	if ((rc = get_UCSname(&dname, dentry)))
-		goto out;
+		goto out_tx;

 	if ((rc = dtSearch(dir, &dname, &ino, &btstack, JFS_CREATE)))
 		goto free_dname;
@ -842,12 +856,13 @@ static int jfs_link(struct dentry *old_dentry,
      free_dname:
 	free_UCSname(&dname);

-      out:
+      out_tx:
 	txEnd(tid);

 	mutex_unlock(&JFS_IP(ip)->commit_mutex);
 	mutex_unlock(&JFS_IP(dir)->commit_mutex);

+      out:
 	jfs_info("jfs_link: rc:%d", rc);
 	return rc;
 }
@ -891,7 +906,9 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,

 	jfs_info("jfs_symlink: dip:0x%p name:%s", dip, name);

-	dquot_initialize(dip);
+	rc = dquot_initialize(dip);
+	if (rc)
+		goto out1;

 	ssize = strlen(name) + 1;

@ -1082,8 +1099,12 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,

 	jfs_info("jfs_rename: %pd %pd", old_dentry, new_dentry);

-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
+	rc = dquot_initialize(old_dir);
+	if (rc)
+		goto out1;
+	rc = dquot_initialize(new_dir);
+	if (rc)
+		goto out1;

 	old_ip = d_inode(old_dentry);
 	new_ip = d_inode(new_dentry);
@ -1130,7 +1151,9 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	} else if (new_ip) {
 		IWRITE_LOCK(new_ip, RDWRLOCK_NORMAL);
 		/* Init inode for quota operations. */
-		dquot_initialize(new_ip);
+		rc = dquot_initialize(new_ip);
+		if (rc)
+			goto out_unlock;
 	}

 	/*
@ -1318,6 +1341,7 @@ static int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,

 		clear_cflag(COMMIT_Stale, old_dir);
 	}
+      out_unlock:
 	if (new_ip && !S_ISDIR(new_ip->i_mode))
 		IWRITE_UNLOCK(new_ip);
      out3:
@ -1353,7 +1377,9 @@ static int jfs_mknod(struct inode *dir, struct dentry *dentry,

 	jfs_info("jfs_mknod: %pd", dentry);

-	dquot_initialize(dir);
+	rc = dquot_initialize(dir);
+	if (rc)
+		goto out;

 	if ((rc = get_UCSname(&dname, dentry)))
 		goto out;
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@ -105,8 +105,11 @@ static int ocfs2_file_open(struct inode *inode, struct file *file)
 			      file->f_path.dentry->d_name.len,
 			      file->f_path.dentry->d_name.name, mode);

-	if (file->f_mode & FMODE_WRITE)
-		dquot_initialize(inode);
+	if (file->f_mode & FMODE_WRITE) {
+		status = dquot_initialize(inode);
+		if (status)
+			goto leave;
+	}

 	spin_lock(&oi->ip_lock);

@ -1155,8 +1158,11 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 	if (status)
 		return status;

-	if (is_quota_modification(inode, attr))
-		dquot_initialize(inode);
+	if (is_quota_modification(inode, attr)) {
+		status = dquot_initialize(inode);
+		if (status)
+			return status;
+	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
 		status = ocfs2_rw_lock(inode, 1);
@ -1209,8 +1215,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 		    && OCFS2_HAS_RO_COMPAT_FEATURE(sb,
 		    OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) {
 			transfer_to[USRQUOTA] = dqget(sb, make_kqid_uid(attr->ia_uid));
-			if (!transfer_to[USRQUOTA]) {
-				status = -ESRCH;
+			if (IS_ERR(transfer_to[USRQUOTA])) {
+				status = PTR_ERR(transfer_to[USRQUOTA]);
 				goto bail_unlock;
 			}
 		}
@ -1218,8 +1224,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
 		    && OCFS2_HAS_RO_COMPAT_FEATURE(sb,
 		    OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)) {
 			transfer_to[GRPQUOTA] = dqget(sb, make_kqid_gid(attr->ia_gid));
-			if (!transfer_to[GRPQUOTA]) {
-				status = -ESRCH;
+			if (IS_ERR(transfer_to[GRPQUOTA])) {
+				status = PTR_ERR(transfer_to[GRPQUOTA]);
 				goto bail_unlock;
 			}
 		}
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@ -200,11 +200,12 @@ static struct dentry *ocfs2_lookup(struct inode *dir, struct dentry *dentry,
 static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
 {
 	struct inode *inode;
+	int status;

 	inode = new_inode(dir->i_sb);
 	if (!inode) {
 		mlog(ML_ERROR, "new_inode failed!\n");
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	}

 	/* populate as many fields early on as possible - many of
@ -213,7 +214,10 @@ static struct inode *ocfs2_get_init_inode(struct inode *dir, umode_t mode)
 	if (S_ISDIR(mode))
 		set_nlink(inode, 2);
 	inode_init_owner(inode, dir, mode);
-	dquot_initialize(inode);
+	status = dquot_initialize(inode);
+	if (status)
+		return ERR_PTR(status);
+
 	return inode;
 }

@ -264,7 +268,11 @@ static int ocfs2_mknod(struct inode *dir,
 			  (unsigned long long)OCFS2_I(dir)->ip_blkno,
 			  (unsigned long)dev, mode);

-	dquot_initialize(dir);
+	status = dquot_initialize(dir);
+	if (status) {
+		mlog_errno(status);
+		return status;
+	}

 	/* get our super block */
 	osb = OCFS2_SB(dir->i_sb);
@ -311,8 +319,9 @@ static int ocfs2_mknod(struct inode *dir,
 	}

 	inode = ocfs2_get_init_inode(dir, mode);
-	if (!inode) {
-		status = -ENOMEM;
+	if (IS_ERR(inode)) {
+		status = PTR_ERR(inode);
+		inode = NULL;
 		mlog_errno(status);
 		goto leave;
 	}
@ -708,7 +717,11 @@ static int ocfs2_link(struct dentry *old_dentry,
 	if (S_ISDIR(inode->i_mode))
 		return -EPERM;

-	dquot_initialize(dir);
+	err = dquot_initialize(dir);
+	if (err) {
+		mlog_errno(err);
+		return err;
+	}

 	err = ocfs2_double_lock(osb, &old_dir_bh, old_dir,
 			&parent_fe_bh, dir, 0);
@ -896,7 +909,11 @@ static int ocfs2_unlink(struct inode *dir,
 			   (unsigned long long)OCFS2_I(dir)->ip_blkno,
 			   (unsigned long long)OCFS2_I(inode)->ip_blkno);

-	dquot_initialize(dir);
+	status = dquot_initialize(dir);
+	if (status) {
+		mlog_errno(status);
+		return status;
+	}

 	BUG_ON(d_inode(dentry->d_parent) != dir);

@ -1230,8 +1247,16 @@ static int ocfs2_rename(struct inode *old_dir,
 			   old_dentry->d_name.len, old_dentry->d_name.name,
 			   new_dentry->d_name.len, new_dentry->d_name.name);

-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
+	status = dquot_initialize(old_dir);
+	if (status) {
+		mlog_errno(status);
+		goto bail;
+	}
+	status = dquot_initialize(new_dir);
+	if (status) {
+		mlog_errno(status);
+		goto bail;
+	}

 	osb = OCFS2_SB(old_dir->i_sb);

@ -1786,7 +1811,11 @@ static int ocfs2_symlink(struct inode *dir,
 	trace_ocfs2_symlink_begin(dir, dentry, symname,
 				  dentry->d_name.len, dentry->d_name.name);

-	dquot_initialize(dir);
+	status = dquot_initialize(dir);
+	if (status) {
+		mlog_errno(status);
+		goto bail;
+	}

 	sb = dir->i_sb;
 	osb = OCFS2_SB(sb);
@ -1831,8 +1860,9 @@ static int ocfs2_symlink(struct inode *dir,
 	}

 	inode = ocfs2_get_init_inode(dir, S_IFLNK | S_IRWXUGO);
-	if (!inode) {
-		status = -ENOMEM;
+	if (IS_ERR(inode)) {
+		status = PTR_ERR(inode);
+		inode = NULL;
 		mlog_errno(status);
 		goto bail;
 	}
@ -2485,8 +2515,9 @@ int ocfs2_create_inode_in_orphan(struct inode *dir,
 	}

 	inode = ocfs2_get_init_inode(dir, mode);
-	if (!inode) {
-		status = -ENOMEM;
+	if (IS_ERR(inode)) {
+		status = PTR_ERR(inode);
+		inode = NULL;
 		mlog_errno(status);
 		goto leave;
 	}
--- a/fs/ocfs2/quota_local.c
+++ b/fs/ocfs2/quota_local.c
@ -499,8 +499,8 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode,
 			dquot = dqget(sb,
 				      make_kqid(&init_user_ns, type,
 						le64_to_cpu(dqblk->dqb_id)));
-			if (!dquot) {
-				status = -EIO;
+			if (IS_ERR(dquot)) {
+				status = PTR_ERR(dquot);
 				mlog(ML_ERROR, "Failed to get quota structure "
 				     "for id %u, type %d. Cannot finish quota "
 				     "file recovery.\n",
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@ -4419,8 +4419,9 @@ static int ocfs2_vfs_reflink(struct dentry *old_dentry, struct inode *dir,
 	}

 	mutex_lock(&inode->i_mutex);
-	dquot_initialize(dir);
-	error = ocfs2_reflink(old_dentry, dir, new_dentry, preserve);
+	error = dquot_initialize(dir);
+	if (!error)
+		error = ocfs2_reflink(old_dentry, dir, new_dentry, preserve);
 	mutex_unlock(&inode->i_mutex);
 	if (!error)
 		fsnotify_create(dir, new_dentry);
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@ -247,7 +247,7 @@ struct dqstats dqstats;
 EXPORT_SYMBOL(dqstats);

 static qsize_t inode_get_rsv_space(struct inode *inode);
-static void __dquot_initialize(struct inode *inode, int type);
+static int __dquot_initialize(struct inode *inode, int type);

 static inline unsigned int
 hashfn(const struct super_block *sb, struct kqid qid)
@ -832,16 +832,17 @@ static struct dquot *get_empty_dquot(struct super_block *sb, int type)
 struct dquot *dqget(struct super_block *sb, struct kqid qid)
 {
 	unsigned int hashent = hashfn(sb, qid);
-	struct dquot *dquot = NULL, *empty = NULL;
+	struct dquot *dquot, *empty = NULL;

        if (!sb_has_quota_active(sb, qid.type))
-		return NULL;
+		return ERR_PTR(-ESRCH);
 we_slept:
 	spin_lock(&dq_list_lock);
 	spin_lock(&dq_state_lock);
 	if (!sb_has_quota_active(sb, qid.type)) {
 		spin_unlock(&dq_state_lock);
 		spin_unlock(&dq_list_lock);
+		dquot = ERR_PTR(-ESRCH);
 		goto out;
 	}
 	spin_unlock(&dq_state_lock);
@ -876,11 +877,15 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid)
 	 * already finished or it will be canceled due to dq_count > 1 test */
 	wait_on_dquot(dquot);
 	/* Read the dquot / allocate space in quota file */
-	if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags) &&
-	    sb->dq_op->acquire_dquot(dquot) < 0) {
-		dqput(dquot);
-		dquot = NULL;
-		goto out;
+	if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
+		int err;
+
+		err = sb->dq_op->acquire_dquot(dquot);
+		if (err < 0) {
+			dqput(dquot);
+			dquot = ERR_PTR(err);
+			goto out;
+		}
 	}
 #ifdef CONFIG_QUOTA_DEBUG
 	BUG_ON(!dquot->dq_sb);	/* Has somebody invalidated entry under us? */
@ -1390,15 +1395,16 @@ static int dquot_active(const struct inode *inode)
 * It is better to call this function outside of any transaction as it
 * might need a lot of space in journal for dquot structure allocation.
 */
-static void __dquot_initialize(struct inode *inode, int type)
+static int __dquot_initialize(struct inode *inode, int type)
 {
 	int cnt, init_needed = 0;
 	struct dquot **dquots, *got[MAXQUOTAS];
 	struct super_block *sb = inode->i_sb;
 	qsize_t rsv;
+	int ret = 0;

 	if (!dquot_active(inode))
-		return;
+		return 0;

 	dquots = i_dquot(inode);

@ -1407,6 +1413,7 @@ static void __dquot_initialize(struct inode *inode, int type)
 		struct kqid qid;
 		kprojid_t projid;
 		int rc;
+		struct dquot *dquot;

 		got[cnt] = NULL;
 		if (type != -1 && cnt != type)
@ -1438,16 +1445,25 @@ static void __dquot_initialize(struct inode *inode, int type)
 			qid = make_kqid_projid(projid);
 			break;
 		}
-		got[cnt] = dqget(sb, qid);
+		dquot = dqget(sb, qid);
+		if (IS_ERR(dquot)) {
+			/* We raced with somebody turning quotas off... */
+			if (PTR_ERR(dquot) != -ESRCH) {
+				ret = PTR_ERR(dquot);
+				goto out_put;
+			}
+			dquot = NULL;
+		}
+		got[cnt] = dquot;
 	}

 	/* All required i_dquot has been initialized */
 	if (!init_needed)
-		return;
+		return 0;

 	spin_lock(&dq_data_lock);
 	if (IS_NOQUOTA(inode))
-		goto out_err;
+		goto out_lock;
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
 		if (type != -1 && cnt != type)
 			continue;
@ -1469,15 +1485,18 @@ static void __dquot_initialize(struct inode *inode, int type)
 				dquot_resv_space(dquots[cnt], rsv);
 		}
 	}
-out_err:
+out_lock:
 	spin_unlock(&dq_data_lock);
+out_put:
 	/* Drop unused references */
 	dqput_all(got);
+
+	return ret;
 }

-void dquot_initialize(struct inode *inode)
+int dquot_initialize(struct inode *inode)
 {
-	__dquot_initialize(inode, -1);
+	return __dquot_initialize(inode, -1);
 }
 EXPORT_SYMBOL(dquot_initialize);

@ -1961,18 +1980,37 @@ EXPORT_SYMBOL(__dquot_transfer);
 int dquot_transfer(struct inode *inode, struct iattr *iattr)
 {
 	struct dquot *transfer_to[MAXQUOTAS] = {};
+	struct dquot *dquot;
 	struct super_block *sb = inode->i_sb;
 	int ret;

 	if (!dquot_active(inode))
 		return 0;

-	if (iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid))
-		transfer_to[USRQUOTA] = dqget(sb, make_kqid_uid(iattr->ia_uid));
-	if (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))
-		transfer_to[GRPQUOTA] = dqget(sb, make_kqid_gid(iattr->ia_gid));
-
+	if (iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)){
+		dquot = dqget(sb, make_kqid_uid(iattr->ia_uid));
+		if (IS_ERR(dquot)) {
+			if (PTR_ERR(dquot) != -ESRCH) {
+				ret = PTR_ERR(dquot);
+				goto out_put;
+			}
+			dquot = NULL;
+		}
+		transfer_to[USRQUOTA] = dquot;
+	}
+	if (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid)){
+		dquot = dqget(sb, make_kqid_gid(iattr->ia_gid));
+		if (IS_ERR(dquot)) {
+			if (PTR_ERR(dquot) != -ESRCH) {
+				ret = PTR_ERR(dquot);
+				goto out_put;
+			}
+			dquot = NULL;
+		}
+		transfer_to[GRPQUOTA] = dquot;
+	}
 	ret = __dquot_transfer(inode, transfer_to);
+out_put:
 	dqput_all(transfer_to);
 	return ret;
 }
@ -2518,8 +2556,8 @@ int dquot_get_dqblk(struct super_block *sb, struct kqid qid,
 	struct dquot *dquot;

 	dquot = dqget(sb, qid);
-	if (!dquot)
-		return -ESRCH;
+	if (IS_ERR(dquot))
+		return PTR_ERR(dquot);
 	do_get_dqblk(dquot, di);
 	dqput(dquot);

@ -2631,8 +2669,8 @@ int dquot_set_dqblk(struct super_block *sb, struct kqid qid,
 	int rc;

 	dquot = dqget(sb, qid);
-	if (!dquot) {
-		rc = -ESRCH;
+	if (IS_ERR(dquot)) {
+		rc = PTR_ERR(dquot);
 		goto out;
 	}
 	rc = do_set_dqblk(dquot, di);
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@ -141,9 +141,9 @@ static int quota_getinfo(struct super_block *sb, int type, void __user *addr)
 	if (tstate->flags & QCI_ROOT_SQUASH)
 		uinfo.dqi_flags |= DQF_ROOT_SQUASH;
 	uinfo.dqi_valid = IIF_ALL;
-	if (!ret && copy_to_user(addr, &uinfo, sizeof(uinfo)))
+	if (copy_to_user(addr, &uinfo, sizeof(uinfo)))
 		return -EFAULT;
-	return ret;
+	return 0;
 }

 static int quota_setinfo(struct super_block *sb, int type, void __user *addr)
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@ -3319,8 +3319,11 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr)
 	/* must be turned off for recursive notify_change calls */
 	ia_valid = attr->ia_valid &= ~(ATTR_KILL_SUID|ATTR_KILL_SGID);

-	if (is_quota_modification(inode, attr))
-		dquot_initialize(inode);
+	if (is_quota_modification(inode, attr)) {
+		error = dquot_initialize(inode);
+		if (error)
+			return error;
+	}
 	reiserfs_write_lock(inode->i_sb);
 	if (attr->ia_valid & ATTR_SIZE) {
 		/*
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@ -613,8 +613,7 @@ static int new_inode_init(struct inode *inode, struct inode *dir, umode_t mode)
 	 * we have to set uid and gid here
 	 */
 	inode_init_owner(inode, dir, mode);
-	dquot_initialize(inode);
-	return 0;
+	return dquot_initialize(inode);
 }

 static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
@ -633,12 +632,18 @@ static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mod
 	struct reiserfs_transaction_handle th;
 	struct reiserfs_security_handle security;

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 	if (!(inode = new_inode(dir->i_sb))) {
 		return -ENOMEM;
 	}
-	new_inode_init(inode, dir, mode);
+	retval = new_inode_init(inode, dir, mode);
+	if (retval) {
+		drop_new_inode(inode);
+		return retval;
+	}

 	jbegin_count += reiserfs_cache_default_acl(dir);
 	retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -710,12 +715,18 @@ static int reiserfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode
 	if (!new_valid_dev(rdev))
 		return -EINVAL;

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 	if (!(inode = new_inode(dir->i_sb))) {
 		return -ENOMEM;
 	}
-	new_inode_init(inode, dir, mode);
+	retval = new_inode_init(inode, dir, mode);
+	if (retval) {
+		drop_new_inode(inode);
+		return retval;
+	}

 	jbegin_count += reiserfs_cache_default_acl(dir);
 	retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -787,7 +798,9 @@ static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	    2 * (REISERFS_QUOTA_INIT_BLOCKS(dir->i_sb) +
 		 REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb));

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 #ifdef DISPLACE_NEW_PACKING_LOCALITIES
 	/*
@ -800,7 +813,11 @@ static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 	if (!(inode = new_inode(dir->i_sb))) {
 		return -ENOMEM;
 	}
-	new_inode_init(inode, dir, mode);
+	retval = new_inode_init(inode, dir, mode);
+	if (retval) {
+		drop_new_inode(inode);
+		return retval;
+	}

 	jbegin_count += reiserfs_cache_default_acl(dir);
 	retval = reiserfs_security_init(dir, inode, &dentry->d_name, &security);
@ -899,7 +916,9 @@ static int reiserfs_rmdir(struct inode *dir, struct dentry *dentry)
 	    JOURNAL_PER_BALANCE_CNT * 2 + 2 +
 	    4 * REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb);

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 	reiserfs_write_lock(dir->i_sb);
 	retval = journal_begin(&th, dir->i_sb, jbegin_count);
@ -985,7 +1004,9 @@ static int reiserfs_unlink(struct inode *dir, struct dentry *dentry)
 	int jbegin_count;
 	unsigned long savelink;

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 	inode = d_inode(dentry);

@ -1095,12 +1116,18 @@ static int reiserfs_symlink(struct inode *parent_dir,
 	    2 * (REISERFS_QUOTA_INIT_BLOCKS(parent_dir->i_sb) +
 		 REISERFS_QUOTA_TRANS_BLOCKS(parent_dir->i_sb));

-	dquot_initialize(parent_dir);
+	retval = dquot_initialize(parent_dir);
+	if (retval)
+		return retval;

 	if (!(inode = new_inode(parent_dir->i_sb))) {
 		return -ENOMEM;
 	}
-	new_inode_init(inode, parent_dir, mode);
+	retval = new_inode_init(inode, parent_dir, mode);
+	if (retval) {
+		drop_new_inode(inode);
+		return retval;
+	}

 	retval = reiserfs_security_init(parent_dir, inode, &dentry->d_name,
 					&security);
@ -1184,7 +1211,9 @@ static int reiserfs_link(struct dentry *old_dentry, struct inode *dir,
 	    JOURNAL_PER_BALANCE_CNT * 3 +
 	    2 * REISERFS_QUOTA_TRANS_BLOCKS(dir->i_sb);

-	dquot_initialize(dir);
+	retval = dquot_initialize(dir);
+	if (retval)
+		return retval;

 	reiserfs_write_lock(dir->i_sb);
 	if (inode->i_nlink >= REISERFS_LINK_MAX) {
@ -1308,8 +1337,12 @@ static int reiserfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	    JOURNAL_PER_BALANCE_CNT * 3 + 5 +
 	    4 * REISERFS_QUOTA_TRANS_BLOCKS(old_dir->i_sb);

-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
+	retval = dquot_initialize(old_dir);
+	if (retval)
+		return retval;
+	retval = dquot_initialize(new_dir);
+	if (retval)
+		return retval;

 	old_inode = d_inode(old_dentry);
 	new_dentry_inode = d_inode(new_dentry);
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@ -2070,6 +2070,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
 	struct udf_options uopt;
 	struct kernel_lb_addr rootdir, fileset;
 	struct udf_sb_info *sbi;
+	bool lvid_open = false;

 	uopt.flags = (1 << UDF_FLAG_USE_AD_IN_ICB) | (1 << UDF_FLAG_STRICT);
 	uopt.uid = INVALID_UID;
@ -2216,8 +2217,10 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
 			 le16_to_cpu(ts.year), ts.month, ts.day,
 			 ts.hour, ts.minute, le16_to_cpu(ts.typeAndTimezone));
 	}
-	if (!(sb->s_flags & MS_RDONLY))
+	if (!(sb->s_flags & MS_RDONLY)) {
 		udf_open_lvid(sb);
+		lvid_open = true;
+	}

 	/* Assign the root inode */
 	/* assign inodes by physical block number */
@ -2248,7 +2251,7 @@ static int udf_fill_super(struct super_block *sb, void *options, int silent)
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP))
 		unload_nls(sbi->s_nls_map);
 #endif
-	if (!(sb->s_flags & MS_RDONLY))
+	if (lvid_open)
 		udf_close_lvid(sb);
 	brelse(sbi->s_lvid_bh);
 	udf_sb_free_partitions(sb);
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@ -118,9 +118,8 @@ struct bio {
 #define BIO_USER_MAPPED 4	/* contains user pages */
 #define BIO_NULL_MAPPED 5	/* contains invalid user pages */
 #define BIO_QUIET	6	/* Make BIO Quiet */
-#define BIO_SNAP_STABLE	7	/* bio data must be snapshotted during write */
-#define BIO_CHAIN	8	/* chained bio, ->bi_remaining in effect */
-#define BIO_REFFED	9	/* bio has elevated ->bi_cnt */
+#define BIO_CHAIN	7	/* chained bio, ->bi_remaining in effect */
+#define BIO_REFFED	8	/* bio has elevated ->bi_cnt */

 /*
 * Flags starting here get preserved by bio_reset() - this includes
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@ -29,6 +29,7 @@
 #include <linux/mutex.h>
 #include <linux/timer.h>
 #include <linux/slab.h>
+#include <linux/bit_spinlock.h>
 #include <crypto/hash.h>
 #endif

@ -336,7 +337,45 @@ BUFFER_FNS(Freed, freed)
 BUFFER_FNS(Shadow, shadow)
 BUFFER_FNS(Verified, verified)

-#include <linux/jbd_common.h>
+static inline struct buffer_head *jh2bh(struct journal_head *jh)
+{
+	return jh->b_bh;
+}
+
+static inline struct journal_head *bh2jh(struct buffer_head *bh)
+{
+	return bh->b_private;
+}
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	bit_spin_lock(BH_State, &bh->b_state);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	return bit_spin_trylock(BH_State, &bh->b_state);
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return bit_spin_is_locked(BH_State, &bh->b_state);
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	bit_spin_unlock(BH_State, &bh->b_state);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	bit_spin_lock(BH_JournalHead, &bh->b_state);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+}

 #define J_ASSERT(assert)	BUG_ON(!(assert))

--- a/include/linux/jbd_common.h
+++ b/include/linux/jbd_common.h
@ -1,46 +0,0 @@
-#ifndef _LINUX_JBD_STATE_H
-#define _LINUX_JBD_STATE_H
-
-#include <linux/bit_spinlock.h>
-
-static inline struct buffer_head *jh2bh(struct journal_head *jh)
-{
-	return jh->b_bh;
-}
-
-static inline struct journal_head *bh2jh(struct buffer_head *bh)
-{
-	return bh->b_private;
-}
-
-static inline void jbd_lock_bh_state(struct buffer_head *bh)
-{
-	bit_spin_lock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_trylock_bh_state(struct buffer_head *bh)
-{
-	return bit_spin_trylock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
-{
-	return bit_spin_is_locked(BH_State, &bh->b_state);
-}
-
-static inline void jbd_unlock_bh_state(struct buffer_head *bh)
-{
-	bit_spin_unlock(BH_State, &bh->b_state);
-}
-
-static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
-{
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
-}
-
-static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
-{
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
-}
-
-#endif
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@ -43,7 +43,7 @@ void inode_claim_rsv_space(struct inode *inode, qsize_t number);
 void inode_sub_rsv_space(struct inode *inode, qsize_t number);
 void inode_reclaim_rsv_space(struct inode *inode, qsize_t number);

-void dquot_initialize(struct inode *inode);
+int dquot_initialize(struct inode *inode);
 void dquot_drop(struct inode *inode);
 struct dquot *dqget(struct super_block *sb, struct kqid qid);
 static inline struct dquot *dqgrab(struct dquot *dquot)
@ -200,8 +200,9 @@ static inline int sb_has_quota_active(struct super_block *sb, int type)
 	return 0;
 }

-static inline void dquot_initialize(struct inode *inode)
+static inline int dquot_initialize(struct inode *inode)
 {
+	return 0;
 }

 static inline void dquot_drop(struct inode *inode)
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@ -1,866 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM ext3
-
-#if !defined(_TRACE_EXT3_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_EXT3_H
-
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(ext3_free_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	uid_t,	uid			)
-		__field(	gid_t,	gid			)
-		__field(	blkcnt_t, blocks		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->mode	= inode->i_mode;
-		__entry->uid	= i_uid_read(inode);
-		__entry->gid	= i_gid_read(inode);
-		__entry->blocks	= inode->i_blocks;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->uid, __entry->gid,
-		  (unsigned long) __entry->blocks)
-);
-
-TRACE_EVENT(ext3_request_inode,
-	TP_PROTO(struct inode *dir, int mode),
-
-	TP_ARGS(dir, mode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	dir			)
-		__field(	umode_t, mode			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= dir->i_sb->s_dev;
-		__entry->dir	= dir->i_ino;
-		__entry->mode	= mode;
-	),
-
-	TP_printk("dev %d,%d dir %lu mode 0%o",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->dir, __entry->mode)
-);
-
-TRACE_EVENT(ext3_allocate_inode,
-	TP_PROTO(struct inode *inode, struct inode *dir, int mode),
-
-	TP_ARGS(inode, dir, mode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	ino_t,	dir			)
-		__field(	umode_t, mode			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->dir	= dir->i_ino;
-		__entry->mode	= mode;
-	),
-
-	TP_printk("dev %d,%d ino %lu dir %lu mode 0%o",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long) __entry->dir, __entry->mode)
-);
-
-TRACE_EVENT(ext3_evict_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	int,	nlink			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->nlink	= inode->i_nlink;
-	),
-
-	TP_printk("dev %d,%d ino %lu nlink %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->nlink)
-);
-
-TRACE_EVENT(ext3_drop_inode,
-	TP_PROTO(struct inode *inode, int drop),
-
-	TP_ARGS(inode, drop),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	int,	drop			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->drop	= drop;
-	),
-
-	TP_printk("dev %d,%d ino %lu drop %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->drop)
-);
-
-TRACE_EVENT(ext3_mark_inode_dirty,
-	TP_PROTO(struct inode *inode, unsigned long IP),
-
-	TP_ARGS(inode, IP),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(unsigned long,	ip			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->ip	= IP;
-	),
-
-	TP_printk("dev %d,%d ino %lu caller %pS",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, (void *)__entry->ip)
-);
-
-TRACE_EVENT(ext3_write_begin,
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int flags),
-
-	TP_ARGS(inode, pos, len, flags),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned int, len		)
-		__field(	unsigned int, flags		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->pos	= pos;
-		__entry->len	= len;
-		__entry->flags	= flags;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %u flags %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->flags)
-);
-
-DECLARE_EVENT_CLASS(ext3__write_end,
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-			unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned int, len		)
-		__field(	unsigned int, copied		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->pos	= pos;
-		__entry->len	= len;
-		__entry->copied	= copied;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %u copied %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_ordered_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_writeback_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_journalled_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DECLARE_EVENT_CLASS(ext3__page_op,
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	pgoff_t, index			)
-
-	),
-
-	TP_fast_assign(
-		__entry->index	= page->index;
-		__entry->ino	= page->mapping->host->i_ino;
-		__entry->dev	= page->mapping->host->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu page_index %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->index)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_ordered_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_writeback_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_journalled_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_readpage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_releasepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-TRACE_EVENT(ext3_invalidatepage,
-	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
-
-	TP_ARGS(page, offset, length),
-
-	TP_STRUCT__entry(
-		__field(	pgoff_t, index			)
-		__field(	unsigned int, offset		)
-		__field(	unsigned int, length		)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-
-	),
-
-	TP_fast_assign(
-		__entry->index	= page->index;
-		__entry->offset	= offset;
-		__entry->length	= length;
-		__entry->ino	= page->mapping->host->i_ino;
-		__entry->dev	= page->mapping->host->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->index, __entry->offset, __entry->length)
-);
-
-TRACE_EVENT(ext3_discard_blocks,
-	TP_PROTO(struct super_block *sb, unsigned long blk,
-			unsigned long count),
-
-	TP_ARGS(sb, blk, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	blk		)
-		__field(	unsigned long,	count		)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->blk	= blk;
-		__entry->count	= count;
-	),
-
-	TP_printk("dev %d,%d blk %lu count %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->blk, __entry->count)
-);
-
-TRACE_EVENT(ext3_request_blocks,
-	TP_PROTO(struct inode *inode, unsigned long goal,
-		 unsigned long count),
-
-	TP_ARGS(inode, goal, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	unsigned long, count		)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->count	= count;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d ino %lu count %lu goal %lu ",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->count, __entry->goal)
-);
-
-TRACE_EVENT(ext3_allocate_blocks,
-	TP_PROTO(struct inode *inode, unsigned long goal,
-		 unsigned long count, unsigned long block),
-
-	TP_ARGS(inode, goal, count, block),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	unsigned long,	block		)
-		__field(	unsigned long, count		)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->block	= block;
-		__entry->count	= count;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d ino %lu count %lu block %lu goal %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		   __entry->count, __entry->block,
-		  __entry->goal)
-);
-
-TRACE_EVENT(ext3_free_blocks,
-	TP_PROTO(struct inode *inode, unsigned long block,
-		 unsigned long count),
-
-	TP_ARGS(inode, block, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	unsigned long,	block		)
-		__field(	unsigned long,	count		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= inode->i_sb->s_dev;
-		__entry->ino		= inode->i_ino;
-		__entry->mode		= inode->i_mode;
-		__entry->block		= block;
-		__entry->count		= count;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o block %lu count %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->block, __entry->count)
-);
-
-TRACE_EVENT(ext3_sync_file_enter,
-	TP_PROTO(struct file *file, int datasync),
-
-	TP_ARGS(file, datasync),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	ino_t,	parent			)
-		__field(	int,	datasync		)
-	),
-
-	TP_fast_assign(
-		struct dentry *dentry = file->f_path.dentry;
-
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->datasync	= datasync;
-		__entry->parent		= d_inode(dentry->d_parent)->i_ino;
-	),
-
-	TP_printk("dev %d,%d ino %lu parent %ld datasync %d ",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long) __entry->parent, __entry->datasync)
-);
-
-TRACE_EVENT(ext3_sync_file_exit,
-	TP_PROTO(struct inode *inode, int ret),
-
-	TP_ARGS(inode, ret),
-
-	TP_STRUCT__entry(
-		__field(	int,	ret			)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->ret		= ret;
-		__entry->ino		= inode->i_ino;
-		__entry->dev		= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->ret)
-);
-
-TRACE_EVENT(ext3_sync_fs,
-	TP_PROTO(struct super_block *sb, int wait),
-
-	TP_ARGS(sb, wait),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	wait			)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->wait	= wait;
-	),
-
-	TP_printk("dev %d,%d wait %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->wait)
-);
-
-TRACE_EVENT(ext3_rsv_window_add,
-	TP_PROTO(struct super_block *sb,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(sb, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-	),
-
-	TP_printk("dev %d,%d start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->start, __entry->end)
-);
-
-TRACE_EVENT(ext3_discard_reservation,
-	TP_PROTO(struct inode *inode,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(inode, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long)__entry->ino, __entry->start,
-		  __entry->end)
-);
-
-TRACE_EVENT(ext3_alloc_new_reservation,
-	TP_PROTO(struct super_block *sb, unsigned long goal),
-
-	TP_ARGS(sb, goal),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d goal %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->goal)
-);
-
-TRACE_EVENT(ext3_reserved,
-	TP_PROTO(struct super_block *sb, unsigned long block,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(sb, block, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	block		)
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->block	= block;
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-		__entry->dev	= sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d block %lu, start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->block, __entry->start, __entry->end)
-);
-
-TRACE_EVENT(ext3_forget,
-	TP_PROTO(struct inode *inode, int is_metadata, unsigned long block),
-
-	TP_ARGS(inode, is_metadata, block),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	int,	is_metadata		)
-		__field(	unsigned long,	block		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->mode	= inode->i_mode;
-		__entry->is_metadata = is_metadata;
-		__entry->block	= block;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o is_metadata %d block %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->is_metadata, __entry->block)
-);
-
-TRACE_EVENT(ext3_read_block_bitmap,
-	TP_PROTO(struct super_block *sb, unsigned int group),
-
-	TP_ARGS(sb, group),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	__u32,	group			)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->group	= group;
-	),
-
-	TP_printk("dev %d,%d group %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->group)
-);
-
-TRACE_EVENT(ext3_direct_IO_enter,
-	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len, int rw),
-
-	TP_ARGS(inode, offset, len, rw),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned long,	len		)
-		__field(	int,	rw			)
-	),
-
-	TP_fast_assign(
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->pos	= offset;
-		__entry->len	= len;
-		__entry->rw	= rw;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->rw)
-);
-
-TRACE_EVENT(ext3_direct_IO_exit,
-	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len,
-		 int rw, int ret),
-
-	TP_ARGS(inode, offset, len, rw, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned long,	len		)
-		__field(	int,	rw			)
-		__field(	int,	ret			)
-	),
-
-	TP_fast_assign(
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->pos	= offset;
-		__entry->len	= len;
-		__entry->rw	= rw;
-		__entry->ret	= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->rw, __entry->ret)
-);
-
-TRACE_EVENT(ext3_unlink_enter,
-	TP_PROTO(struct inode *parent, struct dentry *dentry),
-
-	TP_ARGS(parent, dentry),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	parent			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	size			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->parent		= parent->i_ino;
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->size		= d_inode(dentry)->i_size;
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu size %lld parent %ld",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long)__entry->size,
-		  (unsigned long) __entry->parent)
-);
-
-TRACE_EVENT(ext3_unlink_exit,
-	TP_PROTO(struct dentry *dentry, int ret),
-
-	TP_ARGS(dentry, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	int,	ret			)
-	),
-
-	TP_fast_assign(
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-		__entry->ret		= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->ret)
-);
-
-DECLARE_EVENT_CLASS(ext3__truncate,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	blkcnt_t,	blocks		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->blocks	= inode->i_blocks;
-	),
-
-	TP_printk("dev %d,%d ino %lu blocks %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, (unsigned long) __entry->blocks)
-);
-
-DEFINE_EVENT(ext3__truncate, ext3_truncate_enter,
-
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode)
-);
-
-DEFINE_EVENT(ext3__truncate, ext3_truncate_exit,
-
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode)
-);
-
-TRACE_EVENT(ext3_get_blocks_enter,
-	TP_PROTO(struct inode *inode, unsigned long lblk,
-		 unsigned long len, int create),
-
-	TP_ARGS(inode, lblk, len, create),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	lblk		)
-		__field(	unsigned long,	len		)
-		__field(	int,		create		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->lblk	= lblk;
-		__entry->len	= len;
-		__entry->create	= create;
-	),
-
-	TP_printk("dev %d,%d ino %lu lblk %lu len %lu create %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->lblk, __entry->len, __entry->create)
-);
-
-TRACE_EVENT(ext3_get_blocks_exit,
-	TP_PROTO(struct inode *inode, unsigned long lblk,
-		 unsigned long pblk, unsigned long len, int ret),
-
-	TP_ARGS(inode, lblk, pblk, len, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	lblk		)
-		__field(	unsigned long,	pblk		)
-		__field(	unsigned long,	len		)
-		__field(	int,		ret		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->lblk	= lblk;
-		__entry->pblk	= pblk;
-		__entry->len	= len;
-		__entry->ret	= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu lblk %lu pblk %lu len %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		   __entry->lblk, __entry->pblk,
-		  __entry->len, __entry->ret)
-);
-
-TRACE_EVENT(ext3_load_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino		)
-		__field(	dev_t,	dev		)
-	),
-
-	TP_fast_assign(
-		__entry->ino		= inode->i_ino;
-		__entry->dev		= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino)
-);
-
-#endif /* _TRACE_EXT3_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
--- a/include/trace/events/jbd.h
+++ b/include/trace/events/jbd.h
@ -1,194 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM jbd
-
-#if !defined(_TRACE_JBD_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_JBD_H
-
-#include <linux/jbd.h>
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(jbd_checkpoint,
-
-	TP_PROTO(journal_t *journal, int result),
-
-	TP_ARGS(journal, result),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	result			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->result		= result;
-	),
-
-	TP_printk("dev %d,%d result %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->result)
-);
-
-DECLARE_EVENT_CLASS(jbd_commit,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_start_commit,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_locking,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_flushing,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_logging,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-TRACE_EVENT(jbd_drop_transaction,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction)
-);
-
-TRACE_EVENT(jbd_end_commit,
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-		__field(	int,	head			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-		__entry->head		= journal->j_tail_sequence;
-	),
-
-	TP_printk("dev %d,%d transaction %d head %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction, __entry->head)
-);
-
-TRACE_EVENT(jbd_do_submit_data,
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		   __entry->transaction)
-);
-
-TRACE_EVENT(jbd_cleanup_journal_tail,
-
-	TP_PROTO(journal_t *journal, tid_t first_tid,
-		 unsigned long block_nr, unsigned long freed),
-
-	TP_ARGS(journal, first_tid, block_nr, freed),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	tid_t,	tail_sequence		)
-		__field(	tid_t,	first_tid		)
-		__field(unsigned long,	block_nr		)
-		__field(unsigned long,	freed			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->tail_sequence	= journal->j_tail_sequence;
-		__entry->first_tid	= first_tid;
-		__entry->block_nr	= block_nr;
-		__entry->freed		= freed;
-	),
-
-	TP_printk("dev %d,%d from %u to %u offset %lu freed %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->tail_sequence, __entry->first_tid,
-		  __entry->block_nr, __entry->freed)
-);
-
-TRACE_EVENT(journal_write_superblock,
-	TP_PROTO(journal_t *journal, int write_op),
-
-	TP_ARGS(journal, write_op),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	write_op		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->write_op	= write_op;
-	),
-
-	TP_printk("dev %d,%d write_op %x", MAJOR(__entry->dev),
-		  MINOR(__entry->dev), __entry->write_op)
-);
-
-#endif /* _TRACE_JBD_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
--- a/mm/Kconfig
+++ b/mm/Kconfig
@ -299,15 +299,9 @@ config BOUNCE
 # On the 'tile' arch, USB OHCI needs the bounce pool since tilegx will often
 # have more than 4GB of memory, but we don't currently use the IOTLB to present
 # a 32-bit address to OHCI.  So we need to use a bounce pool instead.
-#
-# We also use the bounce pool to provide stable page writes for jbd.  jbd
-# initiates buffer writeback without locking the page or setting PG_writeback,
-# and fixing that behavior (a second time; jbd2 doesn't have this problem) is
-# a major rework effort.  Instead, use the bounce buffer to snapshot pages
-# (until jbd goes away).  The only jbd user is ext3.
 config NEED_BOUNCE_POOL
 	bool
-	default y if (TILE && USB_OHCI_HCD) || (BLK_DEV_INTEGRITY && JBD)
+	default y if TILE && USB_OHCI_HCD

 config NR_QUICK
 	int