linux

Commit Graph

Author	SHA1	Message	Date
Lachlan McIlroy	364f358a73	[XFS] Prevent direct I/O from mapping extents beyond eof With the help from some tracing I found that we try to map extents beyond eof when doing a direct I/O read. It appears that the way to inform the generic direct I/O path (ie do_direct_IO()) that we have breached eof is to return an unmapped buffer from xfs_get_blocks_direct(). This will cause do_direct_IO() to jump to the hole handling code where is will check for eof and then abort. This problem was found because a direct I/O read was trying to map beyond eof and was encountering delayed allocations. The delayed allocations beyond eof are speculative allocations and they didn't get converted when the direct I/O flushed the file because there was only enough space in the current AG to convert and write out the dirty pages within eof. Note that xfs_iomap_write_allocate() wont necessarily convert all the delayed allocation passed to it - it will return after allocating the first extent - so if the delayed allocation extends beyond eof then it will stay that way. SGI-PV: 983683 SGI-Modid: xfs-linux-melb:xfs-kern:31929a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-09-17 16:50:14 +10:00
David Chinner	e6064d30c3	[XFS] XFS: Kill xfs_vtoi() xfs_vtoi() is redundant and only unsed in small sections of code. Replace them with widely used XFS_I() inline and kill xfs_vtoi(). SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31725a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:01:45 +10:00
Nick Piggin	ca5de404ff	fs: rename buffer trylock Like the page lock change, this also requires name change, so convert the raw test_and_set bitop to a trylock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-04 21:56:09 -07:00
Nick Piggin	529ae9aaa0	mm: rename page trylock Converting page lock to new locking bitops requires a change of page flag operation naming, so we might as well convert it to something nicer (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked). This also facilitates lockdeping of page lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-04 21:31:34 -07:00
Denys Vlasenko	b41759cf11	[XFS] Remove unused wbc parameter from xfs_start_page_writeback() SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31057a Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:58:09 +10:00
David Chinner	cc88466f3f	[XFS] Catch unwritten extent conversion errors. On unwritten I/O completion, we fail to propagate an error when converting the extent to a written extent. This means that the I/O silently fails. propagate the error onto the ioend so that the inode is marked with an error appropriately. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30826a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 12:00:58 +10:00
Christoph Hellwig	126468b115	[XFS] kill xfs_rwlock/xfs_rwunlock We can just use xfs_ilock/xfs_iunlock instead and get rid of the ugly bhv_vrwlock_t. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30533a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:25 +10:00
Eric Sandeen	71ddabb94a	[XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if CONFIG_XFS_RT is off. This should be safe because mount checks in xfs_rtmount_init: so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be encountered after that. Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space, presumeably gcc can optimize around the various "if (0)" type checks: xfs_alloc_file_space -8 xfs_bmap_adjacent -16 xfs_bmapi -8 xfs_bmap_rtalloc -16 xfs_bunmapi -28 xfs_free_file_space -64 xfs_imap +8 <-- ? hmm. xfs_iomap_write_direct -12 xfs_qm_dqusage_adjust -4 xfs_qm_vop_chown_reserve -4 SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30014a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:43 +11:00
Christoph Hellwig	613d70436c	[XFS] kill xfs_iocore_t xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:48:58 +11:00
Lachlan McIlroy	541d7d3c4b	[XFS] kill unnessecary ioops indirection Currently there is an indirection called ioops in the XFS data I/O path. Various functions are called by functions pointers, but there is no coherence in what this is for, and of course for XFS itself it's entirely unused. This patch removes it instead and significantly reduces source and binary size of XFS while making maintaince easier. SGI-PV: 970841 SGI-Modid: xfs-linux-melb:xfs-kern:29737a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:44:14 +11:00
Christoph Hellwig	7642861b7e	[XFS] kill BMAPI_UNWRITTEN There is no reason to go through xfs_iomap for the BMAPI_UNWRITTEN because it has nothing in common with the other cases. Instead check for the shutdown filesystem in xfs_end_bio_unwritten and perform a direct call to xfs_iomap_write_unwritten (which should be renamed to something more sensible one day) SGI-PV: 970241 SGI-Modid: xfs-linux-melb:xfs-kern:29681a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:44 +11:00
Christoph Hellwig	6214ed4461	[XFS] kill BMAPI_DEVICE There is no reason to go into the iomap machinery just to get the right block device for an inode. Instead look at the realtime flag in the inode and grab the right device from the mount structure. I created a new helper, xfs_find_bdev_for_inode instead of opencoding it because I plan to use it in other places in the future. SGI-PV: 970240 SGI-Modid: xfs-linux-melb:xfs-kern:29680a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:43:35 +11:00
Lachlan McIlroy	cf441eeb79	[XFS] clean up vnode/inode tracing Simplify vnode tracing calls by embedding function name & return addr in the calling macro. Also do a lot of vnode->inode renaming for consistency, while we're at it. SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29650a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:42:19 +11:00
Linus Torvalds	347c53dca7	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (59 commits) [XFS] eagerly remove vmap mappings to avoid upsetting Xen [XFS] simplify validata_fields [XFS] no longer using io_vnode, as was remaining from 23 cherrypick [XFS] Remove STATIC which was missing from prior manual merge [XFS] Put back the QUEUE_ORDERED_NONE test in the barrier check. [XFS] Turn off XBF_ASYNC flag before re-reading superblock. [XFS] avoid race in sync_inodes() that can fail to write out all dirty data [XFS] This fix prevents bulkstat from spinning in an infinite loop. [XFS] simplify xfs_create/mknod/symlink prototype [XFS] avoid xfs_getattr in XFS_IOC_FSGETXATTR ioctl [XFS] get_bulkall() could return incorrect inode state [XFS] Kill unused IOMAP_EOF flag [XFS] fix when DMAPI mount option processing happens [XFS] ensure file size is logged on synchronous writes [XFS] growlock should be a mutex [XFS] replace some large xfs_log_priv.h macros by proper functions [XFS] kill struct bhv_vfs [XFS] move syncing related members from struct bhv_vfs to struct xfs_mount [XFS] kill the vfs_flags member in struct bhv_vfs [XFS] kill the vfs_fsid and vfs_altfsid members in struct bhv_vfs ...	2007-10-17 09:04:11 -07:00
Fengguang Wu	1f7decf6d9	writeback: remove pages_skipped accounting in __block_write_full_page() Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug: > The following strange behavior can be observed: > > 1. large file is written > 2. after 30 seconds, nr_dirty goes down by 1024 > 3. then for some time (< 30 sec) nothing happens (disk idle) > 4. then nr_dirty again goes down by 1024 > 5. repeat from 3. until whole file is written > > So basically a 4Mbyte chunk of the file is written every 30 seconds. > I'm quite sure this is not the intended behavior. It can be produced by the following test scheme: # cat bin/test-writeback.sh grep nr_dirty /proc/vmstat echo 1 > /proc/sys/fs/inode_debug dd if=/dev/zero of=/var/x bs=1K count=204800& while true; do grep nr_dirty /proc/vmstat; sleep 1; done # bin/test-writeback.sh nr_dirty 19207 nr_dirty 19207 nr_dirty 30924 204800+0 records in 204800+0 records out 209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s nr_dirty 47150 nr_dirty 47141 nr_dirty 47142 nr_dirty 47142 nr_dirty 47142 nr_dirty 47142 nr_dirty 47205 nr_dirty 47214 nr_dirty 47214 nr_dirty 47214 nr_dirty 47214 nr_dirty 47214 nr_dirty 47215 nr_dirty 47216 nr_dirty 47216 nr_dirty 47216 nr_dirty 47154 nr_dirty 47143 nr_dirty 47143 nr_dirty 47143 nr_dirty 47143 nr_dirty 47143 nr_dirty 47142 nr_dirty 47142 nr_dirty 47142 nr_dirty 47142 nr_dirty 47134 nr_dirty 47134 nr_dirty 47135 nr_dirty 47135 nr_dirty 47135 nr_dirty 46097 <== -1038 nr_dirty 46098 nr_dirty 46098 nr_dirty 46098 [...] nr_dirty 46091 nr_dirty 46092 nr_dirty 46092 nr_dirty 45069 <== -1023 nr_dirty 45056 nr_dirty 45056 nr_dirty 45056 [...] nr_dirty 37822 nr_dirty 36799 <== -1023 [...] nr_dirty 36781 nr_dirty 35758 <== -1023 [...] nr_dirty 34708 nr_dirty 33672 <== -1024 [...] nr_dirty 33692 nr_dirty 32669 <== -1023 % ls -li /var/x 847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x % dmesg\|grep 847824 # generated by a debug printk [ 529.263184] redirtied inode 847824 line 548 [ 564.250872] redirtied inode 847824 line 548 [ 594.272797] redirtied inode 847824 line 548 [ 629.231330] redirtied inode 847824 line 548 [ 659.224674] redirtied inode 847824 line 548 [ 689.219890] redirtied inode 847824 line 548 [ 724.226655] redirtied inode 847824 line 548 [ 759.198568] redirtied inode 847824 line 548 # line 548 in fs/fs-writeback.c: 543 if (wbc->pages_skipped != pages_skipped) { 544 /* 545 * writeback is not making progress due to locked 546 * buffers. Skip this inode for now. 547 / 548 redirty_tail(inode); 549 } More debug efforts show that __block_write_full_page() never has the chance to call submit_bh() for that big dirty file: the buffer head is clean. So basicly no page io is issued by __block_write_full_page(), hence pages_skipped goes up. Also the comment in generic_sync_sb_inodes(): 544 / 545 * writeback is not making progress due to locked 546 * buffers. Skip this inode for now. 547 / and the comment in __block_write_full_page(): 1713 / 1714 * The page was marked dirty, but the buffers were 1715 * clean. Someone wrote them back by hand with 1716 * ll_rw_block/submit_bh. A rare case. 1717 */ do not quite agree with each other. The page writeback should be skipped for 'locked buffer', but here it is 'clean buffer'! This patch fixes this bug. Though I'm not sure why __block_write_full_page() is called only to do nothing and who actually issued the writeback for us. This is the two possible new behaviors after the patch: 1) pretty nice: wait 30s and write ALL:) 2) not so good: - during the dd: ~16M - after 30s: ~4M - after 5s: ~4M - after 5s: ~176M The next patch will fix case (2). Cc: David Chinner <dgc@sgi.com> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Nick Piggin	d79689c703	xfs: convert to new aops Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-16 09:42:55 -07:00
Tim Shimmin	150f29ef2e	[XFS] no longer using io_vnode, as was remaining from 23 cherrypick Because we cherrypicked SGI-Modid xfs-linux-melb:xfs-kern:29675a and it depended on the sgi mod which removed io_vnode (which was not cherrypicked in 23) it was hand modified. This fixes things back up (to the originial mod) now we have moved on again. Reviewed-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 16:20:12 +10:00
Christoph Hellwig	1543d79c45	[XFS] move v_trace from bhv_vnode to xfs_inode struct bhv_vnode is on it's way out, so move the trace buffer to the XFS inode. Note that this makes the tracing macros rather misnamed, but this kind of fallout will be fixed up incrementally later on. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29498a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:39:25 +10:00
Christoph Hellwig	b677c210ce	[XFS] move v_iocount from bhv_vnode to xfs_inode struct bhv_vnode is on it's way out, so move the I/O count to the XFS inode. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29497a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:38:56 +10:00
Christoph Hellwig	b3aea4edc2	[XFS] kill the v_flag member in struct bhv_vnode All flags previously handled at the vnode level are not in the xfs_inode where we already have a flags mechanisms and free bits for flags previously in the vnode. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29495a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:37:29 +10:00
Christoph Hellwig	739bfb2a7d	[XFS] call common xfs vnode-level helpers directly and remove vnode operations SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29493a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 10:40:00 +10:00
Al Viro	782e3b3b38	Fix up more bio fallout Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-12 00:29:50 -07:00
NeilBrown	6712ecf8f6	Drop 'size' argument from bio_endio and bi_end_io As bi_end_io is only called once when the reqeust is complete, the 'size' argument is now redundant. Remove it. Now there is no need for bio_endio to subtract the size completed from bi_size. So don't do that either. While we are at it, change bi_end_io to return void. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-10 09:25:57 +02:00
Lachlan McIlroy	776a75fa5c	[XFS] Ensure file size updates have been completed before writing inode to disk. SGI-PV: 968767 SGI-Modid: xfs-linux-melb:xfs-kern:29675a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-09-18 20:12:51 +10:00
Christoph Hellwig	265c1fac38	[XFS] fix sparse shadowed variable warnings - in xfs_probe_cluster rename the inner len to pg_len. There's no harm here because the outer len isn't used after the inner len comes into existence but it keeps the code clean. - in xfs_da_do_buf remove the inner i because they don't overlap and they are both the same type. SGI-PV: 968555 SGI-Modid: xfs-linux-melb:xfs-kern:29311a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-09-05 14:50:26 +10:00
David Chinner	effd120edb	[XFS] Map unwritten extents correctly for I/o completion processing If we have multiple unwritten extents within a single page, we fail to tell the I/o completion construction handlers we need a new handle for the second and subsequent blocks in the page. While we still issue the I/O correctly, we do not have the correct ranges recorded in the ioend structures and hence when we go to convert the unwritten extents we screw it up. Make sure we start a new ioend every time the mapping changes so that we convert the correct ranges on I/O completion. SGI-PV: 964647 SGI-Modid: xfs-linux-melb:xfs-kern:28797a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:32:49 +10:00
David Chinner	b2826136a1	[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize(). SGI-PV: 965636 SGI-Modid: xfs-linux-melb:xfs-kern:28777a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Olaf Weber <olaf@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:31:03 +10:00
David Chinner	e927af90aa	[XFS] Block on unwritten extent conversion during synchronous direct I/O. Currently we do not wait on extent conversion to occur, and hence we can return to userspace from a synchronous direct I/O write without having completed all the actions in the write. Hence a read after the write may see zeroes (unwritten extent) rather than the data that was written. Block the I/O completion by triggering a synchronous workqueue flush to ensure that the conversion has occurred before we return to userspace. SGI-PV: 964092 SGI-Modid: xfs-linux-melb:xfs-kern:28775a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 15:30:52 +10:00
David Chinner	df3c724426	[XFS] Write at EOF may not update filesize correctly. The recent fix for preventing NULL files from being left around does not update the file size corectly in all cases. The missing case is a write extending the file that does not need to allocate a block. In that case we used a read mapping of the extent which forced the use of the read I/O completion handler instead of the write I/O completion handle. Hence the file size was not updated on I/O completion. SGI-PV: 965068 SGI-Modid: xfs-linux-melb:xfs-kern:28657a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nscott@aconex.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-29 18:15:17 +10:00
Lachlan McIlroy	ba87ea699e	[XFS] Fix to prevent the notorious 'NULL files' problem after a crash. The problem that has been addressed is that of synchronising updates of the file size with writes that extend a file. Without the fix the update of a file's size, as a result of a write beyond eof, is independent of when the cached data is flushed to disk. Often the file size update would be written to the filesystem log before the data is flushed to disk. When a system crashes between these two events and the filesystem log is replayed on mount the file's size will be set but since the contents never made it to disk the file is full of holes. If some of the cached data was flushed to disk then it may just be a section of the file at the end that has holes. There are existing fixes to help alleviate this problem, particularly in the case where a file has been truncated, that force cached data to be flushed to disk when the file is closed. If the system crashes while the file(s) are still open then this flushing will never occur. The fix that we have implemented is to introduce a second file size, called the in-memory file size, that represents the current file size as viewed by the user. The existing file size, called the on-disk file size, is the one that get's written to the filesystem log and we only update it when it is safe to do so. When we write to a file beyond eof we only update the in- memory file size in the write operation. Later when the I/O operation, that flushes the cached data to disk completes, an I/O completion routine will update the on-disk file size. The on-disk file size will be updated to the maximum offset of the I/O or to the value of the in-memory file size if the I/O includes eof. SGI-PV: 958522 SGI-Modid: xfs-linux-melb:xfs-kern:28322a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:46 +10:00
David Chinner	6ab8eb1cff	[PATCH] Make XFS use BH_Unwritten and BH_Delay correctly Don't hide buffer_unwritten behind buffer_delay() and remove the hack that clears unexpected buffer_unwritten() states now that it can't happen. Signed-off-by: Dave Chinner <dgc@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Timothy Shimmin <tes@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
David Chinner	549054afad	[XFS] Fix sub-block zeroing for buffered writes into unwritten extents. When writing less than a filesystem block of data into an unwritten extent via buffered I/O, __xfs_get_blocks fails to set the buffer new flag. As a result, the generic code will not zero either edge of the block resulting in garbage being written to disk either side of the real data. Set the buffer new state on bufferd writes to unwritten extents to ensure that zeroing occurs. SGI-PV: 960328 SGI-Modid: xfs-linux-melb:xfs-kern:28000a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:36:35 +11:00
David Chinner	7989cb8ef5	[XFS] Keep stack usage down for 4k stacks by using noinline. gcc-4.1 and more recent aggressively inline static functions which increases XFS stack usage by ~15% in critical paths. Prevent this from occurring by adding noinline to the STATIC definition. Also uninline some functions that are too large to be inlined and were causing problems with CONFIG_FORCED_INLINING=y. Finally, clean up all the different users of inline, __inline and __inline__ and put them under one STATIC_INLINE macro. For debug kernels the STATIC_INLINE macro uninlines those functions. SGI-PV: 957159 SGI-Modid: xfs-linux-melb:xfs-kern:27585a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: David Chatterton <chatz@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:34:56 +11:00
David Chinner	921320210b	[PATCH] Fix XFS after clear_page_dirty() removal XFS appears to call clear_page_dirty to get the mapping tree dirty tag set correctly at the same time the page dirty flag is cleared. I note that this can be done by set_page_writeback() if we clear the dirty flag on the page first when we are writing back the entire page. Hence it seems to me that the XFS call to clear_page_dirty() could easily be substituted by clear_page_dirty_for_io() followed by a call to set_page_writeback() to get the mapping tree tags set correctly after the page has been marked clean. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-21 10:01:08 -08:00
Zach Brown	8459d86aff	[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED The only time it is safe to call aio_complete() is when the ->ki_retry function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has historically done this by relying on its caller to translate positive return codes into -EIOCBQUEUED for the aio case. It did this by trying to keep conditionals in sync. direct_io_worker() knew when finished_one_bio() was going to call aio_complete(). It would reverse the test and wait and free the dio in the cases it thought that finished_one_bio() wasn't going to. Not surprisingly, it ended up getting it wrong. 'ret' could be a negative errno from the submission path but it failed to communicate this to finished_one_bio(). direct_io_worker() would return < 0, it's callers wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the future finished_one_bio()'s tests wouldn't reflect this and aio_complete() would be called for a second time which can manifest as an oops. The previous cleanups have whittled the sync and async completion paths down to the point where we can collapse them and clearly reassert the invariant that we must only call aio_complete() after returning -EIOCBQUEUED. direct_io_worker() will only return -EIOCBQUEUED when it is not the last to drop the dio refcount and the aio bio completion path will only call aio_complete() when it is the last to drop the dio refcount. direct_io_worker() can ensure that it is the last to drop the reference count by waiting for bios to drain. It does this for sync ops, of course, and for partial dio writes that must fall back to buffered and for aio ops that saw errors during submission. This means that operations that end up waiting, even if they were issued as aio ops, will not call aio_complete() from dio. Instead we return the return code of the operation and let the aio core call aio_complete(). This is purposely done to fix a bug where AIO DIO file extensions would call aio_complete() before their callers have a chance to update i_size. Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers no longer have to translate for it. XFS needs to be careful not to free resources that will be used during AIO completion if -EIOCBQUEUED is returned. We maintain the previous behaviour of trying to write fs metadata for O_SYNC aio+dio writes. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Suparna Bhattacharya <suparna@in.ibm.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: <xfs-masters@oss.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:57:21 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
Nathan Scott	b627259c60	[XFS] Remove a no-longer-correct debug assert from dio completion handling. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26804a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:03:33 +10:00
Nathan Scott	ed9d88f7b7	[XFS] Fix sparse warning found when page tracing enabled, due to overloaded gfp_t param. SGI-PV: 954580 SGI-Modid: xfs-linux-melb:xfs-kern:26552a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 10:56:43 +10:00
Lachlan McIlroy	721259bce2	[XFS] Fix ABBA deadlock between i_mutex and iolock. Avoid calling __blockdev_direct_IO for the DIO_OWN_LOCKING case for direct I/O reads since it drops and reacquires the i_mutex while holding the iolock and this violates the locking order. SGI-PV: 955696 SGI-Modid: xfs-linux-melb:xfs-kern:26898a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chatterton <chatz@sgi.com>	2006-09-07 14:27:05 +10:00
Christoph Hellwig	f5e54d6e53	[PATCH] mark address_space_operations const Same as with already do with the file operations: keep them in .rodata and prevents people from doing runtime patching. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-06-28 14:59:04 -07:00
Nathan Scott	f6c2d1fa63	[XFS] Remove version 1 directory code. Never functioned on Linux, just pure bloat. SGI-PV: 952969 SGI-Modid: xfs-linux-melb:xfs-kern:26251a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-20 13:04:51 +10:00
Nathan Scott	67fcaa73ad	[XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26107a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 17:00:52 +10:00
Nathan Scott	7d4fb40ad7	[XFS] Start writeout earlier (on last close) in the case where we have a truncate down followed by delayed allocation (buffered writes) - worst case scenario for the notorious NULL files problem. This reduces the window where we are exposed to that problem significantly. SGI-PV: 917976 SGI-Modid: xfs-linux-melb:xfs-kern:26100a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 15:27:16 +10:00
Nathan Scott	59c1b082f5	[XFS] Make the pflags test/set wrappers more legible for us mere humans. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26099a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 14:59:13 +10:00
Nathan Scott	7d04a335b6	[XFS] Shutdown the filesystem if all device paths have gone. Made shutdown vop flags consistent with sync vop flags declarations too. SGI-PV: 939911 SGI-Modid: xfs-linux-melb:xfs-kern:26096a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 14:58:38 +10:00
Nathan Scott	8272145c05	[XFS] Fix a writepage regression where we accidentally stopped honouring nonblock mode with the new IO path code (since 2.6.16). SGI-PV: 951662 SGI-Modid: xfs-linux-melb:xfs-kern:25676a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-04-11 15:10:55 +10:00
Nathan Scott	c25366680b	[XFS] Cleanup in XFS after recent get_block_t interface tweaks. Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-29 10:44:40 +10:00
Nathan Scott	c41564b5af	[XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all these typos. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:25539a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-29 08:55:14 +10:00
Badari Pulavarty	1d8fa7a2b9	[PATCH] remove ->get_blocks() support Now that get_block() can handle mapping multiple disk blocks, no need to have ->get_blocks(). This patch removes fs specific ->get_blocks() added for DIO and makes it users use get_block() instead. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:01 -08:00
Badari Pulavarty	fa30bd058b	[PATCH] map multiple blocks for mpage_readpages() This patch changes mpage_readpages() and get_block() to get the disk mapping information for multiple blocks at the same time. b_size represents the amount of disk mapping that needs to mapped. On the successful get_block() b_size indicates the amount of disk mapping thats actually mapped. Only the filesystems who care to use this information and provide multiple disk blocks at a time can choose to do so. No changes are needed for the filesystems who wants to ignore this. [akpm@osdl.org: cleanups] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:57:01 -08:00

1 2

90 Commits