It is incorrect to test inode dirty bits without participating in the inode
writeback protocol. Inode writeback sets I_SYNC and clears I_DIRTY_?, then
writes out the particular bits, then clears I_SYNC when it is done. BTW. it
may not completely write all pages out, so I_DIRTY_PAGES would get set
again.
This is a standard pattern used throughout the kernel's writeback caches
(I_SYNC ~= I_WRITEBACK, if that makes it clearer).
And so it is not possible to determine an inode's dirty status just by
checking I_DIRTY bits. Especially not for the purpose of data integrity
syncs.
Missing the check for these bits means that fsync can complete while
writeback to the inode is underway. Inode writeback functions get this
right, so call into them rather than try to shortcut things by testing
dirty state improperly.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Add a new helper to write out the inode using the writeback code,
that is including the correct dirty bit and list manipulation. A few
of filesystems already opencode this, and a lot of others should be
using it instead of using write_inode_now which also writes out the
data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Fix groups code when num_devices is not divisible by group_width
exofs: Remove useless optimization
exofs: exofs_file_fsync and exofs_file_flush correctness
exofs: Remove superfluous dependency on buffer_head and writeback
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.
* Remove the deprecated ->truncate vector. Let exofs_setattr
take care of on-disk size updates.
* Call truncate_pagecache on the unused pages if
write_begin/end fails.
* Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
* And finally get rid of exofs_get_block. We never
had any blocks it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone, just like all the other
pages. There is no partial blocks in exofs.
I've tested with this patch, and there are no apparent
failures, so far.
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the inode is
all is needed. ->fsync implies this must be done synchronously.
But now exofs_file_fsync can not be used by exofs_file_flush.
vfs_fsync() should do that job correctly.
FIXME: remove the sb_sync and fix that sb_update better.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
The use of file_fsync() in exofs_file_sync() is not necessary since it
does some extra stuff not used by exofs. Open code just the parts that
are currently needed.
TODO: Farther optimization can be done to sync the sb only on inode
update of new files, Usually the sb update is not needed in exofs.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Boaz,
Congrats on getting all the OSD stuff into 2.6.30!
I just pulled the git, and saw that the IBM copyrights are still there.
Please remove them from all files:
* Copyright (C) 2005, 2006
* International Business Machines
IBM has revoked all rights on the code - they gave it to me.
Thanks!
Avishay
Signed-off-by: Avishay Traeger <avishay@gmail.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
implementation of the file_operations and inode_operations for
regular data files.
Most file_operations are generic vfs implementations except:
- exofs_truncate will truncate the OSD object as well
- Generic file_fsync is not good for none_bd devices so open code it
- The default for .flush in Linux is todo nothing so call exofs_fsync
on the file.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>