Commit Graph

849 Commits

Author SHA1 Message Date
Jan Kara 9e33d69f55 ocfs2: Implementation of local and global quota file handling
For each quota type each node has local quota file. In this file it stores
changes users have made to disk usage via this node. Once in a while this
information is synced to global file (and thus with other nodes) so that
limits enforcement at least aproximately works.

Global quota files contain all the information about usage and limits. It's
mostly handled by the generic VFS code (which implements a trie of structures
inside a quota file). We only have to provide functions to convert structures
from on-disk format to in-memory one. We also have to provide wrappers for
various quota functions starting transactions and acquiring necessary cluster
locks before the actual IO is really started.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Jan Kara bbbd0eb34b ocfs2: Mark system files as not subject to quota accounting
Mark system files as not subject to quota accounting. This prevents
possible recursions into quota code and thus deadlocks.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Jan Kara 1a224ad11e ocfs2: Assign feature bits and system inodes to quota feature and quota files
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Jan Kara 90e86a63ea ocfs2: Support nested transactions
OCFS2 can easily support nested transactions. We just have to
take care and not spoil statistics acquire semaphore unnecessarily.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:40:23 -08:00
Tao Ma 9f868f16e4 ocfs2/xattr: Restore not_found in xis
During an xattr set, when we move a xattr which was stored in inode to the
outside bucket, we have to delete it and it will use the old value of
xis->not_found. xis->not_found is removed by ocfs2_calc_xattr_set_need
though, so we must restore it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Tao Ma 97aff52ae1 ocfs2/xattr: Fix a bug in xattr allocation estimation
When we extend one xattr's value to a large size, the old value size might
be smaller than the size of a value root. In those cases, we still need to
guess the metadata allocation.

Reported-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Mark Fasheh 53ef99cad9 ocfs2: Remove JBD compatibility layer
JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Joel Becker 511308d90b ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks()
Now that we've centralized the ocfs2_read_virt_blocks() code, let's use
it in ocfs2_read_dir_block().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:55 -08:00
Joel Becker a8549fb5ab ocfs2: Wrap virtual block reads in ocfs2_read_virt_blocks()
The ocfs2_read_dir_block() function really maps an inode's virtual
blocks to physical ones before calling ocfs2_read_blocks().  Let's
extract that to common code, because other places might want to do that.

Other than the block number being virtual, ocfs2_read_virt_blocks()
takes the same arguments as ocfs2_read_blocks().  It converts those
virtual block numbers to physical before calling ocfs2_read_blocks()
directly.  If the blocks asked for are discontiguous, this can mean
multiple calls to ocfs2_read_blocks(), but this is mostly hidden from
the caller.

Like ocfs2_read_blocks(), the caller can pass in an existing
buffer_head.  This is usually done to pick up some readahead I/O.
ocfs2_read_virt_blocks() checks the buffer_head's block number
against the extent map - it must match.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:54 -08:00
Joel Becker 970e4936d7 ocfs2: Validate metadata only when it's read from disk.
Add an optional validation hook to ocfs2_read_blocks().  Now the
validation function is only called when a block was actually read off of
disk.  It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers.  It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly.  The group_descriptor validator needs to
be split into two pieces.  The first part only needs the gd buffer and
is passed to ocfs2_read_block().  The second part requires the dinode as
well, and is called every time.  It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 4ae1d69bed ocfs2: Wrap xattr block reads in a dedicated function
We weren't consistently checking xattr blocks after we read them.
Most places checked the signature, but none checked xb_blkno or
xb_fs_signature.  Create a toplevel ocfs2_read_xattr_block() that does
the read and the validation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker a22305cc69 ocfs2: Wrap dirblock reads in a dedicated function.
We have ocfs2_bread() as a vestige of the original ext-based dir code.
It's only used by directories, though.  Turn it into
ocfs2_read_dir_block(), with a prototype matching the other metadata
read functions.  It's set up to validate dirblocks when the time comes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 5e96581a37 ocfs2: Wrap extent block reads in a dedicated function.
We weren't consistently checking extent blocks after we read them.
Most places checked the signature, but none checked h_blkno or
h_fs_signature.  Create a toplevel ocfs2_read_extent_block() that does
the read and the validation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 4203530613 ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks.
Random places in the code would check a group descriptor bh to see if it
was valid. The previous commit unified descriptor block reads,
validating all block reads in the same place.  Thus, these checks are no
longer necessary.  Rather than eliminate them, however, we change them
to BUG_ON() checks.  This ensures the assumptions remain true.  All of
the code paths to these checks have been audited to ensure they come
from a validated descriptor read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 68f64d471b ocfs2: Wrap group descriptor reads in a dedicated function.
We have a clean call for validating group descriptors, but every place
that wants the always does a read_block()+validate() call pair.  Create
a toplevel ocfs2_read_group_descriptor() that does the right
thing.  This allows us to leverage the single call point later for
fancier handling.  We also add validation of gd->bg_generation against
the superblock and gd->bg_blkno against the block we thought we read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 57e3e79711 ocfs2: Consolidate validation of group descriptors.
Currently the validation of group descriptors is directly duplicated so
that one version can error the filesystem and the other (resize) can
just report the problem.  Consolidate to one function that takes a
boolean.  Wrap that function with the old call for the old users.

This is in preparation for lifting the read+validate step into a
single function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:53 -08:00
Joel Becker 10995aa245 ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.
Random places in the code would check a dinode bh to see if it was
valid.  Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place.  Thus, these haphazard checks are no longer
necessary.  Rather than eliminate them, however, we change them to
BUG_ON() checks.  This ensures the assumptions remain true.  All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Joel Becker b657c95c11 ocfs2: Wrap inode block reads in a dedicated function.
The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call.  Each place that does this has a different set
of sanity checks it performs.  Some check only the signature.  A couple
validate the block number (the block read vs di->i_blkno).  A couple
others check for VALID_FL.  Only one place validates i_fs_generation.  A
couple check nothing.  Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block().  This will validate
all the above fields, going readonly if they are invalid (they never
should be).  ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags.  Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Tiger Yang a68979b857 ocfs2: add mount option and Kconfig option for acl
This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
and mount options "acl" to enable acls in Ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Tiger Yang 89c38bd0ad ocfs2: add ocfs2_init_acl in mknod
We need to get the parent directories acls and let the new child inherit it.
To this, we add additional calculations for data/metadata allocation.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 060bc66dd5 ocfs2: add ocfs2_acl_chmod
This function is used to update acl xattrs during file mode changes.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 23fc2702be ocfs2: add ocfs2_check_acl
This function is used to enhance permission checking with POSIX ACLs.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 929fb014e0 ocfs2: add POSIX ACL API
This patch adds POSIX ACL(access control lists) APIs in ocfs2. We convert
struct posix_acl to many ocfs2_acl_entry and regard them as an extended
attribute entry.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 4e3e9d027f ocfs2: add ocfs2_xattr_get_nolock
This function does the work of ocfs2_xattr_get under an open lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 534eadddc1 ocfs2: add ocfs2_init_security in during file create
Security attributes must be set when creating a new inode.

We do this in three steps.

- First, get security xattr's name and value by security_operation

- Calculate and reserve the meta data and clusters needed by this security
  xattr before starting transaction

- Finally, we set it before add_entry

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 923f7f3102 ocfs2: add security xattr API
This patch add security xattr set/get/list APIs to
support security attributes in Ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:20 -08:00
Tiger Yang 6c3faba442 ocfs2: add ocfs2_xattr_set_handle
This function is used to set xattr's in a started transaction. It is only
called during inode creation inode for initial security/acl xattrs of the
new inode. These xattrs could be put into ibody or extent block, so xattr
bucket would not be use in this case.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:19 -08:00
Tiger Yang f5d362022a ocfs2: move new inode allocation out of the transaction
Move out inode allocation from ocfs2_mknod_locked() because
vfs_dq_init() must be called outside of a transaction.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:19 -08:00
Mark Fasheh fecc01126d ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range()
This patch genericizes the high level handling of extent removal.
ocfs2_remove_btree_range() is nearly identical to
__ocfs2_remove_inode_range(), except that extent tree operations have been
used where necessary. We update ocfs2_remove_inode_range() to use the
generic helper. Now extent tree based structures have an easy way to
truncate ranges.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2009-01-05 08:34:19 -08:00
Tao Ma 85db90e778 ocfs2/xattr: Merge xattr set transaction.
In current ocfs2/xattr, the whole xattr set is divided into
many steps are many transaction are used, this make the
xattr set process isn't like a real transaction, so this
patch try to merge all the transaction into one. Another
benefit is that acl can use it easily now.

I don't merge the transaction of deleting xattr when we
remove an inode. The reason is that if we have a large number
of xattrs and every xattrs has large values(large enough
for outside storage), the whole transaction will be very
huge and it looks like jbd can't handle it(I meet with a
jbd complain once). And the old inode removal is also divided
into many steps, so I'd like to leave as it is.

Note:
In xattr set, I try to avoid ocfs2_extend_trans since if
the credits aren't enough for the extension, it will commit
all the dirty blocks and create a new transaction which may
lead to inconsistency in metadata. All ocfs2_extend_trans
remained are safe now.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:19 -08:00
Tao Ma 78f30c314a ocfs2/xattr: Reserve meta/data at the beginning of ocfs2_xattr_set.
In ocfs2 xattr set, we reserve metadata and clusters in any place
they are needed. It is time-consuming and ineffective, so this
patch try to reserve metadata and clusters at the beginning of
ocfs2_xattr_set.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:19 -08:00
Tao Ma c73f60f900 ocfs2/xattr: Move clusters free into dealloc.
Move clusters free process into dealloc context so that
they can be freed after the transaction.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:19 -08:00
Tao Ma 2891d290aa ocfs2: Add clusters free in dealloc_ctxt.
Now in ocfs2 xattr set, the whole process are divided into many small
parts and they are wrapped into diffrent transactions and it make the
set doesn't look like a real transaction. So we want to integrate it
into a real one.

In some cases we will allocate some clusters and free some in just one
transaction. e.g, one xattr is larger than inline size, so it and its
value root is stored within the inode while the value is outside in a
cluster. Then we try to update it with a smaller value(larger than the
size of root but smaller than inline size), we may need to free the
outside cluster while allocate a new bucket(one cluster) since now the
inode may be full. The old solution will lock the global_bitmap(if the
local alloc failed in stress test) and then the truncate log. This will
cause a ABBA lock with truncate log flush.

This patch add the clusters free in dealloc_ctxt, so that we can record
the free clusters during the transaction and then free it after we
release the global_bitmap in xattr set.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Tao Ma 976331d878 ocfs2/xattr: Only extend xattr bucket in need.
When the first block of a bucket is filled up with xattr
entries, we normally extend the bucket. But if we are
just replace one xattr with small length, we don't need
to extend it. This is important since we will calculate
what we need before the transaction and in this situation
no resources will be allocated.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Tao Ma 757055adc5 ocfs2/xattr: Only set buffer update if it doesn't exist in cache.
When we call ocfs2_init_xattr_bucket, we deem that the new buffer head
will be written to disk immediately, so we just use sb_getblk. But in
some cases the buffer may have already been in ocfs2 uptodate cache,
so we only call ocfs2_set_buffer_uptodate if the buffer head isn't
in the cache.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Tao Ma 1c32a2fd46 ocfs2/xattr: Remove additional bucket allocation in bucket defragment.
Joel has refactored xattr bucket and make xattr bucket a general
wrapper. So in ocfs2_defrag_xattr_bucket, we have already passed the
bucket in, so there is no need to allocate a new one and read it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Joel Becker 02dbf38d19 ocfs2: Use buckets in ocfs2_xattr_set_entry_in_bucket().
The ocfs2_xattr_set_entry_in_bucket() function is already working on an
ocfs2_xattr_bucket structure, so let's use the bucket API.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Joel Becker 161d6f30f1 ocfs2: Use buckets in ocfs2_defrag_xattr_bucket().
Use the ocfs2_xattr_bucket abstraction for reading and writing the
bucket in ocfs2_defrag_xattr_bucket().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Joel Becker 178eeac354 ocfs2: Use buckets in ocfs2_xattr_create_index_block().
Use the ocfs2_xattr_bucket abstraction in
ocfs2_xattr_create_index_block() and its helpers.  We get more efficient
reads, a lot less buffer_head munging, and nicer code to boot.  While
we're at it, ocfs2_xattr_update_xattr_search() becomes void.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:18 -08:00
Joel Becker e2356a3f02 ocfs2: Use buckets in ocfs2_xattr_bucket_find().
Change the ocfs2_xattr_bucket_find() function to use ocfs2_xattr_bucket
as its abstraction.  This makes for more efficient reads, as buckets are
linear blocks, and also has improved caching characteristics.  It also
reads better.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker ba93712759 ocfs2: Take ocfs2_xattr_bucket structures off of the stack.
The ocfs2_xattr_bucket structure is a nice abstraction, but it is a bit
large to have on the stack.  Just like ocfs2_path, let's allocate it
with a ocfs2_xattr_bucket_new() function.

We can now store the inode on the bucket, cleaning up all the other
bucket functions.  While we're here, we catch another place or two that
wasn't using ocfs2_read_xattr_bucket().

Updates:
- No longer allocating xis.bucket, as it will never be used.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker 4980c6daba ocfs2: Copy xattr buckets with a dedicated function.
Now that the places that copy whole buckets are using struct
ocfs2_xattr_bucket, we can do the copy in a dedicated function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker 1224be020f ocfs2: Wrap journal_access/journal_dirty for xattr buckets.
A common action is to call ocfs2_journal_access() and
ocfs2_journal_dirty() on the buffer heads of an xattr bucket.  Let's
create nice wrappers.

While we're there, let's drop the places that try to be smart by writing
only the first and last blocks of a bucket.  A bucket is contiguous, so
writing the whole thing is actually more efficient.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker 784b816a91 ocfs2: Improve ocfs2_read_xattr_bucket().
The ocfs2_read_xattr_bucket() function would read an xattr bucket into a
list of buffer heads.  However, we have a nice ocfs2_xattr_bucket
structure.  Let's have it fill that out instead.

In addition, ocfs2_read_xattr_bucket() would initialize buffer heads for
a bucket that's never been on disk before.  That's confusing.  Let's
call that functionality ocfs2_init_xattr_bucket().

The functions ocfs2_cp_xattr_bucket() and ocfs2_half_xattr_bucket() are
updated to use the ocfs2_xattr_bucket structure rather than raw bh
lists.  That way they can use the new read/init calls.  In addition,
they drop the wasted read of an existing target bucket.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker 6dde41d9e7 ocfs2: Provide a wrapper to brelse() xattr bucket buffers.
A common theme is walking all the buffer heads on an ocfs2_xattr_bucket
and releasing them.  Let's wrap that.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:17 -08:00
Joel Becker 3e6329463e ocfs2: Convenient access to an xattr bucket's header.
The xattr code often wants to access the ocfs2_xattr_header at the start
of an bucket.  Rather than walk the pointer chains, let's just create
another nice macro.  As a side benefit, we can get rid of the mostly
spurious ->bu_xh element on the bucket structure.  The idea is ripped
from the ocfs2_path code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:16 -08:00
Joel Becker 51def39f0c ocfs2: Convenient access to xattr bucket data blocks.
The xattr code often wants to access the data pointer for blocks in an
xattr bucket.  This is usually found by dereferencing the bh array
hanging off of the ocfs2_xattr_bucket structure.  Rather than do this
all the time, let's provide a nice little macro.  The idea is ripped
from the ocfs2_path code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:16 -08:00
Joel Becker 9c7759aa67 ocfs2: Convenient access to an xattr bucket's block number.
The xattr code often wants to know the block number of an xattr bucket.
This is usually found by dereferencing the first bh hanging off of the
ocfs2_xattr_bucket structure.  Rather than do this all the time, let's
provide a nice little macro.  The idea is ripped from the ocfs2_path
code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:16 -08:00
Joel Becker 4ac6032d6c ocfs2: Field prefixes for the xattr_bucket structure
The ocfs2_xattr_bucket structure keeps track of the buffers for one
xattr bucket.  Let's prefix the fields for easier code navigation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:34:16 -08:00
Linus Torvalds 0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
James Morris cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
Joel Becker a97721894a ocfs2: Add JBD2 compat feature bit.
Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16 18:26:16 -08:00
Tao Ma 83099bc647 ocfs2: Always update xattr search when creating bucket.
When we create xattr bucket during the process of xattr set, we always
need to update the ocfs2_xattr_search since even if the bucket size is
the same as block size, the offset will change because of the removal
of the ocfs2_xattr_block header.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-16 14:07:37 -08:00
James Morris ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
David S. Miller aa2ba5f108 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ixgbe/ixgbe_main.c
	drivers/net/smc91x.c
2008-12-02 19:50:27 -08:00
Mark Fasheh d6b58f89f7 ocfs2: fix regression in ocfs2_read_blocks_sync()
We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen.
At first glance, this seems ok but in reality it can happen. My test case
was to just run 'exorcist'. A struct inode is being pushed out of memory but
is then re-read at a later time, before the buffer has been checkpointed by
jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync().

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:58 -08:00
Coly Li 07d9a3954a ocfs2: fix return value set in init_dlmfs_fs()
In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from
calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:".

Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:55 -08:00
David Teigland 07f9eebcdf ocfs2: fix wake_up in unlock_ast
In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
the spin lock on it.  As soon as the spin lock is released, the
lockres can be freed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:45 -08:00
David Teigland 66f502a416 ocfs2: initialize stack_user lvbptr
The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it
has not yet been initialized by a lock call.

Signed-off-by: David Teigland <teigland@redhat.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:39 -08:00
Coly Li 3b5da0189c ocfs2: comments typo fix
This patch fixes two typos in comments of ocfs2.

Signed-off-by: Coly Li <coyli@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:31 -08:00
James Morris 2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells b19c2a3b83 CRED: Wrap task credential accesses in the OCFS2 filesystem
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:59 +11:00
David S. Miller 7e452baf6b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/message/fusion/mptlan.c
	drivers/net/sfc/ethtool.c
	net/mac80211/debugfs_sta.c
2008-11-11 15:43:02 -08:00
Tiger Yang 6c1e183e12 ocfs2: Check search result in ocfs2_xattr_block_get()
ocfs2_xattr_block_get() calls ocfs2_xattr_search() to find an external
xattr, but doesn't check the search result that is passed back via struct
ocfs2_xattr_search. Add a check for search result, and pass back -ENODATA if
the xattr search failed. This avoids a later NULL pointer error.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Mark Fasheh de29c08528 ocfs2: fix printk related build warnings in xattr.c
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Dmitri Monakhov c435400140 ocfs2: truncate outstanding block after direct io failure
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Joel Becker <Joel.Becker@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma 80bcaf3469 ocfs2/xattr: Proper hash collision handle in bucket division
In ocfs2/xattr, we must make sure the xattrs which have the same hash value
exist in the same bucket so that the search schema can work. But in the old
implementation, when we want to extend a bucket, we just move half number of
xattrs to the new bucket. This works in most cases, but if we are lucky
enough we will move 2 xattrs into 2 different buckets. This means that an
xattr from the previous bucket cannot be found anymore. This patch fix this
problem by finding the right position during extending the bucket and extend
an empty bucket if needed.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma 4c1bbf1ba6 ocfs2: return 0 in page_mkwrite to let VFS retry.
In ocfs2_page_mkwrite, we return -EINVAL when we found the page mapping
isn't updated, and it will cause the user space program get SIGBUS and
exit. The reason is that during race writeable mmap, we will do
unmap_mapping_range in ocfs2_data_downconvert_worker. The good thing is
that if we reuturn 0 in page_mkwrite, VFS will retry fault and then
call page_mkwrite again, so it is safe to return 0 here.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Sunil Mushran ae0dff6830 ocfs2: Set journal descriptor to NULL after journal shutdown
Patch sets journal descriptor to NULL after the journal is shutdown.
This ensures that jbd2_journal_release_jbd_inode(), which removes the
jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
even after the journal has been shutdown.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Tao Ma d32647993c ocfs2: Fix check of return value of ocfs2_start_trans() in xattr.c.
On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM),
so we should check whether handle is NULL. Fix them to use IS_ERR().
Jan has made the patch for other part in ocfs2(thank Jan for it), so
this is just the fix for fs/ocfs2/xattr.c.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:47 -08:00
Jan Kara b99835c168 ocfs2: Let inode be really deleted when ocfs2_mknod_locked() fails
We forgot to set i_nlink to 0 when returning due to error from ocfs2_mknod_locked()
and thus inode was not properly released via ocfs2_delete_inode() (e.g. claimed
space was not released). Fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Jan Kara 87cfa00432 ocfs2: Fix checking of return value of new_inode()
new_inode() does not return ERR_PTR() but NULL in case of failure. Correct
checking of the return value.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Jan Kara fa38e92cb3 ocfs2: Fix check of return value of ocfs2_start_trans()
On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM).
Thus checks for !handle are wrong. Fix them to use IS_ERR().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Tao Ma 8573f79d30 ocfs2: Fix some typos in xattr annotations.
Fix some typos in the xattr annotations.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Reported-by: Coly Li <coyli@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Tao Ma 63fd775737 ocfs2: Remove unused ocfs2_restore_xattr_block().
Since now ocfs2 supports empty xattr buckets, we will never remove
the xattr index tree even if all the xattrs are removed, so this
function will never be called. So remove it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker 54f443f4e7 ocfs2: Don't repeat ocfs2_xattr_block_find()
ocfs2_xattr_block_get() looks up the xattr in a startlingly familiar
way; it's identical to the function ocfs2_xattr_block_find().  Let's just
use the later in the former.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker eb6ff2397d ocfs2: Specify appropriate journal access for new xattr buckets.
There are a couple places that get an xattr bucket that may be reading
an existing one or may be allocating a new one.  They should specify the
correct journal access mode depending.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:46 -08:00
Joel Becker bd60bd37ad ocfs2: Check errors from ocfs2_xattr_update_xattr_search()
The ocfs2_xattr_update_xattr_search() function can return an error when
trying to read blocks off of disk.  The caller needs to check this error
before using those (possibly invalid) blocks.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:45 -08:00
Joel Becker b37c4d84e9 ocfs2: Don't return -EFAULT from a corrupt xattr entry.
If the xattr disk structures are corrupt, return -EIO, not -EFAULT.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:45 -08:00
Joel Becker f6087fb799 ocfs2: Check xattr block signatures properly.
The xattr.c code is currently memcmp()ing naking buffer pointers.
Create the OCFS2_IS_VALID_XATTR_BLOCK() macro to match its peers and use
that.

In addition, failed signature checks were returning -EFAULT, which is
completely wrong.  Return -EIO.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang c988fd045f ocfs2: add handler_map array bounds checking
Make the handler_map array as large as the possible value range to avoid
a fencepost error.

[ Utilize alternate method -- Joel ]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang ceb1eba3dc ocfs2: remove duplicate definition in xattr
Include/linux/xattr.h already has the definition about xattr prefix,
so remove the duplicate definitions in xattr.c.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang 0030e00150 ocfs2: fix function declaration and definition in xattr
Because we merged the xattr sources into one file, some functions
no longer belong in the header file.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:44 -08:00
Tiger Yang c3cb682735 ocfs2: fix license in xattr
This patch fixes the license in xattr.c and xattr.h.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10 09:51:43 -08:00
Harvey Harrison be85940548 fs: replace NIPQUAD()
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:56:28 -07:00
Nick Piggin 4e02ed4b4a fs: remove prepare_write/commit_write
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:45 -07:00
Linus Torvalds 2248485640 Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits)
  [PATCH] kill the rest of struct file propagation in block ioctls
  [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET
  [PATCH] get rid of blkdev_locked_ioctl()
  [PATCH] get rid of blkdev_driver_ioctl()
  [PATCH] sanitize blkdev_get() and friends
  [PATCH] remember mode of reiserfs journal
  [PATCH] propagate mode through swsusp_close()
  [PATCH] propagate mode through open_bdev_excl/close_bdev_excl
  [PATCH] pass fmode_t to blkdev_put()
  [PATCH] kill the unused bsize on the send side of /dev/loop
  [PATCH] trim file propagation in block/compat_ioctl.c
  [PATCH] end of methods switch: remove the old ones
  [PATCH] switch sr
  [PATCH] switch sd
  [PATCH] switch ide-scsi
  [PATCH] switch tape_block
  [PATCH] switch dcssblk
  [PATCH] switch dasd
  [PATCH] switch mtd_blkdevs
  [PATCH] switch mmc
  ...
2008-10-23 10:23:07 -07:00
Christoph Hellwig 440037287c [PATCH] switch all filesystems over to d_obtain_alias
Switch all users of d_alloc_anon to d_obtain_alias.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:01 -04:00
Al Viro 572c489215 [PATCH] sanitize blkdev_get() and friends
* get rid of fake struct file/struct dentry in __blkdev_get()
* merge __blkdev_get() and do_open()
* get rid of flags argument of blkdev_get()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:49:06 -04:00
Al Viro 9a1c354276 [PATCH] pass fmode_t to blkdev_put()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:48:58 -04:00
Mark Fasheh 1efd47f873 ocfs2: fix build error
I merged the latest ocfs2_read_blocks() changes in xattr.c wrong. This makes
Ocfs2 compile again.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 18:31:46 -07:00
Joel Becker d4a8c93c82 ocfs2: Make cached block reads the common case.
ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
However, that's the common case.  Let's flip it around and provide an
IGNORE_CACHE flag for the special users.  This has the added benefit of
cleaning up the code some (ignore_cache takes on its special meaning
earlier in the loop).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:58:22 -07:00
Joel Becker 5e0b3dec01 ocfs2: Kill the last naked wait_on_buffer() for cached reads.
ocfs2's cached buffer I/O goes through ocfs2_read_block(s)().  dir.c had
a naked wait_on_buffer() to wait for some readahead, but it should
use ocfs2_read_block() instead.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:58:11 -07:00
Joel Becker 07446dc72c ocfs2: Move ocfs2_bread() into dir.c
dir.c is the only place using ocfs2_bread(), so let's make it static to
that file.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:58:03 -07:00
Joel Becker 0fcaa56a2a ocfs2: Simplify ocfs2_read_block()
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set.  Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:51:57 -07:00
Joel Becker 31d33073ca ocfs2: Require an inode for ocfs2_read_block(s)().
Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode.  Use it
unconditionally.  Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:43:29 -07:00
Joel Becker da1e90985a ocfs2: Separate out sync reads from ocfs2_read_blocks()
The ocfs2_read_blocks() function currently handles sync reads, cached,
reads, and sometimes cached reads.  We're going to add some
functionality to it, so first we should simplify it.  The uncached,
synchronous reads are much easer to handle as a separate function, so we
instroduce ocfs2_read_blocks_sync().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14 11:29:10 -07:00
Tao Ma 936b883436 ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().
According to Christoph Hellwig's advice, we really don't need
a ->list to handle one xattr's list. Just a map from index to
xattr prefix is enough. And I also refactor the old list method
with the reference from fs/xfs/linux-2.6/xfs_xattr.c and the
xattr list method in btrfs.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:45 -07:00
Tao Ma 2057e5c678 ocfs2: Calculate EA hash only by its suffix.
According to Christoph Hellwig's advice, the hash value of EA
is only calculated by its suffix.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Mark Fasheh 99219aea68 ocfs2: Move trusted and user attribute support into xattr.c
Per Christoph Hellwig's suggestion - don't split these up. It's not like we
gained much by having the two tiny files around.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Mark Fasheh 40daa16a34 ocfs2: Uninline ocfs2_xattr_name_hash()
This is too big to be inlined.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Mark Fasheh a81cb88b64 ocfs2: Don't check for NULL before brelse()
This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh
2008-10-13 17:02:44 -07:00
Mark Fasheh fd8351f83d ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache
i and b_len don't really need to be u64's. Xattr extent lengths should be
limited by the VFS, and then the size of our on-disk length field.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Mark Fasheh 4cc8124584 ocfs2: make la_debug_mutex static
It can also be moved into ocfs2_la_debug_read().

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Mark Fasheh 009d37502a ocfs2: Remove pointless !!
ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
one value.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Tao Ma 5a09561199 ocfs2: Add empty bucket support in xattr.
As Mark mentioned, it may be time-consuming when we remove the
empty xattr bucket, so this patch try to let empty bucket exist
in xattr operation. The modification includes:
1. Remove the functin of bucket and extent record deletion during
   xattr delete.
2. In xattr set:
   1) Don't clean the last entry so that if the bucket is empty,
      the hash value of the bucket is the hash value of the entry
      which is deleted last.
   2) During insert, if we meet with an empty bucket, just use the
      1st entry.
3. In binary search of xattr bucket, use the bucket hash value(which
   stored in the 1st xattr entry) to find the right place.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:43 -07:00
Tao Ma 06b240d8af ocfs2/xattr.c: Fix a bug when inserting xattr.
During the process of xatt insertion, we use binary search
to find the right place and "low" is set to it. But when
there is one xattr which has the same name hash as the inserted
one, low is the wrong value. So set it to the right position.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:43 -07:00
Sunil Mushran b0f73cfc36 ocfs2: Add xattr mount option in ocfs2_show_options()
Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
the list of all mount options.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:43 -07:00
Joel Becker 2b4e30fbde ocfs2: Switch over to JBD2.
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change.  Most functions are just renamed.  The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem.  It can even interact with JBD-based ocfs2 as long
as the journal is formated for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being.  This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
  ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:43 -07:00
Joel Becker 12462f1d9f ocfs2: Add the 'inode64' mount option.
Now that ocfs2 limits inode numbers to 32bits, add a mount option to
disable the limit.  This parallels XFS.  64bit systems can handle the
larger inode numbers.

[ Added description of inode64 mount option in ocfs2.txt. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:08 -07:00
Joel Becker 1187c96885 ocfs2: Limit inode allocation to 32bits.
ocfs2 inode numbers are block numbers.  For any filesystem with less
than 2^32 blocks, this is not a problem.  However, when ocfs2 starts
using JDB2, it will be able to support filesystems with more than 2^32
blocks.  This would result in inode numbers higher than 2^32.

The problem is that stat(2) can't handle those numbers on 32bit
machines.  The simple solution is to have ocfs2 allocate all inodes
below that boundary.

The suballoc code is changed to honor an optional block limit.  Only the
inode suballocator sets that limit - all other allocations stay unlimited.

The biggest trick is to grow the inode suballocator beneath that limit.
There's no point in allocating block groups that are above the limit,
then rejecting their elements later on.  We want to prevent the inode
allocator from ever having block groups above the limit.  This involves
a little gyration with the local alloc code.  If the local alloc window
is above the limit, it signals the caller to try the global bitmap but
does not disable the local alloc file (which can be used for other
allocations).

[ Minor cleanup - removed an ML_NOTICE comment. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:07 -07:00
Tao Ma 08413899db ocfs2: Resolve deadlock in ocfs2_xattr_free_block.
In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
have a transaction open. This will deadlock the downconvert thread, so fix
it.

We can clean up how xattr blocks are removed while here - this patch also
moves the mechanism of releasing xattr block (including both value, xattr
tree and xattr block) into this function.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:06 -07:00
Tao Ma 28b8ca0b7f ocfs2: bug-fix for journal extend in xattr.
In ocfs2_extend_trans, when we can't extend the current
transaction, it will commit current transaction and restart
a new one. So if the previous credits we have allocated aren't
used(the block isn't dirtied before our extend), we will not
have enough credits for any future operation(it will cause jbd
complain and bug out). So check this and re-extend it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:06 -07:00
Joel Becker 8d6220d6a7 ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()
The original get/put_extent_tree() functions held a reference on
et_root_bh.  However, every single caller already has a safe reference,
making the get/put cycle irrelevant.

We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree().  It
no longer gets a reference on et_root_bh.  ocfs2_put_extent_tree() is
removed.  Callers now have a simpler init+use pattern.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1625f8ac15 ocfs2: Comment struct ocfs2_extent_tree_operations.
struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2.  Describing what those methods do is probably a
good idea.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker f99b9b7ccf ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.
We now have three different kinds of extent trees in ocfs2: inode data
(dinode), extended attributes (xattr_tree), and extended attribute
values (xattr_value).  There is a nice abstraction for them,
ocfs2_extent_tree, but it is hidden in alloc.c.  All the calling
functions have to pick amongst a varied API and pass in type bits and
often extraneous pointers.

A better way is to make ocfs2_extent_tree a first-class object.
Everyone converts their object to an ocfs2_extent_tree() via the
ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
tree calls to alloc.c.

This simplifies a lot of callers, making for readability.  It also
provides an easy way to add additional extent tree types, as they only
need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree()
function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1e61ee79e2 ocfs2: Add an insertion check to ocfs2_extent_tree_operations.
A couple places check an extent_tree for a valid inode.  We move that
out to add an eo_insert_check() operation.  It can be called from
ocfs2_insert_extent() and elsewhere.

We also have the wrapper calls ocfs2_et_insert_check() and
ocfs2_et_sanity_check() ignore NULL ops.  That way we don't have to
provide useless operations for xattr types.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 1a09f556e5 ocfs2: Create specific get_extent_tree functions.
A caller knows what kind of extent tree they have.  There's no reason
they have to call ocfs2_get_extent_tree() with a NULL when they could
just as easily call a specific function to their type of extent tree.

Introduce ocfs2_dinode_get_extent_tree(),
ocfs2_xattr_tree_get_extent_tree(), and
ocfs2_xattr_value_get_extent_tree().  They only take the necessary
arguments, calling into the underlying __ocfs2_get_extent_tree() to do
the real work.

__ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
without needing any switch-by-type logic.

ocfs2_get_extent_tree() is now a wrapper around the specific calls.  It
exists because a couple alloc.c functions can take et_type.  This will
go later.

Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
struct ocfs2_xattr_value_root* instead of void*.  This gives us
typechecking where we didn't have it before.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:05 -07:00
Joel Becker 943cced39e ocfs2: Determine an extent tree's max_leaf_clusters in an et_op.
Provide an optional extent_tree_operation to specify the
max_leaf_clusters of an ocfs2_extent_tree.  If not provided, the value
is 0 (unlimited).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 1c25d93a4a ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents().
ocfs2_num_free_extents() re-implements the logic of
ocfs2_get_extent_tree().  Now that ocfs2_get_extent_tree() does not
allocate, let's use it in ocfs2_num_free_extents() to simplify the code.

The inode validation code in ocfs2_num_free_extents() is not needed.
All callers are passing in pre-validated inodes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 0ce1010f1a ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations.
The root_el of an ocfs2_extent_tree needs to be calculated from
et->et_object.  Make it an operation on et->et_ops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker ea5efa1512 ocfs2: Make 'private' into 'object' on ocfs2_extent_tree.
The 'private' pointer was a way to store off xattr values, which don't
live at a set place in the bh.  But the concept of "the object
containing the extent tree" is much more generic.  For an inode it's the
struct ocfs2_dinode, for an xattr value its the value.  Let's save off
the 'object' at all times.  If NULL is passed to
ocfs2_get_extent_tree(), 'object' is set to bh->b_data;

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker dc0ce61af4 ocfs2: Make ocfs2_extent_tree get/put instead of alloc.
Rather than allocating a struct ocfs2_extent_tree, just put it on the
stack.  Fill it with ocfs2_get_extent_tree() and drop it with
ocfs2_put_extent_tree().  Now the callers don't have to ENOMEM, yet
still safely ref the root_bh.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker ce1d9ea621 ocfs2: Prefix the ocfs2_extent_tree structure.
The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
All users are updated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Joel Becker 35dc0aa3c5 ocfs2: Prefix the extent tree operations structure.
The ocfs2_extent_tree_operations structure gains a field prefix on its
members.  The ->eo_sanity_check() operation gains a wrapper function for
completeness.  All of the extent tree operation wrappers gain a
consistent name (ocfs2_et_*()).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:04 -07:00
Mark Fasheh ff1ec20ef6 ocfs2: fix printk format warnings
This patch fixes the following build warnings:

fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket':
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int'
fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int'
fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket':
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'
fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t'

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tiger Yang 8154da3d21 ocfs2: Add incompatible flag for extended attribute
This patch adds the s_incompat flag for extended attribute support. This
helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
to mount a volume with xattr support.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma a394425643 ocfs2: Delete all xattr buckets during inode removal
In inode removal, we need to iterate all the buckets, remove any
externally-stored EA values and delete the xattr buckets.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma 012255961c ocfs2: Enable xattr set in index btree
Where the previous patches added the ability of list/get xattr in buckets
for ocfs2, this patch enables ocfs2 to store large numbers of EAs.

The original design doc is written by Mark Fasheh, and it can be found in
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to
make small modifications to it.

First, because the bucket size is 4K, a new field named xh_free_start is added
in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket.
It is used when we store new EA name/value. With this field, we can find the
place more quickly and what's more, we don't need to sort the name/value every
time to let the last entry indicate the next unused space. This makes the
insert operation more efficient for blocksizes smaller than 4k.

Because of the new xh_free_start, another field named as xh_name_value_len is
also added in ocfs2_xattr_header. It records the total length of all the
name/values in the bucket. We need this so that we can check it and defragment
the bucket if there is not enough contiguous free space.

An xattr insertion looks like this:
1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA.
2. check whether there is enough space in bucketA. If yes, insert it directly
   and modify xh_free_start and xh_name_value_len accordingly. If not, check
   xh_name_value_len to see whether we can store this by defragment the bucket.
   If yes, defragment it and go on insertion.
3. If defragement doesn't work, check whether there is new empty bucket in
   the clusters within this extent record. If yes, init the new bucket and move
   all the buckets after bucketA one by one to the next bucket. Move half of the
   entries in bucketA to the next bucket and go on insertion.
4. If there is no new bucket, grow the extent tree.

As for xattr deletion, we will delete an xattr bucket when all it's xattrs
are removed and move all the buckets after it to the previous one. When all
the xattr buckets in an extend record are freed, free this extend records
from ocfs2_xattr_tree.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma ca12b7c489 ocfs2: Optionally limit extent size in ocfs2_insert_extent()
In xattr bucket, we want to limit the maximum size of a btree leaf,
otherwise we'll lose the benefits of hashing because we'll have to search
large leaves.

So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
record even if it is contiguous with an adjacent record.

Other btree types are not affected by this change.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma 589dc2602f ocfs2: Add xattr lookup code xattr btrees
Add code to lookup a given extended attribute in the xattr btree. Lookup
follows this general scheme:

1. Use ocfs2_xattr_get_rec to find the xattr extent record

2. Find the xattr bucket within the extent which may contain this xattr

3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need
   to recalcuate the block offset and name offset for the right position of
   name/value.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma 0c044f0b24 ocfs2: Add xattr bucket iteration for large numbers of EAs
Ocfs2 breaks up xattr index tree leaves into 4k regions, called buckets.
Attributes are stored within a given bucket, depending on hash value.

After a discussion with Mark, we decided that the per-bucket index
(xe_entry[]) would only exist in the 1st block of a bucket. Likewise,
name/value pairs will not straddle more than one block. This allows the
majority of operations to work directly on the buffer heads in a leaf block.

This patch adds code to iterate the buckets in an EA. A new abstration of
ocfs2_xattr_bucket is added. It records the bhs in this bucket and
ocfs2_xattr_header. This keeps the code neat, improving readibility.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:03 -07:00
Tao Ma ba492615f0 ocfs2: Add xattr index tree operations
When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
store large numbers of EAs. This patch adds a new type in
ocfs2_extent_tree_type and adds the implementation so that we can re-use the
b-tree code to handle the storage of many EAs.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:02 -07:00
Tiger Yang cf1d6c763f ocfs2: Add extended attribute support
This patch implements storing extended attributes both in inode or a single
external block. We only store EA's in-inode when blocksize > 512 or that
inode block has free space for it. When an EA's value is larger than 80
bytes, we will store the value via b-tree outside inode or block.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:02 -07:00
Tiger Yang fdd77704a8 ocfs2: reserve inline space for extended attribute
Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:02 -07:00
Tao Ma f56654c435 ocfs2: Add extent tree operation for xattr value btrees
Add some thin wrappers around ocfs2_insert_extent() for each of the 3
different btree types, ocfs2_inode_insert_extent(),
ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
last is for the xattr index btree, which will be used in a followup patch.

All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
while the other two handle the xattr issue. And the init of extent tree are
handled by these functions.

When storing xattr value which is too large, we will allocate some clusters
for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
order to re-use the b-tree operation code, a new parameter named "private"
is added into ocfs2_extent_tree and it is used to indicate the root of
ocfs2_exent_list. The reason is that we can't deduce the root from the
buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
in any place in an ocfs2_xattr_bucket.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:01 -07:00
Tao Ma ac11c82719 ocfs2: Add helper function in uptodate.c for removing xattr clusters
The old uptodate only handles the issue of removing one buffer_head from
ocfs2 inode's buffer cache. With xattr clusters, we may need to remove
multiple buffer_head's at a time.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:59 -07:00
Tao Ma 5a7bc8eb29 ocfs2: Add the basic xattr disk layout in ocfs2_fs.h
Ocfs2 uses a very flexible structure for storing extended attributes on
disk. Small amount of attributes are stored directly in the inode block - up
to 256 bytes worth. If that fills up, attributes are also stored in an
external block, linked to from the inode block. That block can in turn
expand to a btree, capable of storing large numbers of attributes.

Individual attribute values are stored inline if they're small enough
(currently about 80 bytes, this can be changed though), and otherwise are
expanded to a btree. The theoretical limit to the size of an individual
attribute is about the same as an inode, though the kernel's upper bound on
the size of an attributes data is far smaller.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:59 -07:00
Tao Ma 0eb8d47e69 ocfs2: Make high level btree extend code generic
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
ocfs2_do_cluster_allocation() now, but the latter can be used for other
btree types as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:59 -07:00
Tao Ma e7d4cb6bc1 ocfs2: Abstract ocfs2_extent_tree in b-tree operations.
In the old extent tree operation, we take the hypothesis that we
are using the ocfs2_extent_list in ocfs2_dinode as the tree root.
As xattr will also use ocfs2_extent_list to store large value
for a xattr entry, we refactor the tree operation so that xattr
can use it directly.

The refactoring includes 4 steps:
1. Abstract set/get of last_eb_blk and update_clusters since they may
   be stored in different location for dinode and xattr.
2. Add a new structure named ocfs2_extent_tree to indicate the
   extent tree the operation will work on.
3. Remove all the use of fe_bh and di, use root_bh and root_el in
   extent tree instead. So now all the fe_bh is replaced with
   et->root_bh, el with root_el accordingly.
4. Make ocfs2_lock_allocators generic. Now it is limited to be only used
   in file extend allocation. But the whole function is useful when we want
   to store large EAs.

Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
for anything other than truncate inode data btrees.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Tao Ma 811f933df1 ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Tao Ma 231b87d109 ocfs2: Modify ocfs2_num_free_extents for future xattr usage.
ocfs2_num_free_extents() is used to find the number of free extent records
in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
use this for extended attribute trees in the future, so genericize the
interface the take a buffer head. A future patch will allow that buffer_head
to contain any structure rooting an ocfs2 btree.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Mark Fasheh 9a8ff578fb ocfs2: track local alloc state via debugfs
A per-mount debugfs file, "local_alloc" is created which when read will
expose live state of the nodes local alloc file. Performance impact is
minimal, only a bit of memory overhead per mount point. Still, the code is
hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
local alloc performance problems on a live system.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:58 -07:00
Mark Fasheh 9c7af40b21 ocfs2: throttle back local alloc when low on disk space
Ocfs2's local allocator disables itself for the duration of a mount point
when it has trouble allocating a large enough area from the primary bitmap.
That can cause performance problems, especially for disks which were only
temporarily full or fragmented. This patch allows for the allocator to
shrink it's window first, before being disabled. Later, it can also be
re-enabled so that any performance drop is minimized.

To do this, we allow the value of osb->local_alloc_bits to be shrunk when
needed. The default value is recorded in a mostly read-only variable so that
we can re-initialize when required.

Locking had to be updated so that we could protect changes to
local_alloc_bits. Mostly this involves protecting various local alloc values
with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
is used when the local allocator is has shrunk, but is not disabled. If the
available space dips below 1 megabyte, the local alloc file is disabled. In
either case, local alloc is re-enabled 30 seconds after the event, or when
an appropriate amount of bits is seen in the primary bitmap.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:57 -07:00
Mark Fasheh ebcee4b5c9 ocfs2: Track local alloc bits internally
Do this instead of tracking absolute local alloc size. This avoids
needless re-calculatiion of bits from bytes in localalloc.c. Additionally,
the value is now in a more natural unit for internal file system bitmap
work.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:57 -07:00
Mark Fasheh 53da4939f3 ocfs2: POSIX file locks support
This is actually pretty easy since fs/dlm already handles the bulk of the
work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
underlying lock manager, so I only had to add the right calls.

Cluster-aware POSIX locks ("plocks") can be turned off by the same means at
UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
Internally, the file system uses two sets of file_operations, depending on
whether cluster aware plocks is required. This turns out to be easier than
implementing local-only versions of ->lock.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 13:57:57 -07:00
Steven Whitehouse a447c09324 vfs: Use const for kernel parser table
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe its been in -mm
since then.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 10:10:37 -07:00
Mark Fasheh 00dc417fa3 ocfs2: fiemap support
Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be
refactored so that the extent cache can be skipped in favor of going
directly to the on-disk records. This makes it easier for us to determine
which extent is the last one in the btree. Also, I'm not sure we want to be
caching fiemap lookups anyway as they're not directly related to data
read/write.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com
Cc: linux-fsdevel@vger.kernel.org
2008-10-03 17:32:11 -04:00
Tao Ma 0e116227a0 ocfs2: Fix a bug in direct IO read.
ocfs2 will become read-only if we try to read the bytes which pass
the end of i_size. This can be easily reproduced by following steps:
1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse.
2. create a small file(say less than 100 bytes) and we will create the file
   which is allocated 1 cluster.
3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit.
4. The ocfs2 volume becomes read-only and dmesg shows:
OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks:
Inode 66010 has a hole at block 1
File system is now read-only due to the potential of on-disk corruption.
Please run fsck.ocfs2 once the file system is unmounted.

So suppress the ERROR message.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-09-10 01:44:08 -07:00
Joel Becker d6817cdbd1 ocfs2: Increment the reference count of an already-active stack.
The ocfs2_stack_driver_request() function failed to increment the
refcount of an already-active stack.  It only did the increment on the
first reference.  Whoops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-08-25 07:29:47 -07:00