Commit Graph

19326 Commits

Author SHA1 Message Date
Al Viro 39b743c619 switch stat_file() to passing a single struct rather than fsckloads of pointers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:10 -04:00
Al Viro 5e2df28cc6 hostfs: pass pathname to init_inode()
We will calculate it in all callers anyway, so there's no
need to duplicate that inside.  Moreover, that way we lose
all failure exits in init_inode(), so it doesn't need to
return anything.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:09 -04:00
Al Viro 52b209f7b8 get rid of hostfs_read_inode()
There are only two call sites; in one (hostfs_iget()) it's actually
a no-op and in another (fill_super()) it's easier to expand the
damn thing and use what we know about its arguments to simplify
it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:08 -04:00
Al Viro 601d2c38b9 hostfs: don't keep a field in each inode when we are using it only in root
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:07 -04:00
Al Viro e971a6d7b9 stop icache pollution in hostfs, switch to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:06 -04:00
Al Viro f053ddde75 switch affs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:06 -04:00
Al Viro 69c9e75017 switch omfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:05 -04:00
Al Viro 9df2f85128 switch bfs to ->evict_inode(), clean up
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:04 -04:00
Al Viro ac14a95b52 convert ext3 to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:03 -04:00
Al Viro 58e8268c7b switch ufs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:02 -04:00
Al Viro deee3ce466 covert fatfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:01 -04:00
Al Viro d3b4f9ae18 switch smbfs to evict_inode()
NB: treatment of inode hash is completely braindead there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:00 -04:00
Al Viro d299eadc09 switch sysv to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:59 -04:00
Al Viro 72edc4d087 merge ext2 delete_inode and clear_inode, switch to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:57 -04:00
Al Viro 3937871d91 Don't dirty the victim in ext2_xattr_delete_inode()
... it's beyond fs-writeback reach already - writeback won't
be started at that point.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:56 -04:00
Al Viro addacc7d6f Take dirtying the inode to callers of ext2_free_blocks()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:55 -04:00
Al Viro 3889717d28 ext2: switch to dquot_free_block_nodirty()
brute-force conversion

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:54 -04:00
Al Viro 5ccb4a78d8 switch minix to ->evict_inode(), fix write_inode/delete_inode race
We need to wait for completion of possible writeback in progress
before we clear on-disk inode during deletion.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:53 -04:00
Al Viro 01cd9fef6e switch sysfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:53 -04:00
Al Viro 8267952b36 switch procfs to ->evict_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:52 -04:00
Al Viro 77b8a75f5b simplify get_cramfs_inode()
simply don't hash the inodes that don't have real inumber instead of
skipping them during iget5_locked(); as the result, simple iget_locked()
would do and we can get rid of cramfs ->drop_inode() as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:51 -04:00
Al Viro b0683aa638 new helper: end_writeback()
Essentially, the minimal variant of ->evict_inode().  It's
a trimmed-down clear_inode(), sans any fs callbacks.  Once
it returns we know that no async writeback will be happening;
every ->evict_inode() instance should do that once and do that
before doing anything ->write_inode() could interfere with
(e.g. freeing the on-disk inode).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:49 -04:00
Al Viro 661074e91b Take ->i_bdev/->i_cdev handling out of clear_inode()
All call chains to clear_inode() pass through evict_inode() and
clear_inode() should be called by evict_inode() exactly once.
So we can pull i_bdev/i_cdev detaching up to evict_inode() itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:48 -04:00
Al Viro c6287315cb generic_detach_inode() can be static now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:48 -04:00
Al Viro 2bbbda308f switch hugetlbfs to ->evict_inode()
The first spoils - hugetlb can use default ->drop_inode() now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:47 -04:00
Al Viro be7ce4161f New method - evict_inode()
Hybrid of ->clear_inode() and ->delete_inode(); if present, does
all fs work to be done when in-core inode is about to be gone,
for whatever reason.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:46 -04:00
Al Viro b4272d4c81 unify fs/inode.c callers of clear_inode()
For now, just a straightforward merge

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:45 -04:00
Al Viro a4ffdde6e5 simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:44 -04:00
Al Viro b5fc510c48 get rid of file_fsync()
Copy and simplify in the only two users remaining.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:43 -04:00
Christoph Hellwig fa9b227e90 xfs: new truncate sequence
Convert XFS to the new truncate sequence.  We still can have errors after
updating the file size in xfs_setattr, but these are real I/O errors and lead
to a transaction abort and filesystem shutdown, so they are not an issue.

Errors from ->write_begin and write_end can now be handled correctly because
we can actually get rid of the delalloc extents while previous the buffer
state was stipped in block_invalidatepage.

There is still no error handling for ->direct_IO, because doing so will need
some major restructuring given that we only have the iolock shared and do not
hold i_mutex at all.  Fortunately leaving the normally allocated blocks behind
there is not a major issue and this will get cleaned up by xfs_free_eofblock
later.

Note: the patch is against Al's vfs.git tree as that contains the nessecary
preparations.  I'd prefer to get it applied there so that we can get some
testing in linux-next.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:42 -04:00
Boaz Harrosh 2f246fd0f1 exofs: New truncate sequence
These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.

* Remove the deprecated ->truncate vector. Let exofs_setattr
  take care of on-disk size updates.
* Call truncate_pagecache on the unused pages if
  write_begin/end fails.
* Cleanup exofs_delete_inode that did stupid inode
  writes and updates on an inode that will be
  removed.
* And finally get rid of exofs_get_block. We never
  had any blocks it was all for calling nobh_truncate_page.
  nobh_truncate_page is not actually needed in exofs since
  the last page is complete and gone, just like all the other
  pages. There is no partial blocks in exofs.

I've tested with this patch, and there are no apparent
failures, so far.

CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:41 -04:00
Al Viro 41cce647f8 jffs2: don't open-code iget_failed()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:41 -04:00
Christoph Hellwig 2c27c65ed0 check ATTR_SIZE contraints in inode_change_ok
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok.  Also clean up and document inode_change_ok
to make this obvious.

As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error.  This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.

Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.

Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:39 -04:00
Christoph Hellwig db78b877f7 always call inode_change_ok early in ->setattr
Make sure we call inode_change_ok before doing any changes in ->setattr,
and make sure to call it even if our fs wants to ignore normal UNIX
permissions, but use the ATTR_FORCE to skip those.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:38 -04:00
Christoph Hellwig 1025774ce4 remove inode_setattr
Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:37 -04:00
Christoph Hellwig eef2380c18 default to simple_setattr
With the new truncate sequence every filesystem that wants to support file
size changes on disk needs to implement its own ->setattr.  So instead
of calling inode_setattr which supports size changes call into a simple
method that doesn't support this.  simple_setattr is almost what we
want except that it does not mark the inode dirty after changes.  Given
that marking the inode dirty is a no-op for the simple in-memory filesystems
that use simple_setattr currently just add the mark_inode_dirty call.

Also add a WARN_ON for the presence of a truncate method to simple_setattr
to catch new instances of it during the transition period.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:36 -04:00
Christoph Hellwig 6a1a90ad1b rename generic_setattr
Despite its name it's now a generic implementation of ->setattr, but
rather a helper to copy attributes from a struct iattr to the inode.
Rename it to setattr_copy to reflect this fact.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:35 -04:00
Christoph Hellwig d39aae9ec4 add missing setattr methods
For the new truncate sequence every filesystem that wants to truncate on-disk
state needs a seattr method.  Convert the remaining filesystems that implement
the truncate inode operation to have its own setattr method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:34 -04:00
Christoph Hellwig 155130a4f7 get rid of block_write_begin_newtrunc
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.

While we're at it also remove several unused arguments to block_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:33 -04:00
Christoph Hellwig 6e1db88d53 introduce __block_write_begin
Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated.  Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:32 -04:00
Christoph Hellwig f4e420dc42 clean up write_begin usage for directories in pagecache
For filesystem that implement directories in pagecache we call
block_write_begin with an already allocated page for this code, while the
normal regular file write path uses the default block_write_begin behaviour.

Get rid of the __foofs_write_begin helper and opencode the normal write_begin
call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
the directory code.  The added benefit is that foofs_prepare_chunk has
a much saner calling convention.

Note that the interruptible flag passed into block_write_begin is always
ignored if we already pass in a page (see next patch for details), and
we never were doing truncations of exessive blocks for this case either so we
can switch directly to block_write_begin_newtrunc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:31 -04:00
Christoph Hellwig 282dc17884 get rid of cont_write_begin_newtrunc
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to cont_write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:31 -04:00
Christoph Hellwig ea0f04e595 get rid of nobh_write_begin_newtrunc
Move the call to vmtruncate to get rid of accessive blocks to the only
remaining caller and rename the non-truncating version to nobh_write_begin.

Get rid of the superflous file argument to it while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:30 -04:00
Christoph Hellwig eafdc7d190 sort out blockdev_direct_IO variants
Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence.  This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:29 -04:00
Al Viro 256249584b fix leak in __logfs_create()
if kmalloc fails, we still need to drop the inode, as we do
on other failure exits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:28 -04:00
Al Viro 0e4f6a791b Fix reiserfs_file_release()
a) count file openers correctly; i_count use was completely wrong
b) use new mutex for exclusion between final close/open/truncate,
to protect tailpacking logics.  i_mutex use was wrong and resulted
in deadlocks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:27 -04:00
Al Viro 918377b696 missing include in hppfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:26 -04:00
Al Viro 005a59ec74 Deal with missing exports for hostfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:25 -04:00
Lino Sanfilippo 21edad3220 ecryptfs: dont call lookup_one_len to avoid NULL nameidata
I have encountered the same problem that Eric Sandeen described in
this post

 http://lkml.org/lkml/fancy/2010/4/23/467

while experimenting with stackable filesystems.

The reason seems to be that ecryptfs calls lookup_one_len() to get the
lower dentry, which in turn calls the lower parent dirs d_revalidate()
with a NULL nameidata object.
If ecryptfs is the underlaying filesystem, the NULL pointer dereference
occurs, since ecryptfs is not prepared to handle a NULL nameidata.

I know that this cant happen any more, since it is no longer allowed to
mount ecryptfs upon itself.

But maybe this patch it useful nevertheless, since the problem would still
apply for an underlaying filesystem that implements d_revalidate() and is
not prepared to handle a NULL nameidata (I dont know if there actually
is such a fs).

With this patch (against 2.6.35-rc5) ecryptfs uses the vfs_lookup_path()
function instead of lookup_one_len() which ensures that the nameidata
passed to the lower filesystems d_revalidate().

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09 13:42:12 -05:00
Julia Lawall ceeab92971 fs/ecryptfs/file.c: introduce missing free
The comments in the code indicate that file_info should be released if the
function fails.  This releasing is done at the label out_free, not out.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@

x@p1 = kmem_cache_zalloc(...);
...
if (x == NULL) S
<... when != x
     when != if (...) { <+...x...+> }
(
x->f1 = E
|
 (x->f1 == NULL || ...)
|
 f(...,x->f1,...)
)
...>
(
 return <+...x...+>;
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmem_cache_zalloc %s" % (p1[0].file,p1[0].line)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09 13:25:24 -05:00
Lino Sanfilippo 31f73bee3e ecryptfs: release reference to lower mount if interpose fails
In ecryptfs_lookup_and_interpose_lower() the lower mount is not decremented
if allocation of a dentry info struct failed. As a result the lower filesystem
cant be unmounted any more (since it is considered busy). This patch corrects
the reference counting.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09 10:33:05 -05:00
Tyler Hicks c43f7b8fb0 eCryptfs: Handle ioctl calls with unlocked and compat functions
Lower filesystems that only implemented unlocked_ioctl weren't being
passed ioctl calls because eCryptfs only checked for
lower_file->f_op->ioctl and returned -ENOTTY if it was NULL.

eCryptfs shouldn't implement ioctl(), since it doesn't require the BKL.
This patch introduces ecryptfs_unlocked_ioctl() and
ecryptfs_compat_ioctl(), which passes the calls on to the lower file
system.

https://bugs.launchpad.net/ecryptfs/+bug/469664

Reported-by: James Dupin <james.dupin@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09 10:33:04 -05:00
Prarit Bhargava a1275c3b21 ecryptfs: Fix warning in ecryptfs_process_response()
Fix warning seen with "make -j24 CONFIG_DEBUG_SECTION_MISMATCH=y V=1":

fs/ecryptfs/messaging.c: In function 'ecryptfs_process_response':
fs/ecryptfs/messaging.c:276: warning: 'daemon' may be used uninitialized in this function

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-08-09 10:33:03 -05:00
Frederic Weisbecker d9a145fb6e Merge commit 'linus/master' into bkl/core
Merge reason: The staging tree has introduced the easycap
driver lately. We need the latest updates to pushdown the
bkl in its ioctl helper.
2010-08-09 02:14:15 +02:00
Phillip Lougher 66048c381b Squashfs: fix checkpatch.pl warnings
Checkpatch.pl in 2.6.34 added a check for spaces between tabs.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-08 22:29:33 +00:00
Arnd Bergmann c9243f5bdd autofs/autofs4: Move compat_ioctl handling into fs
Handling of autofs ioctl numbers does not need to be generic
and can easily be done directly in autofs itself.

This also pushes the BKL into autofs and autofs4 ioctl
methods.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>
Cc: Autofs <autofs@linux.kernel.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-09 00:13:34 +02:00
David Woodhouse 6ae0185fe2 mtd: Remove obsolete <mtd/compatmac.h> include
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-08-08 21:19:42 +01:00
Bill Pemberton ffc1887990 omfs: fix uninitialized variable warning
quiet the warning:
fs/omfs/file.c: In function 'omfs_get_block':
fs/omfs/file.c:225: warning: 'new_block' may be used uninitialized in
this function

new_block is used properly by the call to omfs_grow_extent()

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
2010-08-08 12:02:05 -04:00
David Woodhouse 6088c05877 jffs2: Update copyright notices
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2010-08-08 14:15:22 +01:00
Linus Torvalds 78417334b5 Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  do_coredump: Do not take BKL
  init: Remove the BKL from startup code
2010-08-07 17:06:54 -07:00
Linus Torvalds 0d9f9e122c Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd4: fix file open accounting for RDWR opens
  nfsd: don't allow setting maxblksize after svc created
  nfsd: initialize nfsd versions before creating svc
  net: sunrpc: removed duplicated #include
  nfsd41: Fix a crash when a callback is retried
  nfsd: fix startup/shutdown order bug
  nfsd: minor nfsd read api cleanup
  gcc-4.6: nfsd: fix initialized but not read warnings
  nfsd4: share file descriptors between stateid's
  nfsd4: fix openmode checking on IO using lock stateid
  nfsd4: miscellaneous process_open2 cleanup
  nfsd4: don't pretend to support write delegations
  nfsd: bypass readahead cache when have struct file
  nfsd: minor nfsd_svc() cleanup
  nfsd: move more into nfsd_startup()
  nfsd: just keep single lockd reference for nfsd
  nfsd: clean up nfsd_create_serv error handling
  nfsd: fix error handling in __write_ports_addxprt
  nfsd: fix error handling when starting nfsd with rpcbind down
  nfsd4: fix v4 state shutdown error paths
  ...
2010-08-07 14:24:41 -07:00
David Howells df44f9f4f9 AFS: Fix the module init error handling
Fix the module init error handling.  There are a bunch of goto labels for
aborting the init procedure at different points and just undoing what needs
undoing - they aren't all in the right places, however.

This can lead to an oops like the following:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
	IP: [<ffffffff81042a31>] destroy_workqueue+0x17/0xc0
	...
	Modules linked in: kafs(+) dns_resolver rxkad af_rxrpc fscache

	Pid: 2171, comm: insmod Not tainted 2.6.35-cachefs+ #319 DG965RY/
	...
	Process insmod (pid: 2171, threadinfo ffff88003ca6a000, task ffff88003dcc3050)
	...
	Call Trace:
	 [<ffffffffa0055994>] afs_callback_update_kill+0x10/0x12 [kafs]
	 [<ffffffffa007d1c5>] afs_init+0x190/0x1ce [kafs]
	 [<ffffffffa007d035>] ? afs_init+0x0/0x1ce [kafs]
	 [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
	 [<ffffffff8105f7ee>] sys_init_module+0x9c/0x1de
	 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-07 14:23:37 -07:00
Linus Torvalds 5df6b8e65a Merge branch 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits)
  NFS: NFSv4.1 is no longer a "developer only" feature
  NFS: NFS_V4 is no longer an EXPERIMENTAL feature
  NFS: Fix /proc/mount for legacy binary interface
  NFS: Fix the locking in nfs4_callback_getattr
  SUNRPC: Defer deleting the security context until gss_do_free_ctx()
  SUNRPC: prevent task_cleanup running on freed xprt
  SUNRPC: Reduce asynchronous RPC task stack usage
  SUNRPC: Move the bound cred to struct rpc_rqst
  SUNRPC: Clean up of rpc_bindcred()
  SUNRPC: Move remaining RPC client related task initialisation into clnt.c
  SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task
  SUNRPC: Make the credential cache hashtable size configurable
  SUNRPC: Store the hashtable size in struct rpc_cred_cache
  NFS: Ensure the AUTH_UNIX credcache is allocated dynamically
  NFS: Fix the NFS users of rpc_restart_call()
  SUNRPC: The function rpc_restart_call() should return success/failure
  NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
  NFSv4: Clean up the process of renewing the NFSv4 lease
  NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
  NFS: nfs_rename() should not have to flush out writebacks
  ...
2010-08-07 13:19:36 -07:00
Linus Torvalds fe21ea18c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add retrieve request
  fuse: add store request
  fuse: don't use atomic kmap
2010-08-07 13:18:36 -07:00
Linus Torvalds a57f9a3e81 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits)
  nilfs2: reject filesystem with unsupported block size
  nilfs2: avoid rec_len overflow with 64KB block size
  nilfs2: simplify nilfs_get_page function
  nilfs2: reject incompatible filesystem
  nilfs2: add feature set fields to super block
  nilfs2: clarify byte offset in super block format
  nilfs2: apply read-ahead for nilfs_btree_lookup_contig
  nilfs2: introduce check flag to btree node buffer
  nilfs2: add btree get block function with readahead option
  nilfs2: add read ahead mode to nilfs_btnode_submit_block
  nilfs2: fix buffer head leak in nilfs_btnode_submit_block
  nilfs2: eliminate inline keywords in btree implementation
  nilfs2: get maximum number of child nodes from bmap object
  nilfs2: reduce repetitive calculation of max number of child nodes
  nilfs2: optimize calculation of min/max number of btree node children
  nilfs2: remove redundant pointer checks in bmap lookup functions
  nilfs2: get rid of nilfs_bmap_union
  nilfs2: unify bmap set_target_v operations
  nilfs2: get rid of nilfs_btree uses
  nilfs2: get rid of nilfs_direct uses
  ...
2010-08-07 13:10:55 -07:00
Linus Torvalds 09dc942c2a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
  ext4: Adding error check after calling ext4_mb_regular_allocator()
  ext4: Fix dirtying of journalled buffers in data=journal mode
  ext4: re-inline ext4_rec_len_(to|from)_disk functions
  jbd2: Remove t_handle_lock from start_this_handle()
  jbd2: Change j_state_lock to be a rwlock_t
  jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
  ext4: Add mount options in superblock
  ext4: force block allocation on quota_off
  ext4: fix freeze deadlock under IO
  ext4: drop inode from orphan list if ext4_delete_inode() fails
  ext4: check to make make sure bd_dev is set before dereferencing it
  jbd2: Make barrier messages less scary
  ext4: don't print scary messages for allocation failures post-abort
  ext4: fix EFBIG edge case when writing to large non-extent file
  ext4: fix ext4_get_blocks references
  ext4: Always journal quota file modifications
  ext4: Fix potential memory leak in ext4_fill_super
  ext4: Don't error out the fs if the user tries to make a file too big
  ext4: allocate stripe-multiple IOs on stripe boundaries
  ext4: move aio completion after unwritten extent conversion
  ...

Fix up conflicts in fs/ext4/inode.c as per Ted.

Fix up xfs conflicts as per earlier xfs merge.
2010-08-07 13:03:53 -07:00
Linus Torvalds 90e0c22596 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  ext3: Fix dirtying of journalled buffers in data=journal mode
  ext3: default to ordered mode
  quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
  quota: Change quota error message to print out disk and function name
  MAINTAINERS: Update entries of ext2 and ext3
  MAINTAINERS: Update address of Andreas Dilger
  ext3: Avoid filesystem corruption after a crash under heavy delete load
  ext3: remove vestiges of nobh support
  ext3: Fix set but unused variables
  quota: clean up quota active checks
  quota: Clean up the namespace in dqblk_xfs.h
  quota: check quota reservation on remove_dquot_ref
2010-08-07 12:57:07 -07:00
Linus Torvalds 938a73b959 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  fs/dlm: Drop unnecessary null test
  dlm: use genl_register_family_with_ops()
2010-08-07 12:56:48 -07:00
Linus Torvalds 45480aa75b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: super.c Fix warning: variable 'sbi' set but not used
  udf: remove duplicated #include
2010-08-07 12:56:27 -07:00
Linus Torvalds 1fc7995d19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [DNS RESOLVER] Minor typo correction
  DNS: Fixes for the DNS query module
  cifs: Include linux/err.h for IS_ERR and PTR_ERR
  DNS: Make AFS go to the DNS for AFSDB records for unknown cells
  DNS: Separate out CIFS DNS Resolver code
  cifs: account for new creduid=0x%x parameter in spnego upcall string
  cifs: reduce false positives with inode aliasing serverino autodisable
  CIFS: Make cifs_convert_address() take a const src pointer and a length
  cifs: show features compiled in as part of DebugData
  cifs: update README

Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes
2010-08-07 12:54:46 -07:00
Linus Torvalds 3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
Tristan Ye 415cf32c9c O2net: Disallow o2net accept connection request from itself.
Currently, o2net_accept_one() is allowed to accept a connection from
listening node itself, such a fake connection will not be successfully
established due to no handshake detected afterwards, and later end up
with triggering connecting worker in a loop.

We're going to fix this by treating such connection request as 'invalid',
since we've got no chance of requesting connection from a node to itself
in a OCFS2 cluster.

The fix doesn't hurt user's scan for o2net-listener, it always gets a
successful connection from userpace.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:50:33 -07:00
Wengang Wang b11f1f1ab7 ocfs2/dlm: remove potential deadlock -V3
When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in order of: dlm_domain_lock then dlm->spinlock.

There is pathes disobey this order. That is calling dlm_lockres_put() with
dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at
the ref and dlm_put() locks on dlm_domain_lock.

Fix:
Don't grab/put the dlm when the initialising/releasing lockres.
That grab is not required because we don't call dlm_unregister_domain()
based on refcount.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:50:30 -07:00
Wengang Wang a524812b7e ocfs2/dlm: avoid incorrect bit set in refmap on recovery master
In the following situation, there remains an incorrect bit in refmap on the
recovery master. Finally the recovery master will fail at purging the lockres
due to the incorrect bit in refmap.

1) node A has no interest on lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A is sending de-ref message
to node B.
3) at this time, node B crashed. node C becomes the recovery master. it recovers
lockres A(because the master is the dead node B).
4) node A migrated lockres A to node C with a refbit there.
5) node A failed to send de-ref message to node B because it crashed. The failure
is ignored. no other action is done for lockres A any more.

For mormal, re-send the deref message to it to recovery master can fix it. Well,
ignoring the failure of deref to the original master and not recovering the lockres
to recovery master has the same effect. And the later is simpler.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:49:41 -07:00
Jiaju Zhang 845b6cf341 Fix the nested PR lock calling issue in ACL
Hi,

Thanks a lot for all the review and comments so far;) I'd like to send
the improved (V4) version of this patch.

This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2
and Samba integration using scenario, the symptom is several smbd
processes will be hung under heavy workload. Finally we found out it
is the nested PR lock calling that leads to this deadlock:

 node1        node2
              gr PR
                |
                V
 PR(EX)---> BAST:OCFS2_LOCK_BLOCKED
                |
                V
              rq PR
                |
                V
              wait=1

After requesting the 2nd PR lock, the process "smbd" went into D
state. It can only be woken up when the 1st PR lock's RO holder equals
zero. There should be an ocfs2_inode_unlock in the calling path later
on, which can decrement the RO holder. But since it has been in
uninterruptible sleep, the unlock function has no chance to be called.

The related stack trace is:
smbd          D ffff8800013d0600     0  9522   5608 0x00000000
 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98
 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340
 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340
Call Trace:
[<ffffffff80350425>] schedule_timeout+0x175/0x210
[<ffffffff8034f580>] wait_for_common+0xf0/0x210
[<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2]
[<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2]
[<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2]
[<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2]
[<ffffffff800e3507>] acl_permission_check+0x57/0xb0
[<ffffffff800e357d>] generic_permission+0x1d/0xc0
[<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2]
[<ffffffff800e3f65>] inode_permission+0x45/0x100
[<ffffffff800d86b3>] sys_chdir+0x53/0x90
[<ffffffff80007458>] system_call_fastpath+0x16/0x1b
[<00007f34a4ef6927>] 0x7f34a4ef6927

For details, please see:
https://bugzilla.novell.com/show_bug.cgi?id=614332 and
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:46:46 -07:00
Tao Ma 8a2e70c40f ocfs2: Count more refcount records in file system fragmentation.
The refcount record calculation in ocfs2_calc_refcount_meta_credits
is too optimistic that we can always allocate contiguous clusters
and handle an already existed refcount rec as a whole. Actually
because of file system fragmentation, we may have the chance to split
a refcount record into 3 parts during the transaction. So consider
the worst case in record calculation.

Cc: stable@kernel.org
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:44:49 -07:00
Srinivas Eeda 7beaf24378 ocfs2 fix o2dlm dlm run purgelist (rev 3)
This patch fixes two problems in dlm_run_purgelist

1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge
the same lockres instead of trying the next lockres.

2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock
before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres.
spinlock is reacquired but in this window lockres can get reused. This leads
to BUG.

This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge
 next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the
lockres spinlock protecting it from getting reused.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:44:40 -07:00
Wengang Wang 6d98c3ccb5 ocfs2/dlm: fix a dead lock
When we have to take both dlm->master_lock and lockres->spinlock,
take them in order

lockres->spinlock and then dlm->master_lock.

The patch fixes a violation of the rule.
We can simply move taking dlm->master_lock to where we have dropped res->spinlock
since when we access res->state and free mle memory we don't need master_lock's
protection.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:44:20 -07:00
Tiger Yang 6eda3dd33f ocfs2: do not overwrite error codes in ocfs2_init_acl
Setting the acl while creating a new inode depends on
the error codes of posix_acl_create_masq. This patch fix
a issue of overwriting the error codes of it.

Reported-by: Pawel Zawora <pzawora@gmail.com>
Cc: <stable@kernel.org> [ .33, .34 ]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-08-07 10:43:25 -07:00
Artem Bityutskiy 6467716a37 writeback: optimize periodic bdi thread wakeups
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.

This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.

This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.

We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.

Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
this change as well.

This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:56 +02:00
Artem Bityutskiy 253c34e9b1 writeback: prevent unnecessary bdi threads wakeups
Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
bad for battery-driven devices.

There are two types of activities bdi threads do:
1. process bdi works from the 'bdi->work_list'
2. periodic write-back

So there are 2 sources of wake-up events for bdi threads:

1. 'bdi_queue_work()' - submits bdi works
2. '__mark_inode_dirty()' - adds dirty I/O to bdi's

The former already has bdi wake-up code. The latter does not, and this patch
adds it.

'__mark_inode_dirty()' is hot-path function, but this patch adds another
'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when
the bdi has no dirty inodes. So adding this spinlock should be fine and should
not affect performance.

This patch makes sure bdi threads and the forker thread do not wake-up if there
is nothing to do. The forker thread will nevertheless wake up at least every
5 min. to check whether it has to kill a bdi thread. This can also be optimized,
but is not worth it.

This patch also tidies up the warning about unregistered bid, and turns it from
an ugly crocodile to a simple 'WARN()' statement.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:56 +02:00
Artem Bityutskiy fff5b85aa4 writeback: move bdi threads exiting logic to the forker thread
Currently, bdi threads can decide to exit if there were no useful activities
for 5 minutes. However, this causes nasty races: we can easily oops in the
'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up.

And even if we do not oops, but the bdi tread exits immediately after we wake
it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs)
in the bdi work processing.

This patch makes the forker thread to be the central place which not only
creates bdi threads, but also kills them if they were inactive long enough.
This better design-wise.

Another reason why this change was done is to prepare for the further changes
which will prevent the bdi threads from waking up every 5 sec and wasting
power. Indeed, when the task does not wake up periodically anymore, it won't be
able to exit either.

This patch also moves the the 'wake_up_bit()' call from the bdi thread to the
forker thread as well. So now the forker thread sets the BDI_pending bit, then
forks the task or kills it, then clears the bit and wakes up the waiting
process.

The only process which may wain on the bit is 'bdi_wb_shutdown()'. This
function was changed as well - now it first removes the bdi from the
'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
guaranteed that the forker thread won't race with it, because the bdi is not
visible. Note, the forker thread sets the 'BDI_pending' bit under the
'bdi->wb_lock' which is essential for proper serialization.

And additionally, when we change 'bdi->wb.task', we now take the
'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise
would when raced with, say, 'bdi_queue_work()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:56 +02:00
Artem Bityutskiy ecd584030d writeback: move last_active to bdi
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:56 +02:00
Artem Bityutskiy 78c40cb658 writeback: do not remove bdi from bdi_list
The forker thread removes bdis from 'bdi_list' before forking the bdi thread.
But this is wrong for at least 2 reasons.

Reason #1: if we temporary remove a bdi from the list, we may miss works which
           would otherwise be given to us.

Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are
           always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when
           it races with the forker thread, it can shut down the bdi thread
           at the same time as the forker creates it.

This patch makes sure the forker thread never removes bdis from 'bdi_list'
(which was suggested by Christoph Hellwig).

In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
flag.

NOTE! The error path is interesting. Currently, when we fail to create a bdi
thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the
bdi from the list, we cannot move it to the tail either, because then we can
mess up the RCU readers which walk the list. And also, we'll have the race
described above in "Reason #2".

But I not think that adding to the tail is any important so I just do not do
that.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:56 +02:00
Artem Bityutskiy 297252c81d writeback: do not lose wake-ups in bdi threads
Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For
example, if 'bdi_queue_work()' is executed after the bdi thread have had
finished 'wb_do_writeback()' but before it called
'schedule_timeout_interruptible()'.

To fix this issue, we have to check whether we have works to process after we
have changed the task state to 'TASK_INTERRUPTIBLE'.

This patch also clean-ups handling of the cases when 'dirty_writeback_interval'
is zero or non-zero.

Additionally, this patch also removes unneeded 'list_empty_careful()' call.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:55 +02:00
Artem Bityutskiy 6f904ff0e3 writeback: harmonize writeback threads naming
The write-back code mixes words "thread" and "task" for the same things. This
is not a big deal, but still an inconsistency.

hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else.  This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer

This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()'              -> 'bdi_forker_thread()'

because bdi threads are 'bdi_writeback_thread()', so these names are more
consistent.

This patch also amends commentaries and makes them refer the forker and bdi
threads as "thread", not "task".

Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:16 +02:00
Jens Axboe 4aeefdc69f coda: fixup clash with block layer REQ_* defines
CODA should not be using defines in the global name space of
that nature, prefix them with CODA_.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:13 +02:00
Minchan Kim 08852b6d6c writeback: remove wb in get_next_work_item
83ba7b07 cleans up the writeback.
So we don't use wb any more in get_next_work_item.
Let's remove unnecessary argument.

CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:53:01 +02:00
Miklos Szeredi 6965031d33 splice: fix misuse of SPLICE_F_NONBLOCK
SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the
pipe.  In __generic_file_splice_read(), however, it causes an EAGAIN
if the page is currently being read.

This makes it impossible to write an application that only wants
failure if the pipe is full.  For example if the same process is
handling both ends of a pipe and isn't otherwise able to determine
whether a splice to the pipe will fill it or not.

We could make the read non-blocking on O_NONBLOCK or some other splice
flag, but for now this is the simplest fix.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:52:56 +02:00
Arnd Bergmann 6e9624b8ca block: push down BKL into .open and .release
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.

This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.

The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.

Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:25:34 +02:00
Dave Chinner 028c2dd184 writeback: Add tracing to balance_dirty_pages
Tracing high level background writeback events is good, but it doesn't
give the entire picture. Add visibility into write throttling to catch IO
dispatched by foreground throttling of processing dirtying lots of pages.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:25 +02:00
Dave Chinner 455b286468 writeback: Initial tracing support
Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:

     sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0

This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.

The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:23 +02:00
Andi Kleen 1676effca4 gcc-4.6: fs: fix unused but set warnings
No real bugs I believe, just some dead code, and some
shut up code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:12 +02:00
Christoph Hellwig 082439004b writeback: merge bdi_writeback_task and bdi_start_fn
Move all code for the writeback thread into fs/fs-writeback.c instead of
splitting it over two functions in two files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:06 +02:00
Christoph Hellwig c1955ce32f writeback: remove wb_list
The wb_list member of struct backing_device_info always has exactly one
element.  Just use the direct bdi->wb pointer instead and simplify some
code.

Also remove bdi_task_init which is now trivial to prepare for the next
patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:03 +02:00
Christoph Hellwig 7b6d91daee block: unify flags for struct bio and struct request
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver.  There were two flags in the bio that were
missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:20:39 +02:00
Christoph Hellwig 41f2df6289 block: BARRIER request should imply SYNC
A barrier request should by defintion have priority in get_request
and let the queue be unplugged immediately as it's blocking all forward
progress due to the queue draining.

Most filesystems already get this implicitly by the way how submit_bh
treats the buffer_ordered flag, and gfs2 sets it explicitly.  But btrfs
and XFS are still forgetting to set the flag, as is blkdev_issue_flush
and some places in DM/MD.

For XFS on metadata heavy workloads this gives a consistent speedup
in the 2-3% range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:15:44 +02:00
J. Bruce Fields 998db52c03 nfsd4: fix file open accounting for RDWR opens
Commit f9d7562fdb "nfsd4: share file
descriptors between stateid's" didn't correctly account for O_RDWR opens.
Symptoms include leaked files, resulting in failures to unmount and/or
warnings about orphaned inodes on reboot.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-07 09:50:05 -04:00
J. Bruce Fields 7fa53cc872 nfsd: don't allow setting maxblksize after svc created
It's harmless to set this after the server is created, but also
ineffective, since the value is only used at the time of
svc_create_pooled().  So fail the attempt, in keeping with the pattern
set by write_versions, write_{lease,grace}time and write_recoverydir.

(This could break userspace that tried to write to nfsd/max_block_size
between setting up sockets and starting the server.  However, such code
wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
in nfs-utils, probably the only user of the interface, doesn't do that.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-06 18:00:33 -04:00
J. Bruce Fields e844a7b980 nfsd: initialize nfsd versions before creating svc
Commit 59db4a0c10 "nfsd: move more into
nfsd_startup()" inadvertently moved nfsd_versions after
nfsd_create_svc().  On older distributions using an rpc.nfsd that does
not explicitly set the list of nfsd versions, this results in
svc-create_pooled() being called with an empty versions array.  The
resulting incomplete initialization leads to a NULL dereference in
svc_process_common() the first time a client accesses the server.

Move nfsd_reset_versions() back before the svc_create_pooled(); this
time, put it closer to the svc_create_pooled() call, to make this
mistake more difficult in the future.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-06 17:05:40 -04:00
Boaz Harrosh c18c821fd4 nfsd41: Fix a crash when a callback is retried
If a callback is retried at nfsd4_cb_recall_done() due to
some error, the returned rpc reply crashes here:

@@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
 	u32 dummy;
 	__be32 *p;

 +	BUG_ON(!res);
 	if (res->cbs_minorversion == 0)
 		return 0;

[BUG_ON added for demonstration]

This is because the nfsd4_cb_done_sequence() has NULLed out
the task->tk_msg.rpc_resp pointer.

Also eventually the rpc would use the new slot without making
sure it is free by calling nfsd41_cb_setup_sequence().

This problem was introduced by a 4.1 protocol addition patch:
	[0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1

Which was overlooking the possibility of an RPC callback retries.
For not-4.1 case redoing the _prepare is harmless.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-06 17:05:39 -04:00
J. Bruce Fields 774f8bbd9e nfsd: fix startup/shutdown order bug
We must create the server before we can call init_socks or check the
number of threads.

Symptoms were a NULL pointer dereference in nfsd_svc().  Problem
identified by Jeff Layton.

Also fix a minor cleanup-on-error case in nfsd_startup().

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-08-06 17:05:30 -04:00
Linus Torvalds ab69bcd66f Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (28 commits)
  driver core: device_rename's new_name can be const
  sysfs: Remove owner field from sysfs struct attribute
  powerpc/pci: Remove owner field from attribute initialization in PCI bridge init
  regulator: Remove owner field from attribute initialization in regulator core driver
  leds: Remove owner field from attribute initialization in bd2802 driver
  scsi: Remove owner field from attribute initialization in ARCMSR driver
  scsi: Remove owner field from attribute initialization in LPFC driver
  cgroupfs: create /sys/fs/cgroup to mount cgroupfs on
  Driver core: Add BUS_NOTIFY_BIND_DRIVER
  driver core: fix memory leak on one error path in bus_register()
  debugfs: no longer needs to depend on SYSFS
  sysfs: Fix one more signature discrepancy between sysfs implementation and docs.
  sysfs: fix discrepancies between implementation and documentation
  dcdbas: remove a redundant smi_data_buf_free in dcdbas_exit
  dmi-id: fix a memory leak in dmi_id_init error path
  sysfs: sysfs_chmod_file's attr can be const
  firmware: Update hotplug script
  Driver core: move platform device creation helpers to .init.text (if MODULE=n)
  Driver core: reduce duplicated code for platform_device creation
  Driver core: use kmemdup in platform_device_add_resources
  ...
2010-08-06 11:36:30 -07:00
Trond Myklebust 3dce9a5c3a NFS: NFSv4.1 is no longer a "developer only" feature
Mark it as 'experimental' instead, since in practice, NFSv4.1 should now be
relatively stable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:41 -04:00
Trond Myklebust b3edc2bc19 NFS: NFS_V4 is no longer an EXPERIMENTAL feature
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:40 -04:00
Bryan Schumaker d5eff1a341 NFS: Fix /proc/mount for legacy binary interface
Add a flag so we know if we mounted the NFS server using the legacy
binary interface.  If we used the legacy interface, then we should not
show the mountd options.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:39 -04:00
Trond Myklebust 761fe93cdf NFS: Fix the locking in nfs4_callback_getattr
The delegation is protected by RCU now, so we need to replace the
nfsi->rwsem protection with an rcu protected section.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-06 13:41:39 -04:00
Linus Torvalds 4aed2fd8e3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
  tracing/kprobes: unregister_trace_probe needs to be called under mutex
  perf: expose event__process function
  perf events: Fix mmap offset determination
  perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
  perf, powerpc: Convert the FSL driver to use local64_t
  perf tools: Don't keep unreferenced maps when unmaps are detected
  perf session: Invalidate last_match when removing threads from rb_tree
  perf session: Free the ref_reloc_sym memory at the right place
  x86,mmiotrace: Add support for tracing STOS instruction
  perf, sched migration: Librarize task states and event headers helpers
  perf, sched migration: Librarize the GUI class
  perf, sched migration: Make the GUI class client agnostic
  perf, sched migration: Make it vertically scrollable
  perf, sched migration: Parameterize cpu height and spacing
  perf, sched migration: Fix key bindings
  perf, sched migration: Ignore unhandled task states
  perf, sched migration: Handle ignored migrate out events
  perf: New migration tool overview
  tracing: Drop cpparg() macro
  perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
  ...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06 09:30:52 -07:00
Linus Torvalds 3a3527b646 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "net: Make accesses to ->br_port safe for sparse RCU"
  mce: convert to rcu_dereference_index_check()
  net: Make accesses to ->br_port safe for sparse RCU
  vfs: add fs.h to define struct file
  lockdep: Add an in_workqueue_context() lockdep-based test function
  rcu: add __rcu API for later sparse checking
  rcu: add an rcu_dereference_index_check()
  tree/tiny rcu: Add debug RCU head objects
  mm: remove all rcu head initializations
  fs: remove all rcu head initializations, except on_stack initializations
  powerpc: remove all rcu head initializations
2010-08-06 09:23:07 -07:00
David Howells 31d1d48e19 Fix init ordering of /dev/console vs callers of modprobe
Make /dev/console get initialised before any initialisation routine that
invokes modprobe because if modprobe fails, it's going to want to open
/dev/console, presumably to write an error message to.

The problem with that is that if the /dev/console driver is not yet
initialised, the chardev handler will call request_module() to invoke
modprobe, which will fail, because we never compile /dev/console as a
module.

This will lead to a modprobe loop, showing the following in the kernel
log:

	request_module: runaway loop modprobe char-major-5-1
	request_module: runaway loop modprobe char-major-5-1
	request_module: runaway loop modprobe char-major-5-1
	request_module: runaway loop modprobe char-major-5-1
	request_module: runaway loop modprobe char-major-5-1

This can happen, for example, when the built in md5 module can't find
the built in cryptomgr module (because the latter fails to initialise).
The md5 module comes before the call to tty_init(), presumably because
'crypto' comes before 'drivers' alphabetically.

Fix this by calling tty_init() from chrdev_init().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06 09:17:02 -07:00
Phillip Lougher 4f86b8fd48 Squashfs: fix filename typo
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-05 23:52:53 +01:00
Phillip Lougher 4b676d2dbe Squashfs: update Kconfig and documentation for LZO
Update compression types supported and add some help text for
the LZO Kconfig option.

Also add missing "default n" line and make some trivial whitespace
cleanups too.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-05 23:42:54 +01:00
Sage Weil 0eb6cd49f6 ceph: only queue async writeback on cap revocation if there is dirty data
Normally, if the Fb cap bit is being revoked, we queue an async writeback.
If there is no dirty data but we still hold the cap, this leaves the
client sitting around doing nothing until the cap timeouts expire and the
cap is released on its own (as it would have been without the revocation).

Instead, only queue writeback if the bit is actually used (i.e., we have
dirty data).  If not, we can reply to the revocation immediately.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-05 13:53:40 -07:00
Jean Delvare 49c19400f6 sysfs: sysfs_chmod_file's attr can be const
sysfs_chmod_file doesn't change the attribute it operates on, so this
attribute can be marked const.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-05 13:53:34 -07:00
Aditya Kali 6c7a120ac6 ext4: Adding error check after calling ext4_mb_regular_allocator()
If the bitmap block on disk is bad, ext4_mb_load_buddy() returns an
error. This error is returned to the caller,
ext4_mb_regular_allocator() and then to ext4_mb_new_blocks().  But
ext4_mb_new_blocks() did not check for the return value of
ext4_mb_regular_allocator() and would repeatedly try to load the
bitmap block. The fix simply catches the return value and exits out of
the 'repeat' loop after cleanup.

We also take the opportunity to clean up the error handling in
ext4_mb_new_blocks().

Google-Bug-Id: 2853530

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-05 16:22:24 -04:00
Satoru Takeuchi 642b5123ac aio: fix wrong subsystem comments
- sys_io_destroy(): acutually return -EINVAL if the context pointed to
   is invalidIndex: linux-2.6.33-rc4/fs/aio.c
 - sys_io_getevents(): An argument specifying timeout is not `when',
   but `timeout'.
 - sys_io_getevents(): Should describe what is returned if this syscall
   succeeds.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-05 13:21:23 -07:00
Jan Kara 5f11e6a440 ext3: Fix dirtying of journalled buffers in data=journal mode
In data=journal mode, we still use block_write_begin() to prepare page for
writing. This function can occasionally mark buffer dirty which violates
journalling assumptions - when a buffer is part of a transaction, it should be
dirty and a buffer can be already part of a forget list of some transaction
when block_write_begin() gets called. This violation of journalling assumptions
then results in "JBD: Spotted dirty metadata buffer..." warnings.

In fact, temporary dirtying the buffer while the page is still locked does not
really cause problems to the journalling because we won't write the buffer
until the page gets unlocked. So we just have to make sure to clear dirty bits
before unlocking the page.

Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-08-05 21:28:28 +02:00
Julia Lawall f70cb33b9c fs/dlm: Drop unnecessary null test
hlist_for_each_entry binds its first argument to a non-null value, and thus
any null test on the value of that argument is superfluous.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
iterator I;
expression x,E,E1,E2;
statement S,S1,S2;
@@

I(x,...) { <...
- (x != NULL) &&
  E
  ...> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-08-05 14:23:45 -05:00
Changli Gao a4d935bd97 dlm: use genl_register_family_with_ops()
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-08-05 14:22:01 -05:00
Jan Kara 56d35a4cd1 ext4: Fix dirtying of journalled buffers in data=journal mode
In data=journal mode, we still use block_write_begin() to prepare
page for writing. This function can occasionally mark buffer dirty
which violates journalling assumptions - when a buffer is part of
a transaction, it should be dirty and a buffer can be already part
of a forget list of some transaction when block_write_begin()
gets called. This violation of journalling assumptions then results
in "JBD: Spotted dirty metadata buffer..." warnings.

In fact, temporary dirtying the buffer while the page is still locked
does not really cause problems to the journalling because we won't write
the buffer until the page gets unlocked. So we just have to make sure
to clear dirty bits before unlocking the page.

Signed-off-by: Jan Kara <jack@suse.cz>
2010-08-05 14:41:42 -04:00
Wang Lei 07567a5509 DNS: Make AFS go to the DNS for AFSDB records for unknown cells
Add DNS query support for AFS so that it can get the IP addresses of Volume
Location servers from the DNS using an AFSDB record.

This requires userspace support.  /etc/request-key.conf must be configured to
invoke a helper for dns_resolver type keys with a subtype of "afsdb:" in the
description.

Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:51 +00:00
Wang Lei 1a4240f476 DNS: Separate out CIFS DNS Resolver code
Separate out the DNS resolver key type from the CIFS filesystem into its own
module so that it can be made available for general use, including the AFS
filesystem module.

This facility makes it possible for the kernel to upcall to userspace to have
it issue DNS requests, package up the replies and present them to the kernel
in a useful form.  The kernel is then able to cache the DNS replies as keys
can be retained in keyrings.

Resolver keys are of type "dns_resolver" and have a case-insensitive
description that is of the form "[<type>:]<domain_name>".  The optional <type>
indicates the particular DNS lookup and packaging that's required.  The
<domain_name> is the query to be made.

If <type> isn't given, a basic hostname to IP address lookup is made, and the
result is stored in the key in the form of a printable string consisting of a
comma-separated list of IPv4 and IPv6 addresses.

This key type is supported by userspace helpers driven from /sbin/request-key
and configured through /etc/request-key.conf.  The cifs.upcall utility is
invoked for UNC path server name to IP address resolution.

The CIFS functionality is encapsulated by the dns_resolve_unc_to_ip() function,
which is used to resolve a UNC path to an IP address for CIFS filesystem.  This
part remains in the CIFS module for now.

See the added Documentation/networking/dns_resolver.txt for more information.

Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:51 +00:00
Jeff Layton ba5dadbf4e cifs: account for new creduid=0x%x parameter in spnego upcall string
The commit that added the creduid=0x%x parameter failed to increase the
buffer allocation to account for it.

Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:50 +00:00
Jeff Layton 5acfec2502 cifs: reduce false positives with inode aliasing serverino autodisable
It turns out that not all directory inodes with dentries on the
i_dentry list are unusable here. We only consider them unusable if they
are still hashed or if they have a root dentry attached.

Full disclosure -- this check is inherently racy. There's nothing that
stops someone from slapping a new dentry onto this inode just after
this check, or hashing an existing one that's already attached. So,
this is really a "best effort" thing to work around misbehaving servers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:50 +00:00
David Howells 67b7626a05 CIFS: Make cifs_convert_address() take a const src pointer and a length
Make cifs_convert_address() take a const src pointer and a length so that all
the strlen() calls in their can be cut out and to make it unnecessary to modify
the src string.

Also return the data length from dns_resolve_server_name_to_ip() so that a
strlen() can be cut out of cifs_compose_mount_options() too.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:50 +00:00
Suresh Jayaraman f579903ef3 cifs: show features compiled in as part of DebugData
Fixed the nit pointed out by Jeff.

From: Suresh Jayaraman <sjayaraman@suse.de>
Subject: [PATCH 1/2] cifs: show features compiled in as part of DebugData

This patch adds the features that are compiled in to the CIFS debugging data
as shown below:

	$cat /proc/fs/cifs/DebugData
	Display Internal CIFS Data Structures for Debugging
	---------------------------------------------------
	CIFS Version 1.64
	Features: dfs fscache posix spnego xattr
	Active VFS Requests: 0
	...

This patch provides a definitive way to tell what features are currently
enabled in the running kernel. This could also help debugging.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:50 +00:00
Suresh Jayaraman 95c99904f6 cifs: update README
Update the README file to reflect that now DebugData shows all
the features enabled.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Cc: Jeff Layton <jlayton@redhat.com>
--
 fs/cifs/README |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-05 17:17:50 +00:00
Eric Sandeen 0cfc9255a1 ext4: re-inline ext4_rec_len_(to|from)_disk functions
commit 3d0518f4, "ext4: New rec_len encoding for very
large blocksizes" made several changes to this path, but from
a perf perspective, un-inlining ext4_rec_len_from_disk() seems
most significant.  This function is called from ext4_check_dir_entry(),
which on a file-creation workload is called extremely often.

I tested this with bonnie:

# bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32

(this does 200 iterations) and got this for the file creations:

ext4 stock:   Average =  21206.8 files/s
ext4 inlined: Average =  22346.7 files/s  (+5%)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-05 01:46:37 -04:00
Phillip Lougher f3065f60dd Squashfs: fix block size use in LZO decompressor
Sizing the buffer using block size alone is incorrect leading
to a potential buffer over-run on 4K block size file systems
(because the metadata block size is always 8K).  Srclength is
set to the maximum expected size of the decompressed block and
it is block_size or 8K depending on whether a data or metadata
block is being decompressed.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-05 04:51:50 +01:00
Chan Jeong 79cb8ced7e Squashfs: Add LZO compression support
Signed-off-by: Chan Jeong <chan.jeong@lge.com>
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-05 02:29:59 +01:00
Linus Torvalds 3cfc2c42c1 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits)
  Documentation: update broken web addresses.
  fix comment typo "choosed" -> "chosen"
  hostap:hostap_hw.c Fix typo in comment
  Fix spelling contorller -> controller in comments
  Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault
  fs/Kconfig: Fix typo Userpace -> Userspace
  Removing dead MACH_U300_BS26
  drivers/infiniband: Remove unnecessary casts of private_data
  fs/ocfs2: Remove unnecessary casts of private_data
  libfc: use ARRAY_SIZE
  scsi: bfa: use ARRAY_SIZE
  drm: i915: use ARRAY_SIZE
  drm: drm_edid: use ARRAY_SIZE
  synclink: use ARRAY_SIZE
  block: cciss: use ARRAY_SIZE
  comment typo fixes: charater => character
  fix comment typos concerning "challenge"
  arm: plat-spear: fix typo in kerneldoc
  reiserfs: typo comment fix
  update email address
  ...
2010-08-04 15:31:02 -07:00
Linus Torvalds 6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
Tejun Heo e75aa85892 block_dev: always serialize exclusive open attempts
bd_prepare_to_claim() incorrectly allowed multiple attempts for
exclusive open to progress in parallel if the attempting holders are
identical.  This triggered BUG_ON() as reported in the following bug.

  https://bugzilla.kernel.org/show_bug.cgi?id=16393

__bd_abort_claiming() is used to finish claiming blocks and doesn't
work if multiple openers are inside a claiming block.  Allowing
multiple parallel open attempts to continue doesn't gain anything as
those are serialized down in the call chain anyway.  Fix it by always
allowing only single open attempt in a claiming block.

This problem can easily be reproduced by adding a delay after
bd_prepare_to_claim() and attempting to mount two partitions of a
disk.

stable: only applicable to v2.6.35

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-04 11:17:10 -07:00
Linus Torvalds 7e6880951d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (90 commits)
  AppArmor: fix build warnings for non-const use of get_task_cred
  selinux: convert the policy type_attr_map to flex_array
  AppArmor: Enable configuring and building of the AppArmor security module
  TOMOYO: Use pathname specified by policy rather than execve()
  AppArmor: update path_truncate method to latest version
  AppArmor: core policy routines
  AppArmor: policy routines for loading and unpacking policy
  AppArmor: mediation of non file objects
  AppArmor: LSM interface, and security module initialization
  AppArmor: Enable configuring and building of the AppArmor security module
  AppArmor: update Maintainer and Documentation
  AppArmor: functions for domain transitions
  AppArmor: file enforcement routines
  AppArmor: userspace interfaces
  AppArmor: dfa match engine
  AppArmor: contexts used in attaching policy to system objects
  AppArmor: basic auditing infrastructure.
  AppArmor: misc. base functions and defines
  TOMOYO: Update version to 2.3.0
  TOMOYO: Fix quota check.
  ...
2010-08-04 10:28:39 -07:00
Jiri Kosina d790d4d583 Merge branch 'master' into for-next 2010-08-04 15:14:38 +02:00
Uwe Kleine-König 73b2c7165b fix comment typo "choosed" -> "chosen"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-04 15:08:39 +02:00
Trond Myklebust a17c2153d2 SUNRPC: Move the bound cred to struct rpc_rqst
This will allow us to save the original generic cred in rpc_message, so
that if we migrate from one server to another, we can generate a new bound
cred without having to punt back to the NFS layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-04 08:54:09 -04:00
Boaz Harrosh 5002dd18c5 exofs: Fix groups code when num_devices is not divisible by group_width
There is a bug when num_devices is not divisible by group_width * mirrors.
We would not return to the proper device and offset when looping on to the
next group.

The fix makes code simpler actually.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-08-04 13:17:58 +03:00
Boaz Harrosh 6e31609b1d exofs: Remove useless optimization
We used to compact all used devices in an IO to the beginning
of the device array in an io_state. And keep a last device used
so in later loops we don't iterate on all device slots. This
does not prevent us from checking if slots are empty since in
reads we only read from a single mirror and jump to the next
mirror-set.

This optimization is marginal, and needlessly complicates the
code. Specially when we will later want to support raid/456
with same abstract code. So remove the distinction between
"dev" and "comp". Only "dev" is used both as the device used
and as the index (component) in the device array.

[Note that now the io_state->dev member is redundant but I
 keep it because I might want to optimize by only IOing a
 single group, though keeping a group_width*mirrors devices
 in io_state, we now keep num-devices in each io_state]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-08-04 13:17:57 +03:00
Boaz Harrosh b284834929 exofs: exofs_file_fsync and exofs_file_flush correctness
As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the inode is
all is needed. ->fsync implies this must be done synchronously.

But now exofs_file_fsync can not be used by exofs_file_flush.
vfs_fsync() should do that job correctly.

FIXME: remove the sb_sync and fix that sb_update better.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-08-04 13:17:56 +03:00
Boaz Harrosh 85dc7878c6 exofs: Remove superfluous dependency on buffer_head and writeback
exofs_releasepage && exofs_invalidatepage are never called.
Leave the WARN_ONs but remove any code. Remove the

cleanup other stale #includes.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-08-04 13:17:55 +03:00
Trond Myklebust d05dd4e98f NFS: Fix the NFS users of rpc_restart_call()
Fix up those functions that depend on knowing whether or not
rpc_restart_call is successful or not.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:44 -04:00
Trond Myklebust a6f03393ec NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
There is no real reason to have RPC_ASSASSINATED() checks in the NFS code.
As far as it is concerned, this is just an RPC error...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:43 -04:00
Trond Myklebust 452e93523d NFSv4: Clean up the process of renewing the NFSv4 lease
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:42 -04:00
Trond Myklebust 14516c3a30 NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
In RFC5661, an NFS4ERR_DELAY error on a SEQUENCE operation has the special
meaning that the server is not finished processing the request. In this
case we want to just retry the request without touching the slot.

Also fix a bug whereby we would fail to update the sequence id if the
server returned any error other than NFS_OK/NFS4ERR_DELAY.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:42 -04:00
Trond Myklebust 0a8ebba943 NFS: nfs_rename() should not have to flush out writebacks
We don't really support nfs servers that invalidate the file handle after a
rename, so precautions such as flushing out dirty data before renaming the
file are superfluous.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:41 -04:00
Trond Myklebust 1b924e5f87 NFS: Clean up the callers of nfs_wb_all()
There is no need to flush out writes before calling nfs_wb_all().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:40 -04:00
Trond Myklebust af7fa16506 NFS: Fix up the fsync code
Christoph points out that the VFS will always flush out data before calling
nfs_fsync(), so we can dispense with a full call to nfs_wb_all(), and
replace that with a simpler call to nfs_commit_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-03 22:06:07 -04:00
Theodore Ts'o 8dd420466c jbd2: Remove t_handle_lock from start_this_handle()
This should remove the last exclusive lock from start_this_handle(),
so that we should now be able to start multiple transactions at the
same time on large SMP systems.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-03 21:38:29 -04:00
Theodore Ts'o a931da6ac9 jbd2: Change j_state_lock to be a rwlock_t
Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-03 21:35:12 -04:00
Linus Torvalds 3a09b1be53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Fix recovery stuck bug (try #2)
  GFS2: Fix typo in stuffed file data copy handling
  Revert "GFS2: recovery stuck on transaction lock"
  GFS2: Make "try" lock not try quite so hard
  GFS2: remove dependency on __GFP_NOFAIL
  GFS2: Simplify gfs2_write_alloc_required
  GFS2: Wait for journal id on mount if not specified on mount command line
  GFS2: Use nobh_writepage
2010-08-03 14:40:10 -07:00
Linus Torvalds c939f9f9d2 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix a memory leak on error path.
  UBIFS: fix GC LEB recovery
  UBIFS: use ERR_CAST
  UBIFS: check return code
2010-08-03 14:37:02 -07:00
Linus Torvalds b8b3e9058f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (22 commits)
  9p: fix sparse warnings in new xattr code
  fs/9p: remove sparse warning in vfs_inode
  fs/9p: destroy fid on failed remove
  fs/9p: Prevent parallel rename when doing fid_lookup
  fs/9p: Add support user. xattr
  net/9p: Implement TXATTRCREATE 9p call
  net/9p: Implement attrwalk 9p call
  9p: Implement LOPEN
  fs/9p: This patch implements TLCREATE for 9p2000.L protocol.
  9p: Implement TMKDIR
  9p: Implement TMKNOD
  9p: Define and implement TSYMLINK for 9P2000.L
  9p: Define and implement TLINK for 9P2000.L
  9p: Define and implement TLINK for 9P2000.L
  9p: Implement client side of setattr for 9P2000.L protocol.
  9p: getattr client implementation for 9P2000.L protocol.
  fs/9p: Pass the correct user credentials during attach
  net/9p: Handle the server returned error properly
  9p: readdir implementation for 9p2000.L
  9p: Make use of iounit for read/write
  ...
2010-08-03 14:36:16 -07:00
Linus Torvalds 51102ee5b8 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits)
  xfs simplify and speed up direct I/O completions
  xfs: move aio completion after unwritten extent conversion
  direct-io: move aio_complete into ->end_io
  xfs: fix big endian build
  xfs: clean up xfs_bmap_get_bp
  xfs: simplify xfs_truncate_file
  xfs: kill the b_strat callback in xfs_buf
  xfs: remove obsolete osyncisosync mount option
  xfs: clean up filestreams helpers
  xfs: fix gcc 4.6 set but not read and unused statement warnings
  xfs: Fix build when CONFIG_XFS_POSIX_ACL=n
  xfs: fix unsigned underflow in xfs_free_eofblocks
  xfs: use GFP_NOFS for page cache allocation
  xfs: fix memory reclaim recursion deadlock on locked inode buffer
  xfs: fix xfs_trans_add_item() lockdep warnings
  xfs: simplify and remove xfs_ireclaim
  xfs: don't block on buffer read errors
  xfs: move inode shrinker unregister even earlier
  xfs: remove a dmapi leftover
  xfs: writepage always has buffers
  ...
2010-08-03 14:33:38 -07:00
Sage Weil e9d1774431 ceph: do not ignore osd_idle_ttl mount option
Actually apply the mount option to the mount_args struct.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03 12:56:57 -07:00
Sage Weil 52dfb8ac0e ceph: constify dentry_operations
This makes checkpatch happy.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03 10:25:30 -07:00
Sage Weil 213c99ee0c ceph: whitespace cleanup
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03 10:25:11 -07:00
Matthieu CASTET c18de72fb3 UBIFS: fix a memory leak on error path.
In 'mount_ubifs()', in case of 'ubifs_leb_unmap()' falure,
free allocated resources.

Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2010-08-03 08:58:09 +03:00
Greg Farnum 40819f6fb2 ceph: add flock/fcntl lock support
Implement flock inode operation to support advisory file locking.  All
lock/unlock operations are synchronous with the MDS.  Lock state is
sent when reconnecting to a recovering MDS to restore the shared lock
state.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 16:10:53 -07:00
Greg Farnum fbaad9797a ceph: define on-wire types, constants for file locking support
Define the MDS operations and data types for doing file advisory locking
with the MDS.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 15:48:54 -07:00
Greg Farnum c6f3fdc592 ceph: add CEPH_FEATURE_FLOCK to the supported feature bits
This informs the server that we will accept v2 client_caps format and v2
client_reconnect format messages.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 15:48:51 -07:00
Sage Weil 20cb34ae9e ceph: support v2 reconnect encoding
Encode either old or v2 encoding of client_reconnect message, depending on
whether the peer has the FLOCK feature bit.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 15:48:50 -07:00
Sage Weil ce1fbc8dd6 ceph: support v2 client_caps encoding
Add support for v2 encoding of MClientCaps, which includes a flock blob.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 15:48:49 -07:00
Sage Weil cbbfe49905 ceph: move AES iv definition to shared header
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 15:48:31 -07:00
Eric Van Hensbergen 327aec03ac 9p: fix sparse warnings in new xattr code
fixes:

  CHECK   fs/9p/xattr.c
	fs/9p/xattr.c:73:6: warning: Using plain integer as NULL pointer
	fs/9p/xattr.c:135:6: warning: Using plain integer as NULL pointer

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:38 -05:00
Eric Van Hensbergen ea1375333e fs/9p: remove sparse warning in vfs_inode
make v9fs_dentry_from_dir_inode static

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:37 -05:00
Aneesh Kumar K.V a534c8d15b fs/9p: Prevent parallel rename when doing fid_lookup
During fid lookup we need to make sure that the dentry->d_parent doesn't
change so that we can safely walk the parent dentries. To ensure that
we need to prevent cross directory rename during fid_lookup. Add a
per superblock rename_sem rw_semaphore to prevent parallel fid lookup and
rename.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:35 -05:00
Aneesh Kumar K.V ebf46264a0 fs/9p: Add support user. xattr
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:35 -05:00
M. Mohan Kumar ef56547efa 9p: Implement LOPEN
Implement 9p2000.L version of open(LOPEN) interface in 9p client.

For LOPEN, no need to convert the flags to and from 9p mode to VFS mode.

Synopsis:

    size[4] Tlopen tag[2] fid[4] mode[4]

    size[4] Rlopen tag[2] qid[13] iounit[4]

[Fix mode bit format - jvrao@linux.vnet.ibm.com]

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
2010-08-02 14:28:32 -05:00
Venkateswararao Jujjuri (JV) 5643135a28 fs/9p: This patch implements TLCREATE for 9p2000.L protocol.
SYNOPSIS

    size[4] Tlcreate tag[2] fid[4] name[s] flags[4] mode[4] gid[4]

    size[4] Rlcreate tag[2] qid[13] iounit[4]

DESCRIPTION

The Tlreate request asks the file server to create a new regular file with the
name supplied, in the directory (dir) represented by fid.
The mode argument specifies the permissions to use. New file is created with
the uid if the fid and with supplied gid.

The flags argument represent Linux access mode flags with which the caller
is requesting to open the file with. Protocol allows all the Linux access
modes but it is upto the server to allow/disallow any of these acess modes.
If the server doesn't support any of the access mode, it is expected to
return error.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:32 -05:00
M. Mohan Kumar 01a622bd74 9p: Implement TMKDIR
Implement TMKDIR as part of 2000.L Work

Synopsis

    size[4] Tmkdir tag[2] fid[4] name[s] mode[4] gid[4]

    size[4] Rmkdir tag[2] qid[13]

Description

    mkdir asks the file server to create a directory with given name,
    mode and gid. The qid for the new directory is returned with
    the mkdir reply message.

Note: 72 is selected as the opcode for TMKDIR from the reserved list.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:31 -05:00
M. Mohan Kumar 4b43516ab1 9p: Implement TMKNOD
Synopsis

    size[4] Tmknod tag[2] fid[4] name[s] mode[4] major[4] minor[4] gid[4]

    size[4] Rmknod tag[2] qid[13]

Description

    mknod asks the file server to create a device node with given major and
    minor number, mode and gid. The qid for the new device node is returned
    with the mknod reply message.

[sripathik@in.ibm.com: Fix error handling code]

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:30 -05:00
Venkateswararao Jujjuri (JV) 50cc42ff3d 9p: Define and implement TSYMLINK for 9P2000.L
Create a symbolic link

SYNOPSIS

size[4] Tsymlink tag[2] fid[4] name[s] symtgt[s] gid[4]

size[4] Rsymlink tag[2] qid[13]

DESCRIPTION

Create a symbolic link named 'name' pointing to 'symtgt'.
gid represents the effective group id of the caller.
The  permissions of a symbolic link are irrelevant hence it is omitted
from the protocol.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Reviewed-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:29 -05:00
Eric Van Hensbergen 09d34ee5f9 9p: Define and implement TLINK for 9P2000.L
This patch adds a helper function to get the dentry from inode and
uses it in creating a Hardlink

SYNOPSIS

size[4] Tlink tag[2] dfid[4] oldfid[4] newpath[s]

size[4] Rlink tag[2]

DESCRIPTION

Create a link 'newpath' in directory pointed by dfid linking to oldfid path.

[sripathik@in.ibm.com : p9_client_link should not free req structure
if p9_client_rpc has returned an error.]

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:28:09 -05:00
Sripathi Kodi 87d7845aa0 9p: Implement client side of setattr for 9P2000.L protocol.
SYNOPSIS

      size[4] Tsetattr tag[2] attr[n]

      size[4] Rsetattr tag[2]

    DESCRIPTION

      The setattr command changes some of the file status information.
      attr resembles the iattr structure used in Linux kernel. It
      specifies which status parameter is to be changed and to what
      value. It is laid out as follows:

         valid[4]
            specifies which status information is to be changed. Possible
            values are:
            ATTR_MODE       (1 << 0)
            ATTR_UID        (1 << 1)
            ATTR_GID        (1 << 2)
            ATTR_SIZE       (1 << 3)
            ATTR_ATIME      (1 << 4)
            ATTR_MTIME      (1 << 5)
            ATTR_ATIME_SET  (1 << 7)
            ATTR_MTIME_SET  (1 << 8)

            The last two bits represent whether the time information
            is being sent by the client's user space. In the absense
            of these bits the server always uses server's time.

         mode[4]
            File permission bits

         uid[4]
            Owner id of file

         gid[4]
            Group id of the file

         size[8]
            File size

         atime_sec[8]
            Time of last file access, seconds

         atime_nsec[8]
            Time of last file access, nanoseconds

         mtime_sec[8]
            Time of last file modification, seconds

         mtime_nsec[8]
            Time of last file modification, nanoseconds

Explanation of the patches:
--------------------------

*) The kernel just copies relevent contents of iattr structure to
   p9_iattr_dotl structure and passes it down to the client. The
   only check it has is calling inode_change_ok()
*) The p9_iattr_dotl structure does not have ctime and ia_file
   parameters because I don't think these are needed in our case.
   The client user space can request updating just ctime by calling
   chown(fd, -1, -1). This is handled on server side without a need
   for putting ctime on the wire.
*) The server currently supports changing mode, time, ownership and
   size of the file.
*) 9P RFC says "Either all the changes in wstat request happen, or
   none of them does: if the request succeeds, all changes were made;
   if it fails, none were."
   I have not done anything to implement this specifically because I
   don't see a reason.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:10 -05:00
Sripathi Kodi f085312204 9p: getattr client implementation for 9P2000.L protocol.
SYNOPSIS

              size[4] Tgetattr tag[2] fid[4] request_mask[8]

              size[4] Rgetattr tag[2] lstat[n]

           DESCRIPTION

              The getattr transaction inquires about the file identified by fid.
              request_mask is a bit mask that specifies which fields of the
              stat structure is the client interested in.

              The reply will contain a machine-independent directory entry,
              laid out as follows:

                 st_result_mask[8]
                    Bit mask that indicates which fields in the stat structure
                    have been populated by the server

                 qid.type[1]
                    the type of the file (directory, etc.), represented as a bit
                    vector corresponding to the high 8 bits of the file's mode
                    word.

                 qid.vers[4]
                    version number for given path

                 qid.path[8]
                    the file server's unique identification for the file

                 st_mode[4]
                    Permission and flags

                 st_uid[4]
                    User id of owner

                 st_gid[4]
                    Group ID of owner

                 st_nlink[8]
                    Number of hard links

                 st_rdev[8]
                    Device ID (if special file)

                 st_size[8]
                    Size, in bytes

                 st_blksize[8]
                    Block size for file system IO

                 st_blocks[8]
                    Number of file system blocks allocated

                 st_atime_sec[8]
                    Time of last access, seconds

                 st_atime_nsec[8]
                    Time of last access, nanoseconds

                 st_mtime_sec[8]
                    Time of last modification, seconds

                 st_mtime_nsec[8]
                    Time of last modification, nanoseconds

                 st_ctime_sec[8]
                    Time of last status change, seconds

                 st_ctime_nsec[8]
                    Time of last status change, nanoseconds

                 st_btime_sec[8]
                    Time of creation (birth) of file, seconds

                 st_btime_nsec[8]
                    Time of creation (birth) of file, nanoseconds

                 st_gen[8]
                    Inode generation

                 st_data_version[8]
                    Data version number

              request_mask and result_mask bit masks contain the following bits
                 #define P9_STATS_MODE          0x00000001ULL
                 #define P9_STATS_NLINK         0x00000002ULL
                 #define P9_STATS_UID           0x00000004ULL
                 #define P9_STATS_GID           0x00000008ULL
                 #define P9_STATS_RDEV          0x00000010ULL
                 #define P9_STATS_ATIME         0x00000020ULL
                 #define P9_STATS_MTIME         0x00000040ULL
                 #define P9_STATS_CTIME         0x00000080ULL
                 #define P9_STATS_INO           0x00000100ULL
                 #define P9_STATS_SIZE          0x00000200ULL
                 #define P9_STATS_BLOCKS        0x00000400ULL

                 #define P9_STATS_BTIME         0x00000800ULL
                 #define P9_STATS_GEN           0x00001000ULL
                 #define P9_STATS_DATA_VERSION  0x00002000ULL

                 #define P9_STATS_BASIC         0x000007ffULL
                 #define P9_STATS_ALL           0x00003fffULL

        This patch implements the client side of getattr implementation for
        9P2000.L. It introduces a new structure p9_stat_dotl for getting
        Linux stat information along with QID. The data layout is similar to
        stat structure in Linux user space with the following major
        differences:

        inode (st_ino) is not part of data. Instead qid is.

        device (st_dev) is not part of data because this doesn't make sense
        on the client.

        All time variables are 64 bit wide on the wire. The kernel seems to use
        32 bit variables for these variables. However, some of the architectures
        have used 64 bit variables and glibc exposes 64 bit variables to user
        space on some architectures. Hence to be on the safer side we have made
        these 64 bit in the protocol. Refer to the comments in
        include/asm-generic/stat.h

        There are some additional fields: st_btime_sec, st_btime_nsec, st_gen,
        st_data_version apart from the bitmask, st_result_mask. The bit mask
        is filled by the server to indicate which stat fields have been
        populated by the server. Currently there is no clean way for the
        server to obtain these additional fields, so it sends back just the
        basic fields.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbegren <ericvh@gmail.com>
2010-08-02 14:25:09 -05:00
Aneesh Kumar K.V 9ffaf63e34 fs/9p: Pass the correct user credentials during attach
We need to make sure we pass the right uid value
during attach. dotl is similar to dotu in this regard.
Without this mapped security model on dotl doesn't work

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:08 -05:00
Sripathi Kodi 7751bdb3a0 9p: readdir implementation for 9p2000.L
This patch implements the kernel part of readdir() implementation for 9p2000.L

    Change from V3: Instead of inode, server now sends qids for each dirent

    SYNOPSIS

    size[4] Treaddir tag[2] fid[4] offset[8] count[4]
    size[4] Rreaddir tag[2] count[4] data[count]

    DESCRIPTION

    The readdir request asks the server to read the directory specified by 'fid'
    at an offset specified by 'offset' and return as many dirent structures as
    possible that fit into count bytes. Each dirent structure is laid out as
    follows.

            qid.type[1]
              the type of the file (directory, etc.), represented as a bit
              vector corresponding to the high 8 bits of the file's mode
              word.

            qid.vers[4]
              version number for given path

            qid.path[8]
              the file server's unique identification for the file

            offset[8]
              offset into the next dirent.

            type[1]
              type of this directory entry.

            name[256]
              name of this directory entry.

    This patch adds v9fs_dir_readdir_dotl() as the readdir() call for 9p2000.L.
    This function sends P9_TREADDIR command to the server. In response the server
    sends a buffer filled with dirent structures. This is different from the
    existing v9fs_dir_readdir() call which receives stat structures from the server.
    This results in significant speedup of readdir() on large directories.
    For example, doing 'ls >/dev/null' on a directory with 10000 files on my
    laptop takes 1.088 seconds with the existing code, but only takes 0.339 seconds
    with the new readdir.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:07 -05:00
M. Mohan Kumar 97e8442b09 9p: Make use of iounit for read/write
Change the v9fs_file_readn function to limit the maximum transfer size
based on the iounit or msize.

Also remove the redundant check for limiting the transfer size in
v9fs_file_write. This check is done by p9_client_write.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-08-02 14:25:06 -05:00
Sage Weil 73a7e693f9 ceph: fix decoding of pool snap info
The pool info contains a vector for snap_info_t, not snap ids.  This fixes
the broken decoding, which would declare teh update corrupt when a pool
snapshot was created.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02 11:10:07 -07:00
Alex Elder 6b0a2996a0 Merge branch 'v2.6.35' 2010-08-02 10:24:57 -05:00
Justin P. Mattock 581b7e9fc0 udf: super.c Fix warning: variable 'sbi' set but not used
This fixes this warning when building the kernel:
  CC      fs/udf/super.o
fs/udf/super.c: In function 'udf_load_sequence':
fs/udf/super.c:1582:22: warning: variable 'sbi' set but not used
Please have a look, when you have time and let me know.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-08-02 14:57:40 +02:00
Huang Weiyi de67445f0e udf: remove duplicated #include
Remove duplicated #include('s) in
  fs/udf/file.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-08-02 14:57:39 +02:00
Theodore Ts'o a51dca9cd3 jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
By using an atomic_t for t_updates and t_outstanding credits, this
should allow us to not need to take transaction t_handle_lock in
jbd2_journal_stop().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-02 08:43:25 -04:00
Jeff Layton cb76d5e250 cifs: fsc should not default to "on"
I'm not sure why this was merged with this flag hardcoded on, but it
seems quite dangerous. Turn it off.

Also, mount.cifs hands unrecognized options off to the kernel so there
should be no need for changes there in order to support this.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:41 +00:00
Steve French f67909cf80 [CIFS] remove redundant path walking in dfs_do_refmount
Reviewed-by: Dave Howells <dhowells@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:40 +00:00
Jeff Layton f636a34802 cifs: ignore the "mand", "nomand" and "_netdev" mount options
These are all handled by the userspace mount programs, but older versions
of mount.cifs also handed them off to the kernel. Ignore them.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:40 +00:00
Jeff Layton 3572d2857f cifs: map NT_STATUS_ERROR_WRITE_PROTECTED to -EROFS
Seems like a more sensible mapping than -EIO.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:40 +00:00
Jeff Layton f30b9c1184 cifs: don't allow cifs_iget to match inodes of the wrong type
If the type is different from what we think it should be, then don't
match the existing inode.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:39 +00:00
Steve French 9f841593ff [CIFS] relinquish fscache cookie before freeing CIFSTconInfo
Doh, fix a use after free bug.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:39 +00:00
Jeff Layton 3e4b3e1f68 cifs: add separate cred_uid field to sesInfo
Right now, there's no clear separation between the uid that owns the
credentials used to do the mount and the overriding owner of the files
on that mount.

Add a separate cred_uid field that is set to the real uid
of the mount user. Unlike the linux_uid, the uid= option does not
override this parameter. The parm is sent to cifs.upcall, which can then
preferentially use the creduid= parm instead of the uid= parm for
finding credentials.

This is not the only way to solve this. We could try to do all of this
in kernel instead by having a module parameter that affects what gets
passed in the uid= field of the upcall. That said, we have a lot more
flexibility to change things in userspace so I think it probably makes
sense to do it this way.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:39 +00:00
Kulikov Vasiliy f55fdcca6b fs: cifs: check kmalloc() result
If kmalloc() fails exit with -ENOMEM.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:39 +00:00
Steve French 0ccd48025f [CIFS] Missing ifdef
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:38 +00:00
Steve French d0e6f44e6c [CIFS] Missing line from previous commit
CC: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:38 +00:00
Steve French c5e04a3e49 [CIFS] Fix build break when CONFIG_CIFS_FSCACHE disabled
CC: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:38 +00:00
Suresh Jayaraman fa1df75d4d cifs: add mount option to enable local caching
Add a mount option 'fsc' to enable local caching on CIFS.

I considered adding a separate debug bit for caching, but it appears that
debugging would be relatively easier with the normal CIFS_INFO level.

As the cifs-utils (userspace) changes are not done yet, this patch enables
'fsc' by default to enable testing.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:37 +00:00
Suresh Jayaraman 56698236e1 cifs: read pages from FS-Cache
Read pages from a FS-Cache data storage object into a CIFS inode.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:37 +00:00
Suresh Jayaraman 9dc06558c2 cifs: store pages into local cache
Store pages from an CIFS inode into the data storage object associated with
that inode.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:37 +00:00
Suresh Jayaraman 85f2d6b44d cifs: FS-Cache page management
Takes care of invalidation and release of FS-Cache marked pages and also
invalidation of the FsCache page flag when the inode is removed.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:36 +00:00
Suresh Jayaraman 9451a9a52f cifs: define inode-level cache object and register them
Define inode-level data storage objects (managed by cifsInodeInfo structs).
Each inode-level object is created in a super-block level object and is itself
a data storage object in to which pages from the inode are stored.

The inode object is keyed by UniqueId. The coherency data being used is
LastWriteTime, LastChangeTime and end of file reported by the server.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:36 +00:00
Suresh Jayaraman d03382ce9a cifs: define superblock-level cache index objects and register them
Define superblock-level cache index objects (managed by cifsTconInfo structs).
Each superblock object is created in a server-level index object and in itself
an index into which inode-level objects are inserted.

The superblock object is keyed by sharename. The UniqueId/IndexNumber is used to
validate that the exported share is the same since we accessed it last time.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:36 +00:00
Jeff Layton 8913007e67 cifs: remove unused cifsUidInfo struct
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:35 +00:00
Jeff Layton 4ff67b720c cifs: clean up cifs_find_smb_ses (try #2)
This patch replaces the earlier patch by the same name. The only
difference is that MAX_PASSWORD_SIZE has been increased to attempt to
match the limits that windows enforces.

Do a better job of matching sessions by authtype. Matching by username
for a Kerberos session is incorrect, and anonymous sessions need special
handling.

Also, in the case where we do match by username, we also need to match
by password. That ensures that someone else doesn't "borrow" an existing
session without needing to know the password.

Finally, passwords can be longer than 16 bytes. Bump MAX_PASSWORD_SIZE
to 512 to match the size that the userspace mount helper allows.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:35 +00:00
Jeff Layton daf5b0b6f3 cifs: match secType when searching for existing tcp session
The secType is a per-tcp session entity, but the current routine doesn't
verify that it is acceptible when attempting to match an existing TCP
session.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:35 +00:00
Jeff Layton 4515148ef7 cifs: move address comparison into separate function
Move the address comparator out of cifs_find_tcp_session and into a
separate function for cleanliness. Also change the argument to
that function to a "struct sockaddr" pointer. Passing pointers to
sockaddr_storage is a little odd since that struct is generally for
declaring static storage.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:34 +00:00
Jeff Layton 50d971602a cifs: set the port in sockaddr in a more clearly defined fashion
This patch should replace the patch I sent a couple of weeks ago to
set the port in cifs_convert_address.

Currently we set this in cifs_find_tcp_session, but that's more of a
side effect than anything. Add a new function called cifs_fill_sockaddr.
Have it call cifs_convert_address and then set the port.

This also allows us to skip passing in the port as a separate parm to
cifs_find_tcp_session.

Also, change cifs_convert_address take a struct sockaddr * rather than
void * to make it clearer how this function should be called.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:34 +00:00
Suresh Jayaraman 488f1d2d6c cifs: define server-level cache index objects and register them
Define server-level cache index objects (as managed by TCP_ServerInfo structs)
and register then with FS-Cache. Each server object is created in the CIFS
top-level index object and is itself an index into which superblock-level
objects are inserted.

The server objects are now keyed by {IPaddress,family,port} tuple.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:34 +00:00
Suresh Jayaraman f579cf3cfd cifs: register CIFS for caching
Define CIFS for FS-Cache and register for caching. Upon registration the
top-level index object cookie will be stuck to the netfs definition by
FS-Cache.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:33 +00:00
Joe Perches c21dfb699f fs/cifs: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:33 +00:00
Suresh Jayaraman 3feb41cff8 cifs: add kernel config option for CIFS Client caching support
Add a kernel config option to enable local caching for CIFS.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:33 +00:00
Suresh Jayaraman c6332e237f cifs: remove unused ip_address field in struct TCP_Server_Info
The ip_address field is not used and seems redundant as there is union addr
already and I don't see any future use as well.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:33 +00:00
Suresh Jayaraman e4317ceca2 cifs: remove an potentially confusing, obsolete comment
The recent commit 6ca9f3bae8 modified the code so
that filp is full instantiated whenever the file is created and passed back.
The below comment is no longer true, remove it.

Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:32 +00:00
Suresh Jayaraman abd2e44dca cifs: guard cifsglob.h against multiple inclusion
Add conditional compile macros to guard the header file against multiple
inclusion.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-08-02 12:40:32 +00:00
Steven Whitehouse 0809f6ec18 GFS2: Fix recovery stuck bug (try #2)
This is a clean up of the code which deals with LM_FLAG_NOEXP
which aims to remove any possible race conditions by using
gl_spin to cover the gap between testing for the LM_FLAG_NOEXP
and the GL_FROZEN flag.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-08-02 10:15:17 +01:00
Ingo Molnar 3772b73472 Merge commit 'v2.6.35' into perf/core
Conflicts:
	tools/perf/Makefile
	tools/perf/util/hist.c

Merge reason: Resolve the conflicts and update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-02 08:31:54 +02:00
Eric Paris d09ca73979 security: make LSMs explicitly mask off permissions
SELinux needs to pass the MAY_ACCESS flag so it can handle auditting
correctly.  Presently the masking of MAY_* flags is done in the VFS.  In
order to allow LSMs to decide what flags they care about and what flags
they don't just pass them all and the each LSM mask off what they don't
need.  This patch should contain no functional changes to either the VFS or
any LSM.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by:  Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2010-08-02 15:35:07 +10:00
Eric Paris 9cfcac810e vfs: re-introduce MAY_CHDIR
Currently MAY_ACCESS means that filesystems must check the permissions
right then and not rely on cached results or the results of future
operations on the object.  This can be because of a call to sys_access() or
because of a call to chdir() which needs to check search without relying on
any future operations inside that dir.  I plan to use MAY_ACCESS for other
purposes in the security system, so I split the MAY_ACCESS and the
MAY_CHDIR cases.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by:  Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2010-08-02 15:35:06 +10:00
Tetsuo Handa ea0d3ab239 LSM: Remove unused arguments from security_path_truncate().
When commit be6d3e56a6 "introduce new LSM hooks
where vfsmount is available." was proposed, regarding security_path_truncate(),
only "struct file *" argument (which AppArmor wanted to use) was removed.
But length and time_attrs arguments are not used by TOMOYO nor AppArmor.
Thus, let's remove these arguments.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: James Morris <jmorris@namei.org>
2010-08-02 15:33:40 +10:00
Theodore Ts'o 8b67f04ab9 ext4: Add mount options in superblock
Allow mount options to be stored in the superblock.  Also add default
mount option bits for nobarrier, block_validity, discard, and nodelalloc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-01 23:14:20 -04:00
Sage Weil 2d9c98ae97 ceph: make ->sync_fs not wait if wait==0
The ->sync_fs() super op only needs to wait if wait is true.  Otherwise,
just get some dirty cap writeback started.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil b8cd07e78e ceph: warn on missing snap realm
Well, this Shouldn't Happen, so it would be helpful to know the caller when
it does.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil effcb9ed43 ceph: print useful error message when crush rule not found
Include the crush_ruleset in the error message.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil a8b763a9b3 ceph: use %pU to print uuid (fsid)
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil f0b18d9f22 ceph: sync header defs with server code
Define ROLLBACK op, IFLOCK inode lock (for advisory file locking).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil 5cd068c200 ceph: clean up header guards
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:42 -07:00
Sage Weil 9688f19a18 ceph: strip misleading/obsolete version, feature info
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Sage Weil 6a2593823a ceph: specify supported features in super.h
Specify the supported/required feature bits in super.h client code instead
of using the definitions from the shared kernel/userspace headers (which
will go away shortly).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Sage Weil c309f0ab26 ceph: clean up fsid mount option
Specify the fsid mount option in hex, not via the major/minor u64 hackery we had
before.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Sage Weil e0f9f9ee8f ceph: remove unused 'monport' mount option
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Greg Farnum e55b71f802 ceph: handle ESTALE properly; on receipt send to authority if it wasn't
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Greg Farnum 2bc50259fa ceph: add ceph_get_cap_for_mds function.
Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Sage Weil 154f42c2c3 ceph: connect to export targets on cap export
When we get a cap EXPORT message, make sure we are connected to all export
targets to ensure we can handle the matching IMPORT.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:41 -07:00
Sage Weil cb170a2215 ceph: connect to export targets if mds is laggy
If an MDS we are talking to may have failed, we need to open sessions to
its potential export targets to ensure that any in-progress migration that
may have involved some of our caps is properly handled.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Sage Weil ed0552a1a2 ceph: introduce helper to connect to mds export targets
There are a few cases where we need to open sessions with a given mds's
potential export targets.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Sage Weil 796d6955a5 ceph: only set num_pages in calc_layout
Setting it elsewhere is unnecessary and more fragile.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Yehuda Sadeh 37151668ba ceph: do caps accounting per mds_client
Caps related accounting is now being done per mds client instead
of just being global. This prepares ground work for a later revision
of the caps preallocated reservation list.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Sage Weil 0deb01c999 ceph: track laggy state of mds from mdsmap
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Yehuda Sadeh cd84db6e40 ceph: code cleanup
Mainly fixing minor issues reported by sparse.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:40 -07:00
Sage Weil ca81f3f6bd ceph: skip if no auth cap in flush_snaps
If we have a capsnap but no auth cap (e.g. because it is migrating to
another mds), bail out and do nothing for now.  Do NOT remove the capsnap
from the flush list.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 3b454c4945 ceph: simplify caps revocation, fix for multimds
The caps revocation should either initiate writeback, invalidateion, or
call check_caps to ack or do the dirty work.  The primary question is
whether we can get away with only checking the auth cap or whether all
caps need to be checked.

The old code was doing...something else.  At the very least, revocations
from non-auth MDSs could break by triggering the "check auth cap only"
case.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 38e8883ee3 ceph: simplify add_cap_releases
No functional change, aside from more useful debug output.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil ee6b272b9c ceph: drop unused argument
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 2962507ca2 ceph: perform lazy reads when file mode and caps permit
If the file mode is marked as "lazy," perform cached/buffered reads when
the caps permit it.  Adjust the rdcache_gen and invalidation logic
accordingly so that we manage our cache based on the FILE_CACHE -or-
FILE_LAZYIO cap bits.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 33caad324b ceph: perform lazy writes when file mode and caps permit
If we have marked a file as "lazy" (using the ceph ioctl), perform buffered
writes when the MDS caps allow it.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 8c6e9229fc ceph: add LAZYIO ioctl to mark a file description for lazy consistency
Allow an application to mark a file descriptor for lazy file consistency
semantics, allowing buffered reads and writes when multiple clients are
accessing the same file.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Sage Weil 84d9509234 ceph: request FILE_LAZYIO cap when LAZY file mode is set
Also clean up the file flags -> file mode -> wanted caps functions while
we're at it.  This resyncs this file with userspace.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:38 -07:00
Trond Myklebust 77a63f3d1e NFS: Fix a typo in include/linux/nfs_fs.h
nfs_commit_inode() needs to be defined irrespectively of whether or not
we are supporting NFSv3 and NFSv4.

Allow the compiler to optimise away code in the NFSv2-only case by
converting it into an inlined stub function.

Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-01 15:10:01 -07:00
Dmitry Monakhov ca0e05e4b1 ext4: force block allocation on quota_off
Perform full sync procedure so that any delayed allocation blocks are
allocated so quota will be consistent.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-01 17:48:36 -04:00
Eric Sandeen 437f88cc03 ext4: fix freeze deadlock under IO
Commit 6b0310fbf0 caused a regression resulting in deadlocks
when freezing a filesystem which had active IO; the vfs_check_frozen
level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
through.  Duh.

Changing the test to FREEZE_TRANS should let the normal freeze
syncing get through the fs, but still block any transactions from
starting once the fs is completely frozen.

I tested this by running fsstress in the background while periodically
snapshotting the fs and running fsck on the result.  I ran into
occasional deadlocks, but different ones.  I think this is a
fine fix for the problem at hand, and the other deadlocky things
will need more investigation.

Reported-by: Phillip Susi <psusi@cfl.rr.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-08-01 17:33:29 -04:00
Linus Torvalds fc71ff8a6c Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Ensure that writepage respects the nonblock flag
  NFS: kswapd must not block in nfs_release_page
  nfs: include space for the NUL in root path
2010-07-30 19:02:21 -07:00