Commit Graph

24082 Commits

Author SHA1 Message Date
Josef Bacik f1e490a7eb Btrfs: set i_size properly when fallocating and we already
xfstests exposed a problem with preallocate when it fallocates a range that
already has an extent.  We don't set the new i_size properly because we see that
we already have an extent.  This isn't right and we should update i_size if the
space already exists.  With this patch we now pass xfstests 075.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-18 10:36:39 -04:00
Dan Carpenter 9a4327ca1f btrfs: unlock on error in btrfs_file_llseek()
There were some unlocks on error missing in a recent patch to
btrfs_file_llseek().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-18 10:16:05 -04:00
Jeff Mahoney cb6db4e576 btrfs: btrfs_permission's RO check shouldn't apply to device nodes
This patch tightens the read-only access checks in btrfs_permission to
 match the constraints in inode_permission. Currently, even though the
 device node itself will be unmodified, read-write access to device nodes
 is denied to when the device node resides on a read-only subvolume or a
 is a file that has been marked read-only by the btrfs conversion utility.

 With this patch applied, the check only affects regular files,
 directories, and symlinks. It also restructures the code a bit so that
 we don't duplicate the MAY_WRITE check for both tests.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-18 10:16:03 -04:00
Timo Warns 338d0f0a6f befs: Validate length of long symbolic links.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-17 13:31:24 -07:00
Namjae Jeon 710d4403a4 fat: fat16 support maximum 4GB file/vol size as WinXP or 7.
FAT16 support maximum 4GB vol/file size with 64KB cluster size.

Win NT/XP/7 increased the maximum cluster size to 64KB, and file/vol
size increased 4GB also.  Although increasing, the file size of linux
FAT is still limited at 2GB.

I found that it is limited by sb->maxbytes(0x7fffffff) when partition
is formatted by FAT16.  sb->s_maxbytes in fill_super should be set to
0xffffffff like fat32.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:35:00 +09:00
Mihai Moldovan 186b53701c fat: fix utf8 iocharset warning message
The fat_msg function already formats the given message and appends
a newline to it - we don't need to do this in the passed message
string as well, or will end up with a blank line printed in the
kernel log ring buffer.

Also change the loglevel from error to warning.

Signed-off-by: Mihai Moldovan <ionic@ionic.de>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:34:59 +09:00
Jonas Aberg 8c320c079c fat: fix build warning
This fixes a compile warning (unititialized variable) in
the fat filesystem code.

Signed-off-by: Jonas Aberg <jonas.aberg@stericsson.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2011-08-17 19:34:58 +09:00
Sage Weil f81c9cdc56 Btrfs: truncate pages from clone ioctl target range
We need to truncate page cache pages for the clone ioctl target range or
else we'll confuse ourselves to no end.  If the old data was cached, we
used to still see it (until remount).  If the page was partially updated
we used to get a mix of old and new data.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:31 -04:00
Miao Xie 0e58885961 Btrfs: fix uninitialized sync_pending
sync_pending is uninitialized before it be used, fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:31 -04:00
Miao Xie bb3ac5a4df Btrfs: fix wrong free space information
Btrfs subtracted the size of the allocated space twice when it allocated
the space from the bitmap in the cluster, it broke the free space information
and led to oops finally.

And this patch also fixes the bug that ctl->free_space was subtracted
without lock.

Reported-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:31 -04:00
Dan Carpenter f4ac904c41 btrfs: memory leak in btrfs_add_inode_defrag()
We don't use the defrag struct on this path.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
Li Zefan c97c2916e2 Btrfs: use plain page_address() in header fields setget functions
We've stopped using highmem for extent buffers.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
Tsutomu Itoh cb1b69f450 Btrfs: forced readonly when btrfs_drop_snapshot() fails
The filesystem turns readonly instead of returning the error to the
caller when detected error in btrfs_drop_snapshot().
and, because the caller doesn't check the error, the function type is
changed to 'void'.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
liubo cdcb725c05 Btrfs: check if there is enough space for balancing smarter
When checking if there is enough space for balancing a block group,
since we do not take raid types into consideration, we do not account
corrent amounts of space that we needed.  This makes us do some extra
work before we get ENOSPC.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
liubo 38c01b9605 Btrfs: fix a bug of balance on full multi-disk partitions
When balancing, we'll first try to shrink devices for some space,
but if it is working on a full multi-disk partition with raid protection,
we may encounter a bug, that is, while shrinking, total_bytes may be less
than bytes_used, and btrfs may allocate a dev extent that accesses out of
device's bounds.

Then we will not be able to write or read the data which stores at the end
of the device, and get the followings:

device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
Btrfs detected SSD devices, enabling SSD mode
btrfs: relocating block group 476315648 flags 9
btrfs: found 4 extents
attempt to access beyond end of device
sdb5: rw=145, want=546176, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546304, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546432, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546560, limit=546147
attempt to access beyond end of device

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
liubo 34f3e4f23c Btrfs: fix an oops of log replay
When btrfs recovers from a crash, it may hit the oops below:

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:4580!
[...]
RIP: 0010:[<ffffffffa03df251>]  [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs]
[...]
Call Trace:
 [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
 [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs]
 [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs]
 [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs]
 [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs]
 [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs]
 [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
 [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs]
[...]

This comes from that while replaying an inode ref item, we forget to
check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree,
then we will come to conflict corners which lead to BUG_ON().

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Tested-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
Josef Bacik d5e2003c2b Btrfs: detect wether a device supports discard
We have a problem where if a user specifies discard but doesn't actually support
it we will return EOPNOTSUPP from btrfs_discard_extent.  This is a problem
because this gets called (in a fashion) from the tree log recovery code, which
has a nice little BUG_ON(ret) after it, which causes us to fail the tree log
replay.  So instead detect wether our devices support discard when we're adding
them and then don't issue discards if we know that the device doesn't support
it.  And just for good measure set ret = 0 in btrfs_issue_discard just in case
we still get EOPNOTSUPP so we don't screw anybody up like this again.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16 21:09:15 -04:00
Jeff Layton fa71f44706 cifs: demote cERROR in build_path_from_dentry to cFYI
Running the cthon tests on a recent kernel caused this message to pop
occasionally:

    CIFS VFS: did not end path lookup where expected namelen is 0

Some added debugging showed that namelen and dfsplen were both 0 when
this occurred. That means that the read_seqretry returned true.

Assuming that the comment inside the if statement is true, this should
be harmless and just means that we raced with a rename. If that is the
case, then there's no need for alarm and we can demote this to cFYI.

While we're at it, print the dfsplen too so that we can see what
happened here if the message pops during debugging.

Cc: stable@kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-16 13:07:24 +00:00
Sage Weil 795858dbd2 ceph: fix encoding of ino only (not relative) paths
A 'path' consists of a starting ino and relative component.  Encode even
when there is no relative component.  This is primarily needed by the
NFS reexport code.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-08-15 13:03:56 -07:00
Linus Torvalds a0b3447fb1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: flush journal completely before releasing metadata inodes
2011-08-15 08:40:24 -07:00
Linus Torvalds aa2b1cf5ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: Do not set cifs/ntfs acl using a file handle (try #4)
  [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable
2011-08-15 08:36:30 -07:00
Theodore Ts'o 9dd75f1f1a ext4: fix nomblk_io_submit option so it correctly converts uninit blocks
Bug discovered by Jan Kara:

Finally, commit 1449032be1 returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionnaly call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
        char buf[1024];
        memset(buf, 'a', sizeof(buf));
        fallocate(fd, 0, 0, 16384);
        write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-08-13 12:58:21 -04:00
Tao Ma 32c80b32c0 ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN.
EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't fix the
problem met by Michael actually.

http://marc.info/?l=linux-ext4&m=131316851417460&w=2

Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-08-13 12:30:59 -04:00
Jiaying Zhang 2581fdc810 ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode
Flush inode's i_completed_io_list before calling ext4_io_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-08-13 12:17:13 -04:00
Curt Wohlgemuth 441c850857 ext4: Fix ext4_should_writeback_data() for no-journal mode
ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files.  However, calling journalled aop
callbacks when there is no valid handle, can cause problems.

This would cause a kernel crash with Jan Kara's commit
2d859db3e4 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

   - no-journal
   - data=ordered
   - data=writeback
   - data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2011-08-13 11:25:18 -04:00
Linus Torvalds e68ff9cd15 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: replace xfs_buf_geterror() with bp->b_error
  xfs: Check the return value of xfs_buf_read() for NULL
  "xfs: fix error handling for synchronous writes" revisited
  xfs: set cursor in xfs_ail_splice() even when AIL was empty
  xfs: Remove the macro XFS_BUFTARG_NAME
  xfs: Remove the macro XFS_BUF_TARGET
  xfs: Remove the macro XFS_BUF_SET_TARGET
  Replace the macro XFS_BUF_ISPINNED with helper xfs_buf_ispinned
  xfs: Remove the macro XFS_BUF_SET_PTR
  xfs: Remove the macro XFS_BUF_PTR
  xfs: Remove macro XFS_BUF_SET_START
  xfs: Remove macro XFS_BUF_HOLD
  xfs: Remove macro XFS_BUF_BUSY and family
  xfs: Remove the macro XFS_BUF_ERROR and family
  xfs: Remove the macro XFS_BUF_BFLAGS
2011-08-12 20:43:01 -07:00
Christoph Hellwig c59d87c460 xfs: remove subdirectories
Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code.  Besides the large
amount of file rename the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12 16:21:35 -05:00
Alex Elder 06f8e2d675 xfs: don't expect xfs headers to be in subdirectories
Fix up some #include directives in preparation for moving a few
header files out of xfs source subdirectories.

Note that "xfs_linux.h" also got its quoting convention for included
files switched.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-08-12 13:57:55 -05:00
Chandra Seetharaman e570280521 xfs: replace xfs_buf_geterror() with bp->b_error
Since we just checked bp for NULL, it is ok to replace
xfs_buf_geterror() with bp->b_error in these places.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12 13:39:40 -05:00
Chandra Seetharaman ac4d6888b2 xfs: Check the return value of xfs_buf_read() for NULL
Check the return value of xfs_buf_read() for NULL and return ENOMEM
if it is NULL.  This is necessary in a few spots to avoid subsequent
code blindly dereferencing the null buffer pointer.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-12 13:39:29 -05:00
Linus Torvalds ce8a84ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  e1000e: increase driver version number
  e1000e: alternate MAC address update
  e1000e: do not disable receiver on 82574/82583
  e1000e: alternate MAC address does not work on device id 0x1060
  PCnet: Fix section mismatch
  bnx2x: disable dcb on 578xx since not supported yet
  bnx2x: properly clean indirect addresses
  bnx2x: prevent race between undi_unload and load flows
  bnx2x: fix select_queue when FCoE is disabled
  bnx2x: init FCOE FP only once
  ipv4: some rt_iif -> rt_route_iif conversions
  net/bridge/netfilter/ebtables.c: use available error handling code
  net/netlabel/netlabel_kapi.c: add missing cleanup code
  net/irda: sh_sir: tidyup compile warning
  net/irda: sh_sir: add missing header
  net/irda: sh_irda: add missing header
  slcan: ldisc generated skbs are received in softirq context
  scm: Capture the full credentials of the scm sender
  tcp: initialize variable ecn_ok in syncookies path
  drivers/net/wireless/wl1251: add missing kfree
  ...
2011-08-12 06:43:53 -07:00
Boaz Harrosh 8cf1fb2163 pnfs: Automatically select blocks & objects layouts
Just like files-layout, blocks & objects layouts are part of the
NFS 4.1 protocol and should be automatically selected if NFS_4_1
is selected. The small problem is that these depend on other
Kernel support being present, while files only depends on NFS
itself.

This patch removes from the user choice the presence of objects
and blocks layout. But makes sure these are selected only if
the depended subsystems are present in the Kernel.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:51:27 -07:00
Eric Sandeen 8c20871998 ext4: Properly count journal credits for long symlinks
Commit df5e622340 ("ext4: fix deadlock in ext4_symlink() in ENOSPC
conditions") recalculated the number of credits needed for a long
symlink, in the process of splitting it into two transactions.  However,
the first credit calculation under-counted because if selinux is
enabled, credits are needed to create the selinux xattr as well.

Overrunning the reservation will result in an OOPS in
jbd2_journal_dirty_metadata() due to this assert:

  J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

Fix this by increasing the reservation size.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:23:40 -07:00
Eric Sandeen d2db60df1e ext3: Properly count journal credits for long symlinks
Commit ae54870a1d ("ext3: Fix lock inversion in ext3_symlink()")
recalculated the number of credits needed for a long symlink, in the
process of splitting it into two transactions.  However, the first
credit calculation under-counted because if selinux is enabled, credits
are needed to create the selinux xattr as well.

Overrunning the reservation will result in an OOPS in
journal_dirty_metadata() due to this assert:

  J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

Fix this by increasing the reservation size.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:23:40 -07:00
Vasiliy Kulikov 72fa59970f move RLIMIT_NPROC check from set_user() to do_execve_common()
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.

Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only.  But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges.  So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.

The NPROC can still be enforced in the common code flow of daemons
spawning user processes.  Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.

Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.

Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve().  If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected.  If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().

The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.

Similar check was introduced in -ow patches (without the process flag).

v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().

Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 11:24:42 -07:00
Shirish Pargaonkar e22906c564 cifs: Do not set cifs/ntfs acl using a file handle (try #4)
Set security descriptor using path name instead of a file handle.
We can't be sure that the file handle has adequate permission to
set a security descriptor (to modify DACL).

Function set_cifs_acl_by_fid() has been removed since we can't be
sure how a file was opened for writing, a valid request can fail
if the file was not opened with two above mentioned permissions.
We could have opted to add on WRITE_DAC and WRITE_OWNER permissions
to file opens and then use that file handle but adding addtional
permissions such as WRITE_DAC and WRITE_OWNER could cause an
any open to fail.

And it was incorrect to look for read file handle to set a
security descriptor anyway.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-11 18:23:45 +00:00
Steve French 789e666123 [CIFS] Cleanup use of CONFIG_CIFS_STATS2 ifdef to make transport routines more readable
Christoph had requested that the stats related code (in
CONFIG_CIFS_STATS2) be moved into helpers to make code flow more
readable.   This patch should help.   For example the following
section from transport.c

                       spin_unlock(&GlobalMid_Lock);
                       atomic_inc(&ses->server->num_waiters);
                       wait_event(ses->server->request_q,
                                  atomic_read(&ses->server->inFlight)
                                    < cifs_max_pending);
                       atomic_dec(&ses->server->num_waiters);
                       spin_lock(&GlobalMid_Lock);

becomes simpler (with the patch below):
                       spin_unlock(&GlobalMid_Lock);
                       cifs_num_waiters_inc(server);
                       wait_event(server->request_q,
                                  atomic_read(&server->inFlight)
                                    < cifs_max_pending);
                       cifs_num_waiters_dec(server);
                       spin_lock(&GlobalMid_Lock);

Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
2011-08-11 18:23:45 +00:00
Peng Tao 54a33b190a NFS41: make PNFS_BLOCK selectable
PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other
pnfs layout drivers. Seperate it out so others can still build when
BLK_DEV_DM/MD is not enabled.

Also change select to depends on to avoid build failures.

Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 08:58:02 -07:00
Ajeet Yadav 9e978d8f7d "xfs: fix error handling for synchronous writes" revisited
xfs: fix for hang during synchronous buffer write error

If removed storage while synchronous buffer write underway,
"xfslogd" hangs.

Detailed log http://oss.sgi.com/archives/xfs/2011-07/msg00740.html

Related work bfc60177f8
"xfs: fix error handling for synchronous writes"

Given that xfs_bwrite actually does the shutdown already after
waiting for the b_iodone completion and given that we actually
found that calling xfs_force_shutdown from inside
xfs_buf_iodone_callbacks was a major contributor the problem
it better to drop this call.

Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-08-10 17:00:21 -05:00
Linus Torvalds d55140ce3a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  Ecryptfs: Add mount option to check uid of device being mounted = expect uid
  eCryptfs: Fix payload_len unitialized variable warning
  eCryptfs: fix compile error
  eCryptfs: Return error when lower file pointer is NULL
2011-08-10 11:08:06 -07:00
John Johansen 764355487e Ecryptfs: Add mount option to check uid of device being mounted = expect uid
Close a TOCTOU race for mounts done via ecryptfs-mount-private.  The mount
source (device) can be raced when the ownership test is done in userspace.
Provide Ecryptfs a means to force the uid check at mount time.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 23:29:01 -05:00
Alex Elder e44f4112a4 xfs: set cursor in xfs_ail_splice() even when AIL was empty
In xfs_ail_splice(), if a cursor is provided it is updated to
point to the last item on the list being spliced into the AIL.
But if the AIL was found to be empty, the cursor (if provided)
is just initialized instead.

There is no reason the empty AIL case needs to be treated any
differently.  And treating it the same way allows this code
to be rearranged a bit, with a somewhat tidier result.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2011-08-09 15:30:43 -05:00
Tyler Hicks 99b373ff2d eCryptfs: Fix payload_len unitialized variable warning
fs/ecryptfs/keystore.c: In function ‘ecryptfs_generate_key_packet_set’:
fs/ecryptfs/keystore.c:1991:28: warning: ‘payload_len’ may be used uninitialized in this function [-Wuninitialized]
fs/ecryptfs/keystore.c:1976:9: note: ‘payload_len’ was declared here

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 13:42:46 -05:00
Roberto Sassu 4b6fee17b1 eCryptfs: fix compile error
This patch fixes the compile error reported at the address:

https://bugzilla.kernel.org/show_bug.cgi?id=40292

The problem arises when compiling eCryptfs as built-in and the 'encrypted'
key type as a module. The patch prevents this combination from being set in
the kernel configuration, by fixing the eCryptfs dependencies.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Reported-by: David Hill <hilld@binarystorm.net>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-08-09 13:42:46 -05:00
Tyler Hicks f61500e000 eCryptfs: Return error when lower file pointer is NULL
When an eCryptfs inode's lower file has been closed, and the pointer has
been set to NULL, return an error when trying to do a lower read or
write rather than calling BUG().

https://bugzilla.kernel.org/show_bug.cgi?id=37292

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
2011-08-09 13:42:45 -05:00
Linus Torvalds 2f84dd7091 autofs4: fix debug printk warning uncovered by cleanup
The previous comit made the autofs4 debug printouts check types against
the printout format, and uncovered this bug:

  fs/autofs4/waitq.c:106:2: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 4 has type ‘autofs_wqt_t’

which is due to the insane type for wait_queue_token.  That thing should
be some fixed well-defined size (preferably just 'unsigned int' or
'u32') but for unexplained reasons it is randomly either 'unsigned long'
or 'unsigned int' depending on the architecture.

For now, cast it to 'unsigned long' for printing, the way we do
elsewhere.  Somebody else can try to explain the typedef mess.

(There's a reason we don't support excessive use of typedefs in the
kernel: it's usually just a good way of confusing yourself).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 12:02:43 -07:00
Linus Torvalds c3ad996246 autofs4: clean up uaotfs use of debug/info/warning printouts
Use 'pr_debug()' for DPRINTK, which will do the proper type checking on
the arguments (without generating code) even when DEBUG isn't #defined.

Also, use the standard __VA_ARGS__ for the macros, and stop the
pointless abuse of 'do { xyz } while (0)' when the macro is already a
perfectly well-formed single statement.

Reported-by: David Howells <dhowells@redhat.com>
Suggested-by: Joe Perches <joe@perches.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08 11:35:17 -07:00
Johannes Weiner 478e0841b3 fuse: mark pages accessed when written to
As fuse does not use the page cache library functions when userspace
writes to a file, it did not benefit from 'c8236db mm: mark page
accessed before we write_end()' that made sure pages are properly
marked accessed when written to.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08 16:08:08 +02:00
Johannes Weiner b40cdd56df fuse: delete dead .write_begin and .write_end aops
Ever since 'ea9b990 fuse: implement perform_write', the .write_begin
and .write_end aops have been dead code.

Their task - acquiring a page from the page cache, sending out a write
request and releasing the page again - is now done batch-wise to
maximize the number of pages send per userspace request.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08 16:08:08 +02:00
Miklos Szeredi 37fb3a30b4 fuse: fix flock
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
number of issues with supporing flock locks over existing POSIX
locking infrastructure:

  - it's not backward compatible, passing flock(2) calls to userspace
    unconditionally (if userspace sets FUSE_POSIX_LOCKS)

  - it doesn't cater for the fact that flock locks are automatically
    unlocked on file release

  - it doesn't take into account the fact that flock exclusive locks
    (write locks) don't need an fd opened for write.

The last one invalidates the original premise of the patch that flock
locks can be emulated with POSIX locks.

This patch fixes the first two issues.  The last one needs to be fixed
in userspace if the filesystem assumed that a write lock will happen
only on a file operned for write (as in the case of the current fuse
library).

Reported-by: Sebastian Pipping <webmaster@hartwork.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-08-08 16:08:08 +02:00
Alex Elder 2ddb4e9406 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux 2011-08-08 07:06:24 -05:00
Florian Westphal 8bab6f1408 compat_ioctl: add compat handler for PPPIOCGL2TPSTATS
fixes following error seen on x86_64 kernel:
ioctl32(openl2tpd:7480): Unknown cmd fd(14) cmd(80487436){t:'t';sz:72} arg(ffa7e6c0) on socket:[105094]

The argument (struct pppol2tp_ioc_stats) uses "aligned_u64" and thus doesn't need
fixups.

Cc: James Chapman <jchapman@katalix.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-07 22:24:41 -07:00
Linus Torvalds 7813b94a54 vfs: rename 'do_follow_link' to 'should_follow_link'
Al points out that the do_follow_link() helper function really is
misnamed - it's about whether we should try to follow a symlink or not,
not about actually doing the following.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-07 13:42:25 -07:00
Ari Savolainen 206b1d09a5 Fix POSIX ACL permission check
After commit 3567866bf261: "RCUify freeing acls, let check_acl() go ahead in
RCU mode if acl is cached" posix_acl_permission is being called with an
unsupported flag and the permission check fails. This patch fixes the issue.

Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-07 04:52:23 -04:00
Linus Torvalds c2f340a69c Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  ore: Make ore its own module
  exofs: Rename raid engine from exofs/ios.c => ore
  exofs: ios: Move to a per inode components & device-table
  exofs: Move exofs specific osd operations out of ios.c
  exofs: Add offset/length to exofs_get_io_state
  exofs: Fix truncate for the raid-groups case
  exofs: Small cleanup of exofs_fill_super
  exofs: BUG: Avoid sbi realloc
  exofs: Remove pnfs-osd private definitions
  nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
2011-08-06 22:56:03 -07:00
Linus Torvalds 3ddcd0569c vfs: optimize inode cache access patterns
The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:53:23 -07:00
Linus Torvalds 830c0f0edc vfs: renumber DCACHE_xyz flags, remove some stale ones
Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.

And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 22:52:40 -07:00
Boaz Harrosh cf283ade08 ore: Make ore its own module
Export everything from ore need exporting. Change Kbuild and Kconfig
to build ore.ko as an independent module. Import ore from exofs

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:36:19 -07:00
Boaz Harrosh 8ff660ab85 exofs: Rename raid engine from exofs/ios.c => ore
ORE stands for "Objects Raid Engine"

This patch is a mechanical rename of everything that was in ios.c
and its API declaration to an ore.c and an osd_ore.h header. The ore
engine will later be used by the pnfs objects layout driver.

* File ios.c => ore.c

* Declaration of types and API are moved from exofs.h to a new
  osd_ore.h

* All used types are prefixed by ore_ from their exofs_ name.

* Shift includes from exofs.h to osd_ore.h so osd_ore.h is
  independent, include it from exofs.h.

Other than a pure rename there are no other changes. Next patch
will move the ore into it's own module and will export the API
to be used by exofs and later the layout driver

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:36:18 -07:00
Boaz Harrosh 9e9db45649 exofs: ios: Move to a per inode components & device-table
Exofs raid engine was saving on memory space by having a single layout-info,
single pid, and a single device-table, global to the filesystem. Then passing
a credential and object_id info at the io_state level, private for each
inode. It would also devise this contraption of rotating the device table
view for each inode->ino to spread out the device usage.

This is not compatible with the pnfs-objects standard, demanding that
each inode can have it's own layout-info, device-table, and each object
component it's own pid, oid and creds.

So: Bring exofs raid engine to be usable for generic pnfs-objects use by:

* Define an exofs_comp structure that holds obj_id and credential info.

* Break up exofs_layout struct to an exofs_components structure that holds a
  possible array of exofs_comp and the array of devices + the size of the
  arrays.

* Add a "comps" parameter to get_io_state() that specifies the ids creds
  and device array to use for each IO.

  This enables to keep the layout global, but the device-table view, creds
  and IDs at the inode level. It only adds two 64bit to each inode, since
  some of these members already existed in another form.

* ios raid engine now access layout-info and comps-info through the passed
  pointers. Everything is pre-prepared by caller for generic access of
  these structures and arrays.

At the exofs Level:

* Super block holds an exofs_components struct that holds the device
  array, previously in layout. The devices there are in device-table
  order. The device-array is twice bigger and repeats the device-table
  twice so now each inode's device array can point to a random device
  and have a round-robin view of the table, making it compatible to
  previous exofs versions.

* Each inode has an exofs_components struct that is initialized at
  load time, with it's own view of the device table IDs and creds.
  When doing IO this gets passed to the io_state together with the
  layout.

While preforming this change. Bugs where found where credentials with the
wrong IDs where used to access the different SB objects (super.c). As well
as some dead code. It was never noticed because the target we use does not
check the credentials.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:32 -07:00
Boaz Harrosh 85e44df474 exofs: Move exofs specific osd operations out of ios.c
ios.c will be moving to an external library, for use by the
objects-layout-driver. Remove from it some exofs specific functions.

Also g_attr_logical_length is used both by inode.c and ios.c
move definition to the later, to keep it independent

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:31 -07:00
Boaz Harrosh e1042ba099 exofs: Add offset/length to exofs_get_io_state
In future raid code we will need to know the IO offset/length
and if it's a read or write to determine some of the array
sizes we'll need.

So add a new exofs_get_rw_state() API for use when
writeing/reading. All other simple cases are left using the
old way.

The major change to this is that now we need to call
exofs_get_io_state later at inode.c::read_exec and
inode.c::write_exec when we actually know these things. So this
patch is kept separate so I can test things apart from other
changes.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-06 19:35:31 -07:00
Linus Torvalds 1957e7fdef Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: cope with negative dentries in cifs_get_root
  cifs: convert prefixpath delimiters in cifs_build_path_to_root
  CIFS: Fix missing a decrement of inFlight value
  cifs: demote DFS referral lookup errors to cFYI
  Revert "cifs: advertise the right receive buffer size to the server"
2011-08-06 13:54:36 -07:00
Linus Torvalds 1117f72ea0 vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files
The CLOEXE bit is magical, and for performance (and semantic) reasons we
don't actually maintain it in the file descriptor itself, but in a
separate bit array.  Which means that when we show f_flags, the CLOEXE
status is shown incorrectly: we show the status not as it is now, but as
it was when the file was opened.

Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
array.

Uli needs this in order to re-implement the pfiles program:

  "For normal file descriptors (not sockets) this was the last piece of
   information which wasn't available.  This is all part of my 'give
   Solaris users no reason to not switch' effort.  I intend to offer the
   code to the util-linux-ng maintainers."

Requested-by: Ulrich Drepper <drepper@akkadia.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 11:51:33 -07:00
Linus Torvalds c21427043d oom_ajd: don't use WARN_ONCE, just use printk_once
WARN_ONCE() is very annoying, in that it shows the stack trace that we
don't care about at all, and also triggers various user-level "kernel
oopsed" logic that we really don't care about.  And it's not like the
user can do anything about the applications (sshd) in question, it's a
distro issue.

Requested-by: Andi Kleen <andi@firstfloor.org> (and many others)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-06 11:43:08 -07:00
Chris Mason 2ab1ba68ae Btrfs: force unplugs when switching from high to regular priority bios
Btrfs does bio submissions from a worker thread, and each device
has a list of high priority bios and regular priority bios.

Synchronous writes go to the high priority thread while async writes
go to regular list.  This commit brings back an explicit unplug
any time we switch from high to regular priority, which makes it
easier for the block layer to give us low latencies.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-05 13:48:18 -04:00
Jeff Layton 80975d21aa cifs: cope with negative dentries in cifs_get_root
The loop around lookup_one_len doesn't handle the case where it might
return a negative dentry, which can cause an oops on the next pass
through the loop. Check for that and break out of the loop with an
error of -ENOENT if there is one.

Fixes the panic reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=727927

Reported-by: TR Bentley <home@trarbentley.net>
Reported-by: Iain Arnell <iarnell@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-05 15:03:09 +00:00
Jeff Layton f9e8c45002 cifs: convert prefixpath delimiters in cifs_build_path_to_root
Regression from 2.6.39...

The delimiters in the prefixpath are not being converted based on
whether posix paths are in effect. Fixes:

    https://bugzilla.redhat.com/show_bug.cgi?id=727834

Reported-and-Tested-by: Iain Arnell <iarnell@gmail.com>
Reported-by: Patrick Oltmann <patrick.oltmann@gmx.net>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-05 14:55:15 +00:00
Linus Torvalds 24f0eed266 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
  get rid of boilerplate switches in posix_acl.h
  fix block device fallout from ->fsync() changes
2011-08-04 16:44:40 -10:00
Boaz Harrosh 16f75bb35d exofs: Fix truncate for the raid-groups case
In the general raid-group case the truncate was wrong in that
it did not also fix the object length of the neighboring groups.

There are two bad cases in the old code:
1. Space that should be freed was not.
2. If a file That was big is truncated small, then made bigger
   again, the holes would not contain zeros but could expose old data.
   (If the growing of the file expands to more than a full
    groups cycle + group size (> S + T))

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:25 -07:00
Boaz Harrosh 9ce730475e exofs: Small cleanup of exofs_fill_super
Small cleanup that unifies duplicated code used in both the
error and success cases

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:23 -07:00
Boaz Harrosh 6d4073e881 exofs: BUG: Avoid sbi realloc
Since the beginning we realloced the sbi structure when a bigger
then one device table was specified. (I know that was really stupid).

Then much later when "register bdi" was added (By Jens) it was
registering the pointer to sbi->bdi before the realloc.

We never saw this problem because up till now the realloc did not
do anything since the device table was small enough to fit in the
original allocation. But once we starting testing with large device
tables (Bigger then 28) we noticed the crash of writeback operating
on a deallocated pointer.

* Avoid the all mess by allocating the device-table as a second array
  and get rid of the variable-sized structure and the rest of this
  mess.
* Take the chance to clean near by structures and comments.
* Add a needed dprint on startup to indicate the loaded layout.
* Also move the bdi registration to the very end because it will
  only fail in a low memory, which will probably fail before hand.
  There are many more likely causes to not load before that. This
  way the error handling is made simpler. (Just doing this would be
  enough to fix the BUG)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:20 -07:00
Boaz Harrosh 26ae93c2dc exofs: Remove pnfs-osd private definitions
Now that pnfs-osd has hit mainline we can remove exofs's
private header. (And the FIXME comment)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-08-04 12:35:18 -07:00
Trond Myklebust 910ac68a2b NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
If the client is in the process of resetting the session when it receives
a callback, then returning NFS4ERR_DELAY may cause a deadlock with the
DESTROY_SESSION call.

Basically, if the client returns NFS4ERR_DELAY in response to the
CB_SEQUENCE call, then the server is entitled to believe that the
client is busy because it is already processing that call. In that
case, the server is perfectly entitled to respond with a
NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call.

Fix this by having the client reply with a NFS4ERR_BADSESSION in
response to the callback if it is resetting the session.

Cc: stable@kernel.org [2.6.38+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:55:35 -04:00
Trond Myklebust 55a673990e NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
Currently, there is no guarantee that we will call nfs4_cb_take_slot() even
though nfs4_callback_compound() will consistently call
nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field.
The result is that we can trigger the BUG_ON() upon the next call to
nfs4_cb_take_slot().

This patch fixes the above problem by using the slot id that was taken in
the CB_SEQUENCE operation as a flag for whether or not we need to call
nfs4_cb_free_slot().
It also fixes an atomicity problem: we need to set tbl->highest_used_slotid
atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up
racing with the various tests in nfs4_begin_drain_session().

Cc: stable@kernel.org [2.6.38+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:55:35 -04:00
Boaz Harrosh 9af7db3228 pnfs-obj: Fix the comp_index != 0 case
There were bugs in the case of partial layout where olo_comp_index
is not zero. This used to work and was tested but one of the later
cleanup SQUASHMEs broke it and was not tested since.

Also add a dprint that specify those received layout parameters.
Everything else was already printed.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:48 -04:00
Boaz Harrosh 20618b21da pnfs-obj: Bug when we are running out of bio
When we have a situation that the number of pages we want
to encode is bigger then the size of the bio. (Which can
currently happen only when all IO is going to a single device
.e.g group_width==1) then the IO is submitted short and we
report back only the amount of bytes we actually wrote/read
and all is fine. BUT ...

There was a bug that the current length counter was advanced
before the fail to add the extra page, and we come to a situation
that the CDB length was one-page longer then the actual bio size,
which is of course rejected by the osd-target.

While here also fix the bio size calculation, in the case
that we received more then one group of devices.

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:38 -04:00
Heiko Carstens 88c9e42196 nfs: add missing prefetch.h include
Fix this compile error on s390:

  CC [M]  fs/nfs/blocklayout/blocklayout.o
fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read':
fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw'

Introduced with 9549ec01 "pnfsblock: bl_read_pagelist".

Cc: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:25 -04:00
Linus Torvalds 14595f708e Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: use kzalloc in ext4_kzalloc()
2011-08-03 15:09:10 -10:00
Robert P. J. Day 206506ccf0 tmpfs: expand "help" to explain value of TMPFS_POSIX_ACL
Expand the fs/Kconfig "help" info to clarify why it's a bad idea to
deselect the TMPFS_POSIX_ACL config variable.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-03 14:25:25 -10:00
Hugh Dickins 31475dd611 mm: a few small updates for radix-swap
Remove PageSwapBacked (!page_is_file_cache) cases from
add_to_page_cache_locked() and add_to_page_cache_lru(): those pages now
go through shmem_add_to_page_cache().

Remove a comment on maximum tmpfs size from fsstack_copy_inode_size(),
and add a comment on swap entries to invalidate_mapping_pages().

And mincore_page() uses find_get_page() on what might be shmem or a
tmpfs file: allow for a radix_tree_exceptional_entry(), and proceed to
find_get_page() on swapper_space if so (oh, swapper_space needs #ifdef).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-03 14:25:24 -10:00
Randy Dunlap 2af1416265 fs/dcache.c: fix new kernel-doc warning
Fix new kernel-doc warning in fs/dcache.c:

  Warning(fs/dcache.c:797): No description found for parameter 'sb'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-03 14:25:21 -10:00
Pavel Shilovsky 0193e07226 CIFS: Fix missing a decrement of inFlight value
if we failed on getting mid entry in cifs_call_async.

Cc: stable@kernel.org
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-03 19:42:12 +00:00
Mathias Krause db9481c047 ext4: use kzalloc in ext4_kzalloc()
Commit 9933fc0i (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and
ext4_kvfree()) intruduced wrappers around k*alloc/vmalloc but introduced
a typo for ext4_kzalloc() by not using kzalloc() but kmalloc().

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-03 14:57:11 -04:00
Linus Torvalds ed8f37370d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
  Btrfs: don't call writepages from within write_full_page
  Btrfs: Remove unused variable 'last_index' in file.c
  Btrfs: clean up for find_first_extent_bit()
  Btrfs: clean up for wait_extent_bit()
  Btrfs: clean up for insert_state()
  Btrfs: remove unused members from struct extent_state
  Btrfs: clean up code for merging extent maps
  Btrfs: clean up code for extent_map lookup
  Btrfs: clean up search_extent_mapping()
  Btrfs: remove redundant code for dir item lookup
  Btrfs: make acl functions really no-op if acl is not enabled
  Btrfs: remove remaining ref-cache code
  Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
  Btrfs: use wait_event()
  Btrfs: check the nodatasum flag when writing compressed files
  Btrfs: copy string correctly in INO_LOOKUP ioctl
  Btrfs: don't print the leaf if we had an error
  btrfs: make btrfs_set_root_node void
  Btrfs: fix oops while writing data to SSD partitions
  Btrfs: Protect the readonly flag of block group
  ...

Fix up trivial conflicts (due to acl and writeback cleanups) in
 - fs/btrfs/acl.c
 - fs/btrfs/ctree.h
 - fs/btrfs/extent_io.c
2011-08-02 21:14:05 -10:00
Al Viro 3567866bf2 RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-03 00:58:42 -04:00
Jeff Layton b802898334 cifs: demote DFS referral lookup errors to cFYI
cifs: demote DFS referral lookup errors to cFYI

Now that we call into this routine on every mount, anyone who doesn't
have the upcall configured will get multiple printks about failed lookups.

Reported-and-Tested-by: Martijn Uffing <mp3project@sarijopen.student.utwente.nl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-03 03:19:28 +00:00
Steve French fc05a78efb Revert "cifs: advertise the right receive buffer size to the server"
This reverts commit c4d3396b26.

Problems discovered with readdir to Samba due to
not accounting for header size properly with this change
2011-08-03 03:17:43 +00:00
Rafael J. Wysocki da5aa861be fix block device fallout from ->fsync() changes
blkdev_fsync() needs to write pages in pagecache...

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 21:33:47 -04:00
Linus Torvalds 60ad446682 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
  ext4: prevent memory leaks from ext4_mb_init_backend() on error path
  ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
  ext4: use ext4_msg() instead of printk in mballoc
  ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
  ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
  ext4: use the correct error exit path in ext4_init_inode_table()
  ext4: add missing kfree() on error return path in add_new_gdb()
  ext4: change umode_t in tracepoint headers to be an explicit __u16
  ext4: fix races in ext4_sync_parent()
  ext4: Fix overflow caused by missing cast in ext4_fallocate()
  ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
  ext4: simplify parameters of reserve_backup_gdb()
  ext4: simplify parameters of add_new_gdb()
  ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
  ext4: simplify journal handling in setup_new_group_blocks()
  ext4: let setup_new_group_blocks() set multiple bits at a time
  ext4: fix a typo in ext4_group_extend()
  ext4: let ext4_group_add_blocks() handle 0 blocks quickly
  ext4: let ext4_group_add_blocks() return an error code
  ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
  ...

Fix up conflict in fs/ext4/inode.c: commit aacfc19c62 ("fs: simplify
the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
function for the new simplified calling convention, while commit
dae1e52cb1 ("ext4: move ext4_ind_* functions from inode.c to
indirect.c") moved the function to another file.
2011-08-01 13:56:03 -10:00
Linus Torvalds 1b8e94993c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
  VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
  VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
  VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
  switch posix_acl_chmod() to umode_t
  switch posix_acl_from_mode() to umode_t
  switch posix_acl_equiv_mode() to umode_t *
  switch posix_acl_create() to umode_t *
  block: initialise bd_super in bdget()
  vfs: avoid call to inode_lru_list_del() if possible
  vfs: avoid taking inode_hash_lock on pipes and sockets
  vfs: conditionally call inode_wb_list_del()
  VFS: Fix automount for negative autofs dentries
  Btrfs: load the key from the dir item in readdir into a fake dentry
  devtmpfs: missing initialialization in never-hit case
  hppfs: missing include
2011-08-01 13:48:31 -10:00
Linus Torvalds a2d7730235 Merge branch 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  efivars: Introduce PSTORE_EFI_ATTRIBUTES
  efivars: Use string functions in pstore_write
  efivars: introduce utf16_strncmp
  efivars: String functions
  efi: Add support for using efivars as a pstore backend
  pstore: Allow the user to explicitly choose a backend
  pstore: Make "part" unsigned
  pstore: Add extra context for writes and erases
  pstore: Extend API for more flexibility in new backends
2011-08-01 13:40:51 -10:00
Yu Jian 79a77c5ac3 ext4: prevent memory leaks from ext4_mb_init_backend() on error path
In ext4_mb_init(), if the s_locality_group allocation fails it will
currently cause the allocations made in ext4_mb_init_backend() to
be leaked.  Moving the ext4_mb_init_backend() allocation after the
s_locality_group allocation avoids that problem.

Signed-off-by: Yu Jian <yujian@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 17:41:46 -04:00
Yu Jian 48e6061bf4 ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
Signed-off-by: Yu Jian <yujian@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 17:41:39 -04:00
Theodore Ts'o 9d8b9ec442 ext4: use ext4_msg() instead of printk in mballoc
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 17:41:35 -04:00
Josef Bacik 0d10ee2e6d Btrfs: don't call writepages from within write_full_page
When doing a writepage we call writepages to try and write out any other dirty
pages in the area.  This could cause problems where we commit a transaction and
then have somebody else dirtying metadata in the area as we could end up writing
out a lot more than we care about, which could cause latency on anybody who is
waiting for the transaction to completely finish committing.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:37:36 -04:00
Mitch Harder 341d14f161 Btrfs: Remove unused variable 'last_index' in file.c
The variable 'last_index' is calculated in the __btrfs_buffered_write
function and passed as a parameter to the prepare_pages function,
but is not used anywhere in the prepare_pages function.

Remove instances of 'last_index' in these functions.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:32:39 -04:00
Xiao Guangrong 69261c4b6a Btrfs: clean up for find_first_extent_bit()
find_first_extent_bit() and find_first_extent_bit_state() share
most of the code, and we can just make the former call the latter.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:32:39 -04:00
Xiao Guangrong ded91f0814 Btrfs: clean up for wait_extent_bit()
We can just use cond_resched_lock().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:32:38 -04:00
Xiao Guangrong 3150b69969 Btrfs: clean up for insert_state()
Don't duplicate set_state_bits().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:32:30 -04:00
Xiao Guangrong 3a6d457ec7 Btrfs: remove unused members from struct extent_state
These members are not used at all.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:50 -04:00
Li Zefan 4d2c8f62f1 Btrfs: clean up code for merging extent maps
unpin_extent_cache() and add_extent_mapping() shares the same code
that merges extent maps.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:50 -04:00
Li Zefan ed64f06652 Btrfs: clean up code for extent_map lookup
lookup_extent_map() and search_extent_map() can share most of code.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:49 -04:00
Li Zefan 7e016a038e Btrfs: clean up search_extent_mapping()
rb_node returned by __tree_search() can be a valid pointer or NULL,
but won't be some errno.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:49 -04:00
Li Zefan 85d85a743d Btrfs: remove redundant code for dir item lookup
When we search a dir item with a specific hash code, we can
just return NULL without further checking if btrfs_search_slot()
returns 1.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:48 -04:00
Li Zefan 9b89d95a14 Btrfs: make acl functions really no-op if acl is not enabled
So there's no overhead for something we don't use.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:48 -04:00
Li Zefan 15de900d08 Btrfs: remove remaining ref-cache code
Since commit f2a97a9dbd
("btrfs: remove all unused functions"), there's no extern functions
at all in ref-cache.c, so just remove the remaining dead code.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:47 -04:00
Li Zefan b9c8300c2a Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
wait_for_commit() always returns 0.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:47 -04:00
Li Zefan 72d63ed642 Btrfs: use wait_event()
Use wait_event() when possible to avoid code duplication.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:46 -04:00
Li Zefan e55179b3d7 Btrfs: check the nodatasum flag when writing compressed files
If mounting with nodatasum option, we won't csum file data for
general write or direct-io write, and this rule should also be
applied when writing compressed files.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:46 -04:00
Li Zefan 77906a5075 Btrfs: copy string correctly in INO_LOOKUP ioctl
Memory areas [ptr, ptr+total_len] and [name, name+total_len]
may overlap, so it's wrong to use memcpy().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:45 -04:00
Josef Bacik b783e62d96 Btrfs: don't print the leaf if we had an error
In __btrfs_free_extent we will print the leaf if we fail to find the extent we
wanted, but the problem is if we get an error we won't have a leaf so often this
leads to a NULL pointer dereference and we lose the error that actually
occurred.  So only print the leaf if ret > 0, which means we didn't find the
item we were looking for but we didn't error either.  This way the error is
preserved.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:45 -04:00
Mark Fasheh bf5f32ecb6 btrfs: make btrfs_set_root_node void
This is fairly trivial - btrfs_set_root_node() - always returns zero so we
can just make it void.  All callers ignore the return code now anyway.  I
also made sure to check that none of the functions that
btrfs_set_root_node() calls returns an error that we might have needed to
catch and pass back.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:44 -04:00
liubo ff1f2b4407 Btrfs: fix oops while writing data to SSD partitions
Here I have a two SSD-partitions btrfs, and they are defaultly set to
"data=raid0, metadata=raid1", then I try to fill my btrfs partition
till "No space left on device", via "dd if=/dev/zero of=/mnt/btrfs/tmp".

I get an oops panic from kernel BUG at fs/btrfs/extent-tree.c:5199!, which
refers to find_free_extent's
BUG_ON(index != get_block_group_index(block_group));

In SSD mode, in order to find enough space to alloc, we may check the
block_group cache which has been checked sometime before, but the index is not
updated, where it hits the BUG_ON.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Acked-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:44 -04:00
WuBo 61cfea9bb8 Btrfs: Protect the readonly flag of block group
The access for ro in btrfs_block_group_cache should be protected
because of the racy lock in relocation.

Signed-off-by: Wu Bo <wu.bo@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:43 -04:00
Jeff Mahoney 1bf85046e4 btrfs: Make extent-io callbacks that never fail return void
The set/clear bit and the extent split/merge hooks only ever return 0.

 Changing them to return void simplifies the error handling cases later.

 This patch changes the hook prototypes, the single implementation of each,
 and the functions that call them to return void instead.

 Since all four of these hooks execute under a spinlock, they're necessarily
 simple.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:43 -04:00
Li Zefan b6973aa622 Btrfs: fix readahead in file defrag
We passed the wrong value to btrfs_force_ra(). Fix this by changing
the argument of btrfs_force_ra() from last_index to nr_page.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:42 -04:00
Tsutomu Itoh b532402e4d Btrfs: return error to caller when btrfs_unlink() failes
When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink()
are error, the error code is returned to the caller instead of
BUG_ON().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:42 -04:00
Wanlong Gao a0f98dde11 Btrfs:don't check the return value of __btrfs_add_inode_defrag
Don't need to check the return value of __btrfs_add_inode_defrag(),
since it will always return 0.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01 14:30:41 -04:00
Chris Mason b43b31bdf2 Merge branch 'alloc_path' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling into for-linus 2011-08-01 14:27:34 -04:00
Dave Kleikamp 1c8007b076 jfs: flush journal completely before releasing metadata inodes
This fixes a race during unmount. We need to not only make sure that
the journal is completely written, but that the metadata changes make
it to disk before releasing ipimap and ipbmap.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2011-08-01 12:41:00 -05:00
Linus Torvalds 5f66d2b58c Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  CIFS: Cleanup demupltiplex thread exiting code
  CIFS: Move mid search to a separate function
  CIFS: Move RFC1002 check to a separate function
  CIFS: Simplify socket reading in demultiplex thread
  CIFS: Move buffer allocation to a separate function
  cifs: remove unneeded variable initialization in cifs_reconnect_tcon
  cifs: simplify refcounting for oplock breaks
  cifs: fix compiler warning in CIFSSMBQAllEAs
  cifs: fix name parsing in CIFSSMBQAllEAs
  cifs: don't start signing too early
  cifs: trivial: goto out here is unnecessary
  cifs: advertise the right receive buffer size to the server
2011-08-01 06:14:25 -10:00
Pavel Shilovsky 762dfd1057 CIFS: Cleanup demupltiplex thread exiting code
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-01 12:49:45 +00:00
Pavel Shilovsky ad69bae178 CIFS: Move mid search to a separate function
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-01 12:49:42 +00:00
Pavel Shilovsky 98bac62c9f CIFS: Move RFC1002 check to a separate function
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-01 12:49:38 +00:00
Pavel Shilovsky e7015fb1c5 CIFS: Simplify socket reading in demultiplex thread
Move reading to separate function and remove csocket variable.

Also change semantic in a little: goto incomplete_rcv only when
we get -EAGAIN (or a familiar error) while reading rfc1002 header.
In this case we don't check for echo timeout when we don't get whole
header at once, as it was before.

Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-01 12:49:34 +00:00
Theodore Ts'o f18a5f21c2 ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 08:45:38 -04:00
Theodore Ts'o 9933fc0ac1 ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
Introduce new helper functions which try kmalloc, and then fall back
to vmalloc if necessary, and use them for allocating and deallocating
s_flex_groups.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 08:45:02 -04:00
Pavel Shilovsky 3d9c2472a5 CIFS: Move buffer allocation to a separate function
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-08-01 12:33:44 +00:00
Yongqiang Yang 33853a0dde ext4: use the correct error exit path in ext4_init_inode_table()
This patch lets ext4_init_inode_table() handle errors right.
ext4_init_inode_table() should down_write() alloc_sem which
has been up_write()ed and stop the started journal handle.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-01 06:32:19 -04:00
Markus Trippelsdorf 206d440f64 xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
commit 4e34e719e4, that takes the ACL checks to common code,
accidentely broke the build when CONFIG_FS_POSIX_ACL is not set:

  CC      fs/xfs/linux-2.6/xfs_iops.o
fs/xfs/linux-2.6/xfs_iops.c:1025:14: error: ‘xfs_get_acl’ undeclared here (not in a function)

Fix this by declaring xfs_get_acl a static inline function.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:35:04 -04:00
David Howells 43c1c9cd24 VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
Reorganise shrink_dcache_for_umount_subtree() in light of the demise of
dcache_lock.  Without that dcache_lock, there is no need for the batching of
removal of dentries from the system under it (we wanted to make intensive use
of the locked data whilst we held it, but didn't want to hold it for long at a
time).

This works, provided the preceding patch is correct in its removal of locking
on dentry->d_lock on the basis that no one should be locking these dentries any
more as the whole superblock is defunct.

With this patch, the calls to dentry_lru_del() and __d_shrink() are placed at
the point where each dentry is detached handled.

It is possible that, as an alternative, the batching should still be done -
but only for dentry_lru_del() of all a dentry's children in one go.  In such a
case, the batching would be done under dcache_lru_lock.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:27:57 -04:00
David Howells c6627c60c0 VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
Locks of the dcache_lock were replaced by locks of dentry->d_lock in commits
such as:

	2304450783
	2fd6b7f507

as part of the RCU-based pathwalk changes, despite the fact that the caller
(shrink_dcache_for_umount()) notes in the banner comment the reasons that
d_lock is not necessary in these functions:

/*
 * destroy the dentries attached to a superblock on unmounting
 * - we don't need to use dentry->d_lock because:
 *   - the superblock is detached from all mountings and open files, so the
 *     dentry trees will not be rearranged by the VFS
 *   - s_umount is write-locked, so the memory pressure shrinker will ignore
 *     any dentries belonging to this superblock that it comes across
 *   - the filesystem itself is no longer permitted to rearrange the dentries
 *     in this superblock
 */

So remove these locks.  If the locks are actually necessary, then this banner
comment should be altered instead.

The hash table chains are protected by 1-bit locks in the hash table heads, so
those shouldn't be a problem.

Note that to make this work, __d_drop() has to be split so that the RCUwalk
barrier can be avoided.  This causes problems otherwise as it has an assertion
that dentry->d_lock is locked - but there is no need for that as no one else
can be trying to access this dentry, except to step over it (and that should
be handled by d_free(), I think).

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:27:57 -04:00
David Howells 35f40ef002 VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
Remove the detached-dentry counter from shrink_dcache_for_umount_subtree() as
the value it computes is no longer used as of commit
312d3ca856 which made the nr_dentry counters
summed per-CPU rather than global atomic.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:27:57 -04:00
Al Viro 86bc704db0 switch posix_acl_chmod() to umode_t
again, that's what all callers pass to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:10:32 -04:00
Al Viro 3a5fba19b0 switch posix_acl_from_mode() to umode_t
... seeing that this is what all callers pass to it anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:10:20 -04:00
Al Viro d6952123b5 switch posix_acl_equiv_mode() to umode_t *
... so that &inode->i_mode could be passed to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:10:06 -04:00
Al Viro d3fb612076 switch posix_acl_create() to umode_t *
so we can pass &inode->i_mode to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:09:42 -04:00
Lachlan McIlroy 782b94cdf5 block: initialise bd_super in bdget()
bd_super is currently reset to NULL in kill_block_super() so we rely on previous
users of the block_device object to initialise this value for the next user.
This quirk was exposed on RHEL5 when a third party filesystem did not always use
kill_block_super() and therefore bd_super wasn't being reset when a block_device
object was recycled within the cache.  This may not be a problem upstream but
makes sense to be defensive.

Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:57:44 -04:00
Eric Dumazet c4ae0c6545 vfs: avoid call to inode_lru_list_del() if possible
inode_lru_list_del() is expensive because of per superblock lru locking,
while some inodes are not in lru list.

Adding a check in iput_final() can speedup pipe/sockets workloads on
SMP.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:41:17 -04:00
Eric Dumazet f2ee7abf4c vfs: avoid taking inode_hash_lock on pipes and sockets
Some inodes (pipes, sockets, ...) are not hashed, no need to take
contended inode_hash_lock at dismantle time.

nice speedup on SMP machines on socket intensive workloads.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:41:17 -04:00
Eric Dumazet b12362bdb6 vfs: conditionally call inode_wb_list_del()
Some inodes (pipes, sockets, ...) are not in bdi writeback list.

evict() can avoid calling inode_wb_list_del() and its expensive spinlock
by checking inode i_wb_list being empty or not.

At this point, no other cpu/user can concurrently manipulate this inode
i_wb_list

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:41:17 -04:00
David Howells 5a30d8a2b8 VFS: Fix automount for negative autofs dentries
Autofs may set the DCACHE_NEED_AUTOMOUNT flag on negative dentries.  These
need attention from the automounter daemon regardless of the LOOKUP_FOLLOW flag.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:38:01 -04:00
Josef Bacik b4aff1f874 Btrfs: load the key from the dir item in readdir into a fake dentry
In btrfs we have 2 indexes for inodes.  One is for readdir, it's in this nice
sequential order and works out brilliantly for readdir.  However if you use ls,
it usually stat's each file it gets from readdir.  This is where the second
index comes in, which is based on a hash of the name of the file.  So then the
lookup has to lookup this index, and then lookup the inode.  The index lookup is
going to be in random order (since its based on the name hash), which gives us
less than stellar performance.  Since we know the inode location from the
readdir index, I create a dummy dentry and copy the location key into
dentry->d_fsdata.  Then on lookup if we have d_fsdata we use that location to
lookup the inode, avoiding looking up the other directory index.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 01:31:42 -04:00
Trond Myklebust a00ed25cce NFS: Re-enable compilation of nfs with !CONFIG_NFS_V4 || !CONFIG_NFS_V4_1
Fix two recently introduced compile problems:

Fix a typo in fs/nfs/pnfs.h

Move the pnfs_blksize declaration outside the CONFIG_NFS_V4 section in
struct nfs_server.

Reported-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-31 14:27:04 -10:00
Jeff Layton c4a5534a1b cifs: remove unneeded variable initialization in cifs_reconnect_tcon
Reported-and-acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-31 21:27:16 +00:00
Jeff Layton ad635942c8 cifs: simplify refcounting for oplock breaks
Currently, we take a sb->s_active reference and a cifsFileInfo reference
when an oplock break workqueue job is queued. This is unnecessary and
more complicated than it needs to be. Also as Al points out,
deactivate_super has non-trivial locking implications so it's best to
avoid that if we can.

Instead, just cancel any pending oplock breaks for this filehandle
synchronously in cifsFileInfo_put after taking it off the lists.
That should ensure that this job doesn't outlive the structures it
depends on.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-31 21:21:20 +00:00
Jeff Layton 5980fc966b cifs: fix compiler warning in CIFSSMBQAllEAs
The recent fix to the above function causes this compiler warning to pop
on some gcc versions:

  CC [M]  fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSSMBQAllEAs’:
fs/cifs/cifssmb.c:5708: warning: ‘ea_name_len’ may be used uninitialized in
this function

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-31 21:21:13 +00:00
Jeff Layton 91d065c473 cifs: fix name parsing in CIFSSMBQAllEAs
The code that matches EA names in CIFSSMBQAllEAs is incorrect. It
uses strncmp to do the comparison with the length limited to the
name_len sent in the response.

Problem: Suppose we're looking for an attribute named "foobar" and
have an attribute before it in the EA list named "foo". The
comparison will succeed since we're only looking at the first 3
characters. Fix this by also comparing the length of the provided
ea_name with the name_len in the response. If they're not equal then
it shouldn't match.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-31 21:21:09 +00:00
Jeff Layton 998d6fcb24 cifs: don't start signing too early
Sniffing traffic on the wire shows that windows clients send a zeroed
out signature field in a NEGOTIATE request, and send "BSRSPYL" in the
signature field during SESSION_SETUP. Make the cifs client behave the
same way.

It doesn't seem to make much difference in any server that I've tested
against, but it's probably best to follow windows behavior as closely as
possible here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-07-31 21:21:06 +00:00