Commit Graph

30518 Commits

Author SHA1 Message Date
Liu Bo 6f1c36055f Btrfs: fix race between snapshot deletion and getting inode
While running snapshot testscript created by Mitch and David,
the race between autodefrag and snapshot deletion can lead to
corruption of dead_root list so that we can get crash on
btrfs_clean_old_snapshots().

And besides autodefrag, scrub also does the same thing, ie. read
root first and get inode.

Here is the story(take autodefrag as an example):
(1) when we delete a snapshot or subvolume, it will set its root's
refs to zero and do a iput() on its own inode, and if this inode happens
to be the only active in-meory one in root's inode rbtree, it will add
itself to the global dead_roots list for later cleanup.

(2) after (1), the autodefrag thread may read another inode for defrag
and the inode is just in the deleted snapshot/subvolume, but all of these
are without checking if the root is still valid(refs > 0).  So the end up
result is adding the deleted snapshot/subvolume's root to the global
dead_roots list AGAIN.

Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu.

So all we need to do is to take the lock to protect 'read root and get inode',
since we synchronize to wait for the rcu grace period before adding something
to the global dead_roots list.

Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-05 16:09:13 -05:00
Miao Xie 843fcf3573 Btrfs: fix missing release of the space/qgroup reservation in start_transaction()
When we fail to start a transaction, we need to release the reserved free space
and qgroup space, fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-05 16:09:11 -05:00
Miao Xie 0a3404dcff Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write()
If the checks at the beginning of btrfs_file_aio_write() fail, we needn't
decrease ->sync_writers, because we have not increased it. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-05 16:09:10 -05:00
Josef Bacik 222c81dc38 Btrfs: do not merge logged extents if we've removed them from the tree
You can run into this problem where if somebody is fsyncing and writing out
the existing extents you will have removed the extent map from the em tree,
but it's still valid for the current fsync so we go ahead and write it.  The
problem is we unconditionally try to merge it back into the em tree, but if
we've removed it from the em tree that will cause use after free problems.
Fix this to only merge if we are still a part of the tree.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-05 16:09:03 -05:00
Jan Kara 288be96de6 udf: Remove unused s_extLength from udf_bitmap
s_extLength was assigned to but the value was never really used. So
just remove the field.

Signed-off-by: Jan Kara <jack@suse.cz>
2013-02-05 17:29:53 +01:00
Jan Kara c60305b578 udf: Make s_block_bitmap standard array
struct udf_bitmap has array of buffer pointers attached to it. The code
unnecessarily used s_block_bitmap as a pointer to the array instead of
the standard trick of using 0 length array in the declaration. Change
that to make code more readable and actually shrink the structure by one
pointer.

Signed-off-by: Jan Kara <jack@suse.cz>
2013-02-05 17:29:52 +01:00
Jan Kara 89b1f39eb4 udf: Fix bitmap overflow on large filesystems with small block size
For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors despite the filesystem
has plenty of free space.

Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize.

Reported-and-tested-by: v10lator@myway.de
Signed-off-by: Jan Kara <jack@suse.cz>
2013-02-05 17:29:30 +01:00
Ingo Molnar b2c77a57e4 This implements the cputime accounting on full dynticks CPUs.
Typical cputime stats infrastructure relies on the timer tick and
 its periodic polling on the CPU to account the amount of time
 spent by the CPUs and the tasks per high level domains such as
 userspace, kernelspace, guest, ...
 
 Now we are preparing to implement full dynticks capability on
 Linux for Real Time and HPC users who want full CPU isolation.
 This feature requires a cputime accounting that doesn't depend
 on the timer tick.
 
 To implement it, this new cputime infrastructure plugs into
 kernel/user/guest boundaries to take snapshots of cputime and
 flush these to the stats when needed. This performs pretty
 much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
 and cputime snaphots are synchronized between write and read
 side such that the latter can safely retrieve the pending tickless
 cputime of a task and add it to its latest cputime snapshot to
 return the correct result to the user.
 
 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu
 L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj
 tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G
 j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa
 1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS
 9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD
 dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB
 Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO
 ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO
 g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE
 xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD
 NLQO5OZXE4oIQmDFb0FV
 =1Tzp
 -----END PGP SIGNATURE-----

Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core

Pull full-dynticks (user-space execution is undisturbed and
receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready,
from Frederic Weisbecker:

 "This implements the cputime accounting on full dynticks CPUs.

  Typical cputime stats infrastructure relies on the timer tick and
  its periodic polling on the CPU to account the amount of time
  spent by the CPUs and the tasks per high level domains such as
  userspace, kernelspace, guest, ...

  Now we are preparing to implement full dynticks capability on
  Linux for Real Time and HPC users who want full CPU isolation.
  This feature requires a cputime accounting that doesn't depend
  on the timer tick.

  To implement it, this new cputime infrastructure plugs into
  kernel/user/guest boundaries to take snapshots of cputime and
  flush these to the stats when needed. This performs pretty
  much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
  and cputime snaphots are synchronized between write and read
  side such that the latter can safely retrieve the pending tickless
  cputime of a task and add it to its latest cputime snapshot to
  return the correct result to the user."

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05 13:10:33 +01:00
Linus Torvalds fe547d7714 Merge branch 'fix-max-write' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm fix from David Teigland:
 "Thanks to Jana who reported the problem and was able to test this fix
  so quickly."

This fixes an incorrect size check that triggered for CONFIG_COMPAT
whether the code was actually doing compat or not.  The incorrect write
size check broke userland (clvmd) when maximum resource name lengths are
used.

* 'fix-max-write' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: check the write size from user
2013-02-05 20:50:11 +11:00
Vyacheslav Dubeyko a9bae18954 nilfs2: fix fix very long mount time issue
There exists a situation when GC can work in background alone without
any other filesystem activity during significant time.

The nilfs_clean_segments() method calls nilfs_segctor_construct() that
updates superblocks in the case of NILFS_SC_SUPER_ROOT and
THE_NILFS_DISCONTINUED flags are set.  But when GC is working alone the
nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag.
As a result, the update of superblocks doesn't occurred all this time
and in the case of SPOR superblocks keep very old values of last super
root placement.

SYMPTOMS:

Trying to mount a NILFS2 volume after SPOR in such environment ends with
very long mounting time (it can achieve about several hours in some
cases).

REPRODUCING PATH:

1. It needs to use external USB HDD, disable automount and doesn't
   make any additional filesystem activity on the NILFS2 volume.

2. Generate temporary file with size about 100 - 500 GB (for example,
   dd if=/dev/zero of=<file_name> bs=1073741824 count=200).  The size of
   file defines duration of GC working.

3. Then it needs to delete file.

4. Start GC manually by means of command "nilfs-clean -p 0".  When you
   start GC by means of such way then, at the end, superblocks is updated
   by once.  So, for simulation of SPOR, it needs to wait sometime (15 -
   40 minutes) and simply switch off USB HDD manually.

5. Switch on USB HDD again and try to mount NILFS2 volume.  As a
   result, NILFS2 volume will mount during very long time.

REPRODUCIBILITY: 100%

FIX:

This patch adds checking that superblocks need to update and set
THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call.

Reported-by: Sergey Alexandrov <splavgm@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-05 20:38:46 +11:00
David Teigland d4b0bcf32b dlm: check the write size from user
Return EINVAL from write if the size is larger than
allowed.  Do this before allocating kernel memory for
the bogus size, which could lead to OOM.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Jana Saout <jana@saout.de>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-02-04 15:31:22 -06:00
Theodore Ts'o 40ae348762 ext4: optimize mballoc for large allocations
The ext4 block allocator only maintains buddy bitmaps for chunks which
are less than or equal to one quarter of a block group.  That is, for
a file aystem with a 1k blocksize, and where the number of blocks in a
block group is 8192 blocks, the largest chunk size tracked by buddy
bitmaps is 2048 blocks.

For a file system with a 4k blocksize, and where the number of blocks
in a block group is 32768 blocks, the largest chunk size tracked by
buddy bitmaps is 8192 blocks.

To work around this code, mballoc.c before this commit would truncate
allocation requests to the number of blocks in a block group minus 10.
Why 10?  Aside from being a completely arbitrary number, it avoids
block allocation to be a power of two larger than 25% of the block
group.  If you try to explicitly fallocate 50% of the block group
size, this will demonstrate the problem; the block allocation code
will scan the all of the blocks in the file system with cr==0 (since
the request is for a natural power of two), but then completely fail
for all blocks groups, since the buddy bitmaps don't track chunk sizes
of 50% of the block group.

To fix this, in these we use ext4_mb_complex_scan_group() instead of
ext4_mb_simple_scan_group().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>
2013-02-04 15:08:40 -05:00
Enke Chen 0415d29102 fuse: send poll events
commit 626cf23660 "poll: add poll_requested_events()..." enabled us to send the
requested events to the filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-04 16:14:32 +01:00
Miklos Szeredi dfca7cebc2 fuse: don't WARN when nlink is zero
drop_nlink() warns if nlink is already zero.  This is triggerable by a buggy
userspace filesystem.  The cure, I think, is worse than the disease so disable
the warning.

Reported-by: Tero Roponen <tero.roponen@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-04 15:57:42 +01:00
Eric Wong 6a4e922c3d fuse: avoid out-of-scope stack access
The all pointers within fuse_req must point to valid memory once
fuse_force_forget() returns.

This bug appeared in "fuse: implement NFS-like readdirplus support"
and was never in any official Linux release.

I tested the fuse_force_forget() code path by injecting to fake -ENOMEM and
verified the FORGET operation was called properly in userspace.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-04 15:22:23 +01:00
Adam Thomas 8afd500cb5 UBIFS: fix double free of ubifs_orphan objects
The last orphan in the dnext list has its dnext set to NULL. Because
of that, ubifs_delete_orphan assumes that it is not on the dnext list
and frees it immediately instead ignoring it as a second delete. The
orphan is later freed again by erase_deleted.

This change adds an explicit flag to ubifs_orphan indicating whether
it is pending delete.

Signed-off-by: Adam Thomas <adamthomas1111@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org
2013-02-04 12:31:48 +02:00
Adam Thomas 2928f0d0c5 UBIFS: fix use of freed ubifs_orphan objects
The last orphan in the cnext list has its cnext set to NULL. Because
of that, ubifs_delete_orphan assumes that it is not on the cnext list
and frees it immediately instead of adding it to the dnext list. The
freed orphan is later modified by write_orph_node.

This can cause various inconsistencies including directory entries
that cannot be removed and this error:

UBIFS error (pid 20685): layout_cnodes: LPT out of space at LEB 14:129009 needing 17, done_ltab 1, done_lsave 1

This is a regression introduced by
"7074e5eb UBIFS: remove invalid reference to list iterator variable".

This change adds an explicit flag to ubifs_orphan indicating whether
it is pending commit.

Signed-off-by: Adam Thomas <adamthomas1111@gmail.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v3.6+
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-02-04 12:31:00 +02:00
Thomas Gleixner 90889a635a Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Trivial conflict in arch/x86/Kconfig

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-04 11:03:03 +01:00
Al Viro 9d94b9e2f3 switch timerfd compat syscalls to COMPAT_SYSCALL_DEFINE
... and move them over to fs/timerfd.c.  Cleaner and easier
that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-03 15:09:25 -05:00
Al Viro f482e1b4a4 switch compat_sys_open* to COMPAT_SYSCALL_DEFINE 2013-02-03 15:09:24 -05:00
Theodore Ts'o 8dc0aa8cf0 ext4: check incompatible mount options while mounting ext2/3
Check for incompatible mount options when using the ext4 file system
driver to mount ext2 or ext3 file systems.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-02 23:38:39 -05:00
Jan Kara e33e60eaed ext4: print error when argument of inode_readahead_blk is invalid
If argument of inode_readahead_blk is too big, we just bail out
without printing any error. Fix this since it could confuse users.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-02 23:14:31 -05:00
Jan Kara 5f3633e36b ext4: make mount option parsing loop more logical
The loop looking for correct mount option entry is more logical if it is
written rewritten as an empty loop looking for correct option entry and then
code handling the option. It also saves one level of indentation for a lot of
code so we can join a couple of split lines.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-02 23:09:36 -05:00
Jan Kara 0efb3b2300 ext4: move several mount options to standard handling loop
Several mount option (resuid, resgid, journal_dev, journal_ioprio) are
currently handled before we enter standard option handling loop. I don't
see a reason for this so move them to normal handling loop to make things
more regular.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-02 22:52:19 -05:00
Cong Ding 0e79537d30 ext4: reduce one "if" comparison in ext4_dirhash()
It is unnecessary to check i<4 after the loop; just do it before the
break.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-01 22:33:21 -05:00
Niu Yawei f116700971 ext4: fix race in ext4_mb_add_n_trim()
In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-02-01 21:31:27 -05:00
Akria Fujita 87e698734b ext4: fix smatch warning in move_extent.c's mext_replace_branches()
Commit 2147b1a6a4 resulted in a new smatch warning:

> fs/ext4/move_extent.c:693 mext_replace_branches()
> 	 warn: variable dereferenced before check 'dext' (see line 683)

Fix this by adding a check to make sure dext is non-NULL before we
derefrence it.

Signed-off-by: Akria Fujita <a-fujita@rs.jp.nec.com>
[ modified by tytso to make sure an ext4_error is called ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-01 20:52:46 -05:00
Julia Lawall 524c19ebc9 ext4: use WARN in ext4_alloc_blocks
Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-02-01 20:07:21 -05:00
Jeff Liu a21cd50367 xfs: refactor space log reservation for XFS_TRANS_ATTR_SET
Currently, we calculate the attribute set transaction
log space reservation at runtime in two parts:

1) XFS_ATTRSET_LOG_RES() which is calcuated out at mount time.

2) ((ext * (mp)->m_sb.sb_sectsize) + \
    (ext * XFS_FSB_TO_B((mp), XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))) + \
    (128 * (ext + (ext * XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))))))
which is calculated out at runtime since it depend on the given extent length in blocks.

This patch renamed XFS_ATTRSET_LOG_RES(mp) to XFS_ATTRSETM_LOG_RES(mp) to indicate
that it is figured out at mount time.  Introduce XFS_ATTRSETRT_LOG_RES(mp) which would
be used to calculate out the unit of the log space reservation for one block.

In this way, the total runtime space for the given extent length can be figured out by:
XFS_ATTRSETM_LOG_RES(mp) + XFS_ATTRSETRT_LOG_RES(mp) * ext

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:56:31 -06:00
Jeff Liu 762c585b18 xfs: make use of XFS_SB_LOG_RES() at xfs_fs_log_dummy()
Make use of XFS_SB_LOG_RES() at xfs_fs_log_dummy().

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:55:59 -06:00
Jeff Liu 5166ab0655 xfs: make use of XFS_SB_LOG_RES() at xfs_mount_log_sb()
Make use of XFS_SB_LOG_RES() at xfs_mount_log_sb().

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:55:08 -06:00
Jeff Liu e457274b60 xfs: make use of XFS_SB_LOG_RES() at xfs_log_sbcount()
Make use of XFS_SB_LOG_RES() at xfs_log_sbcount().

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:47:18 -06:00
Jeff Liu a7bd794a0f xfs: introduce XFS_SB_LOG_RES() for transactions that modify sb on disk
Introduce a new transaction space reservation XFS_SB_LOG_RES() for
those transactions that need to modify the superblock on disk.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:46:35 -06:00
Jeff Liu 762d7ba657 xfs: calculate XFS_TRANS_QM_QUOTAOFF_END space log reservation at mount time
Convert the calculation for end of quotaoff log space reservation
from runtime to mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:45:50 -06:00
Jeff Liu a1bd955754 xfs: calculate XFS_TRANS_QM_QUOTAOFF space log reservation at mount time
Convert the calculation of quota off transaction log space reservation
from runtime to mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:44:29 -06:00
Jeff Liu 4800104438 xfs: calculate XFS_TRANS_QM_DQALLOC space log reservation at mount time
The disk quota allocation log space reservation is calcuated at runtime,
this patch does it at mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:43:51 -06:00
Jeff Liu f0f2df94fa xfs: calcuate XFS_TRANS_QM_SETQLIM space log reservation at mount time
For adjusting quota limits transactions, we calculate out the log space
reservation at runtime, this patch does it at mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:43:11 -06:00
Jeff Liu f910a8c620 xfs: calculate xfs_qm_write_sb_changes() space log reservation at mount time
For the transaction that write the incore superblock changes of quota flags
to disk, it would reserve the same log space to clear/reset quota flags
transaction, hence we can use XFS_TRANS_SBCHANGE_LOG_RES() for it as well.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:42:32 -06:00
Jeff Liu b0c10b983a xfs: calculate XFS_TRANS_QM_SBCHANGE space log reservation at mount time
The transaction log space for clearing/reseting the quota flags
is calculated out at runtime, this patch can figure it out at
mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:40:17 -06:00
Jeff Liu 5b292ae3a9 xfs: make use of xfs_calc_buf_res() in xfs_trans.c
Refining the existing reservations with xfs_calc_buf_res() in xfs_trans.c

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:39:29 -06:00
Bob Peterson d2b47cfb26 GFS2: Get a block reservation before resizing a file
This patch allocates a block reservation structure before growing
or shrinking a file. Without this structure, the grow or shink code
can reference the bad pointer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-01 20:37:33 +00:00
Steven Whitehouse 4506a519f2 GFS2: Split glock lru processing into two parts
The intent here is to split the processing of the glock lru
list into two parts, so that the selection of glocks and the
disposal are separate functions. The plan is then, that further
updates can then be made to these functions in the future
to improve the selection of glocks and also the efficiency of
glock disposal.

The new feature which this patch brings is sorting the
glocks to be disposed of into glock number (and thus also
disk block number) order. Not all glocks will need i/o in
order to dispose of them, but some will, and at least we'll
generate mostly disk block order i/o now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-01 20:36:03 +00:00
Jeff Liu 4f3b57832b xfs: add a helper to figure out the space log reservation per item
Add a new helper xfs_calc_buf_res() to calcuate out the transaction space
reservations per item.  xfs_buf_log_overhead() is used to figure out the
extra space for struct xfs_buf_log_format that gets written into the log
for every buffer as well as a log opheader, i.e. struct xlog_op_header.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-02-01 14:35:06 -06:00
Eric Sandeen 3c91160808 btrfs: don't try to notify udev about missing devices
If we remove a missing device, bdev is null, and if we
send that off to btrfs_kobject_uevent we'll panic.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-01 11:47:37 -05:00
Trond Myklebust 322b2b9032 Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
This reverts commit 324d003b0c.

The deadlock turned out to be caused by a workqueue limitation that has
now been worked around in the RPC code (see comment in rpc_free_task).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:48 -05:00
Linus Torvalds bf6c8a8148 NFS client bugfixe for Linux 3.8
- Error reporting in nfs_xdev_mount incorrectly maps all errors to ENOMEM
 - Fix an NFSv4 refcounting issue
 - Fix a mount failure when the server reboots during NFSv4 trunking discovery
 - NFSv4.1 mounts may need to run the lease recovery thread.
 - Don't silently fail setattr() requests on mountpoints
 - Fix a SUNRPC socket/transport livelock and priority queue issue
 - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRCpS4AAoJEGcL54qWCgDyqucP/2CTv5leu+X5/0PXBOykAIHg
 8oEsTEz7/4IvIxTXzuHDYirMnm/mulfGF6NrdPqfvxpAHqRfVBfLFocfNLMVhQci
 97RmBfEEGM22AToYUubML5bIxr0QllV4s9Vmyh/zGDan52y7zNNlZX+v6aLjZbJB
 Fbolihpcch6lhQEUNAzK0B0ddimDl9lazx/WTmMOD/JrwOqzA4FJC+YxBe88nfzQ
 c6sYyEptBaSirbCOlueqGpv8skB1CLpFJXguXToPXFxpWed6uoGrIwLO7MLUdFpJ
 Xw+j8cuv/wjyYJGVKjhW7kXtwK8T7+u4bT2L883R01XYXr8XfkkLON0dgG1X/unk
 80mLzCO1+qRdoDSQ4b/V4B0nScPRCJuoZpftjCi2uhKewNcxQPMZ2V5/D7pO3uyE
 NdhxByB8D86JfNrIcBcRaxfuiQsurQBDvsDNmWPBZSOmH/dmHqTQGLcIe6N94A0B
 c7KFtXrN2MzOl8S68dUpbhftObq9X0oK2oxFFLWRQoqjrFtiDLU5JilV0bEH2CMo
 gJX7CRrPoJ2wmNToKdPJRWBYDmtMMIThq3vIpCj1FFzX17r5grJpqCmp/TViu/ER
 r8rmJzni+nmHaO1NLJMTCHzLJ8soiKyBq8PZKAbipTJxy+TXI/69jnWzpIQ/pbM/
 JN+0tiCcqmUmXsO+hrCP
 =lWFV
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Error reporting in nfs_xdev_mount incorrectly maps all errors to
   ENOMEM

 - Fix an NFSv4 refcounting issue

 - Fix a mount failure when the server reboots during NFSv4 trunking
   discovery

 - NFSv4.1 mounts may need to run the lease recovery thread.

 - Don't silently fail setattr() requests on mountpoints

 - Fix a SUNRPC socket/transport livelock and priority queue issue

 - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.

* tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
  SUNRPC: When changing the queue priority, ensure that we change the owner
  NFS: Don't silently fail setattr() requests on mountpoints
  NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
  NFSv4: Fix NFSv4 trunking discovery
  NFSv4: Fix NFSv4 reference counting for trunked sessions
  NFS: Fix error reporting in nfs_xdev_mount
2013-02-01 08:43:52 +11:00
Feng Shuo 4582a4ab2a FUSE: Adapt readdirplus to application usage patterns
Use the same adaptive readdirplus mechanism as NFS:

http://permalink.gmane.org/gmane.linux.nfs/49299

If the user space implementation wants to disable readdirplus
temporarily, it could just return ENOTSUPP. Then kernel will
recall it with readdir.

Signed-off-by: Feng Shuo <steve.shuo.feng@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-31 17:08:11 +01:00
Anatol Pomozov c2132c1bc7 Do not use RCU for current process credentials
Commit c69e8d9c0 added rcu lock to fuse/dir.c It was assuming
that 'task' is some other process but in fact this parameter always
equals to 'current'. Inline this parameter to make it more readable
and remove RCU lock as it is not needed when access current process
credentials.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-31 17:08:10 +01:00
Trond Myklebust c489ee290b NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It
usually means that the server is busy handling an unfinished RPC
request. Just sleep for a second and then retry.
We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return
value. If the NFS server has outstanding callbacks, we just want to
similarly sleep & retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:45:15 -05:00
Trond Myklebust ab22541782 NFS: Don't silently fail setattr() requests on mountpoints
Ensure that any setattr and getattr requests for junctions and/or
mountpoints are sent to the server. Ever since commit
0ec26fd069 (vfs: automount should ignore LOOKUP_FOLLOW), we have
silently dropped any setattr requests to a server-side mountpoint.
For referrals, we have silently dropped both getattr and setattr
requests.

This patch restores the original behaviour for setattr on mountpoints,
and tries to do the same for referrals, provided that we have a
filehandle...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:41:04 -05:00
Eric Sandeen e7b04ac00e jbd2: don't wake kjournald unnecessarily
Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
transaction has already been requested for commit.

commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[tytso@mit.edu: reworked check to make it clearer]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-30 00:39:28 -05:00
Jan Kara 091e26dfc1 ext4: fix possible use-after-free with AIO
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: stable@vger.kernel.org
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-29 22:48:17 -05:00
Linus Torvalds f96736e1ba xfs: bugfixes for 3.8-rc6
- fix return value when filesystem probe finds no XFS magic, a
   regression introduced in 9802182.
 
 - fix stack switch in __xfs_bmapi_allocate by moving the check for stack
   switch up into xfs_bmapi_write.
 
 - fix oops in _xfs_buf_find by validating that the requested block is
   within the filesystem bounds.
 
 - limit speculative preallocation near ENOSPC.
 
 - fix an unmount hang in xfs_wait_buftarg by freeing the
   xfs_buf_log_item in xfs_buf_item_unlock.
 
 - fix a possible use after free with AIO.
 
 - fix xfs_swap_extents after removal of xfs_flushinval_pages, a
   regression introduced in fb59581404.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBvmgAAoJENaLyazVq6ZOOacP/RilCPsi41NkJqRx1Rs5aRGE
 UvinrHfAL/tBupS2JVo1niIilBNJG/cI+lLcpV/P5omLBJfpEu0trzZUSxU7S1Vc
 a2M8J0qhmKfBcl70fuCALAxPY52895y+44gufxaH0O5HQDN6tB8n4MMqYGPmS8hz
 Ul/q3MO601hVyBHaoYa7BNGS3YG0TCdFGtWcC5tQaR3v7upTLR2ouZrGQ8CV0BBa
 Ek1xdxLh4D0fRybSL7lUw64W957iyldoLsEg+zQrE9NSfTE8DSqUG+NPWB0wjPce
 ICtmO6TbE5c6q1ScOL3YCC2cmYvjR9mlAHnPy73SqWSIsTqUsVzdibNo+tUJJZ5r
 RZf3u6Uri6uKC6Hl4XEtg4LVnnquKosTXfoiHmn+eh0dhYL7sZG0Ya5we5pH5Tmi
 P6B2DlfUA1fj4Ne4Asx2d7mwOJaZcLHDZoeCs/Haz2Z6kGVEm7ImyAb1h76uNOZo
 l0NFhXJGcOQLyjPtQjl81SGjQmntiIN0Poia3528zjxxGXlBNwAwalkOtdJnk5iN
 IaYRKtvIcrdjFvunasiKZIsV/O9w3/mguXlrSqBDgUsKPUc/cq5vLfsa70jYGc2j
 M6ldJRRqTvSjkVXc/7SXv4GLt/qbUWa92ESzhZQXEABIjZJUnOZpZuLmSVdRUyZk
 +SaGvbMphE0U/4ps3pO/
 =wViN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.8-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "Here are fixes for returning EFSCORRUPTED on probe of a non-xfs
  filesystem, the stack switch in xfs_bmapi_allocate, a crash in
  _xfs_buf_find, speculative preallocation as the filesystem nears
  ENOSPC, an unmount hang, a race with AIO, and a regression with
  xfs_fsr:

   - fix return value when filesystem probe finds no XFS magic, a
     regression introduced in 9802182.

   - fix stack switch in __xfs_bmapi_allocate by moving the check for
     stack switch up into xfs_bmapi_write.

   - fix oops in _xfs_buf_find by validating that the requested block is
     within the filesystem bounds.

   - limit speculative preallocation near ENOSPC.

   - fix an unmount hang in xfs_wait_buftarg by freeing the
     xfs_buf_log_item in xfs_buf_item_unlock.

   - fix a possible use after free with AIO.

   - fix xfs_swap_extents after removal of xfs_flushinval_pages, a
     regression introduced in commit fb59581404a."

* tag 'for-linus-v3.8-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
  xfs: Fix possible use-after-free with AIO
  xfs: fix shutdown hang on invalid inode during create
  xfs: limit speculative prealloc near ENOSPC thresholds
  xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
  xfs: pull up stack_switch check into xfs_bmapi_write
  xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic
2013-01-30 11:59:37 +11:00
Steven Whitehouse 4513899092 GFS2: Use ->writepages for ordered writes
Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.

The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.

The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:17 +00:00
Steven Whitehouse d564053f07 GFS2: Clean up freeze code
The freeze code has not been looked at a lot recently. Upstream has
moved on, and this is an attempt to catch us back up again. There
is a vfs level interface for the freeze code which can be called
from our (obsolete, but kept for backward compatibility purposes)
sysfs freeze interface. This means freezing this way vs. doing it
from the ioctl should now work in identical fashion.

As a result of this, the freeze function is only called once
and we can drop our own special purpose code for counting the
number of freezes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:05 +00:00
Steven Whitehouse c76c4d96bd GFS2: Merge gfs2_attach_bufdata() into trans.c
The locking in gfs2_attach_bufdata() was type specific (data/meta)
which made the function rather confusing. This patch moves the core
of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata()
and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta()

As a result all of the locking related to adding data and metadata to
the journal is now in these two functions. This should help to clarify
what is going on, and give us some opportunities to simplify in
some cases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:44 +00:00
Steven Whitehouse 767f433f34 GFS2: Copy gfs2_trans_add_bh into new data/meta functions
This patch copies the body of gfs2_trans_add_bh into the two newly
added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can
then move the .lo_add functions from lops.c into trans.c and call
them directly.

As a result of this, we no longer need to use the .lo_add functions
at all, so that is removed from the log operations structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:28 +00:00
Steven Whitehouse 350a9b0a72 GFS2: Split gfs2_trans_add_bh() into two
There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:04 +00:00
Steven Whitehouse 75f2b879ae GFS2: Merge revoke adding functions
This moves the lo_add function for revokes into trans.c, removing
a function call and making the code easier to read.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:27:46 +00:00
Steven Whitehouse 2a00585593 GFS2: Separate LRU scanning from shrinker
This breaks out the LRU scanning function from the shrinker in
preparation for adding other callers to the LRU scanner.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:27:28 +00:00
Jiri Kosina 617677295b Merge branch 'master' into for-next
Conflicts:
	drivers/devfreq/exynos4_bus.c

Sync with Linus' tree to be able to apply patches that are
against newer code (mvneta).
2013-01-29 10:48:30 +01:00
Guo Chao b1deefc99e ext4: remove unnecessary NULL pointer check
brelse() and ext4_journal_force_commit() are both inlined and able
to handle NULL.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 21:41:02 -05:00
Guo Chao 41be871f74 ext4: remove useless assignment in dx_probe()
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 21:33:28 -05:00
Guo Chao 2bbbee2a68 ext4: remove unused variable in add_dirent_to_buf()
After commit 978fef9 (create __ext4_insert_dentry for dir entry
insertion), 'reclen' is not used anymore.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2013-01-28 21:26:44 -05:00
Guo Chao d5ac777305 ext4: release buffer when checksum failed
Commit b0336e8d (ext4: calculate and verify checksums of directory
leaf blocks) and commit dbe89444 (ext4: Calculate and verify checksums
for htree nodes) forget to release buffer when checksum failed, at
some places.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2013-01-28 21:23:24 -05:00
Lukas Czerner b06acd38a4 ext4: remove explicit WARN_ON when ext4_map_blocks() fails
In two places we call WARN_ON() before we print out the debug message,
however we agreed that the WARN_ON() is unnecessary at those places so
remove them.

Also use ext4_warning() instead of ext4_msg() and printk().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 21:21:12 -05:00
Lukas Czerner cfa7275482 ext4: remove unused variable flags
Remove unused variable flags from dump_completed_IO(). The code is
only exercised when EXT4FS_DEBUG is defined.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-01-28 21:14:11 -05:00
Jan Kara fe386132f6 ext4: fix ext4_writepage() to achieve data=ordered guarantees
So far ext4_writepage() skipped writing pages that had any delayed or
unwritten buffers attached. When blocksize < pagesize this breaks
data=ordered mode guarantees as we can have a page with one freshly
allocated buffer whose allocation is part of the committing
transaction and another buffer in the page which is delayed or
unwritten. So fix this problem by calling ext4_bio_writepage()
anyway. It will submit mapped buffers and leave others alone.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 21:06:42 -05:00
Jan Kara 8a850c3fb8 ext4: Make ext4_bio_writepage() handle unprepared buffers
So far ext4_bio_writepage() unconditionally cleared dirty bit on all
buffers underlying the page. That implicitely assumes we can write all
buffers. So far that is true because callers call into
ext4_bio_writepage() make sure all buffers in the page are mapped but:

a) it's a data corruption bug waiting to happen
b) in data=ordered mode when blocksize < pagesize we do need to write
   pages that may have only some of dirty buffers mapped.

So change ext4_bio_writepage() to skip buffers that cannot be written without
clearing their dirty bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 20:53:28 -05:00
Torsten Kaiser 65e3aa77f1 xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
Commit fb59581404 removed
xfs_flushinval_pages() and changed its callers to use
filemap_write_and_wait() and  truncate_pagecache_range() directly.

But in xfs_swap_extents() this change accidental switched the argument
for 'tip' to 'ip'. This patch switches it back to 'tip'

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 16:05:10 -06:00
Torsten Kaiser 2729423cf2 xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
Commit fb59581404 removed
xfs_flushinval_pages() and changed its callers to use
filemap_write_and_wait() and  truncate_pagecache_range() directly.

But in xfs_swap_extents() this change accidental switched the argument
for 'tip' to 'ip'. This patch switches it back to 'tip'

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 13:50:10 -06:00
Jan Kara 4b05d09c18 xfs: Fix possible use-after-free with AIO
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: xfs@oss.sgi.com
CC: Ben Myers <bpm@sgi.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:51:22 -06:00
Dave Chinner 9f87832a82 xfs: fix shutdown hang on invalid inode during create
When the new inode verify in xfs_iread() fails, the create
transaction is aborted and a shutdown occurs. The subsequent unmount
then hangs in xfs_wait_buftarg() on a buffer that has an elevated
hold count. Debug showed that it was an AGI buffer getting stuck:

[   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck

The trace of this buffer leading up to the shutdown (trimmed for
brevity) looks like:

xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock

And that is the AGI buffer from cold cache read into memory to
transaction abort. You can see at transaction abort the bli is dirty
and only has a single reference. The item is not pinned, and it's
not in the AIL. Hence the only reference to it is this transaction.

The problem is that the xfs_buf_item_unlock() call is dropping the
last reference to the xfs_buf_log_item attached to the buffer (which
holds a reference to the buffer), but it is not freeing the
xfs_buf_log_item. Hence nothing will ever release the buffer, and
the unmount hangs waiting for this reference to go away.

The fix is simple - xfs_buf_item_unlock needs to detect the last
reference going away in this case and free the xfs_buf_log_item to
release the reference it holds on the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:51:12 -06:00
Dave Chinner f2a459565b xfs: limit speculative prealloc near ENOSPC thresholds
There is a window on small filesytsems where specualtive
preallocation can be larger than that ENOSPC throttling thresholds,
resulting in specualtive preallocation trying to reserve more space
than there is space available. This causes immediate ENOSPC to be
triggered, prealloc to be turned off and flushing to occur. One the
next write (i.e. next 4k page), we do exactly the same thing, and so
effective drive into synchronous 4k writes by triggering ENOSPC
flushing on every page while in the window between the prealloc size
and the ENOSPC prealloc throttle threshold.

Fix this by checking to see if the prealloc size would consume all
free space, and throttle it appropriately to avoid premature
ENOSPC...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:50:50 -06:00
Dave Chinner eb178619f9 xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.

In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported.  Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:49:21 -06:00
Brian Foster d26978dd86 xfs: pull up stack_switch check into xfs_bmapi_write
The stack_switch check currently occurs in __xfs_bmapi_allocate,
which means the stack switch only occurs when xfs_bmapi_allocate()
is called in a loop. Pull the check up before the loop in
xfs_bmapi_write() such that the first iteration of the loop has
consistent behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:48:55 -06:00
Eric Sandeen 1bee12b8c4 xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic
9802182 changed the return value from EWRONGFS (aka EINVAL)
to EFSCORRUPTED which doesn't seem to be handled properly by
the root filesystem probe.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:48:21 -06:00
Jan Kara b6a8e62f8b ext4: simplify mpage_add_bh_to_extent()
The argument b_size of mpage_add_bh_to_extent() was bogus since it was
always == blocksize (which we can easily derive from inode->i_blkbits).
Also second branch of condition:
	if (nrblocks >= EXT4_MAX_TRANS_DATA) {
	} else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) >
						EXT4_MAX_TRANS_DATA) {
	}
was never taken because (b_size >> mpd->inode->i_blkbits) == 1.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 13:06:48 -05:00
Jan Kara f8bec37037 ext4: dirty page has always buffers attached
ext4_writepage(), write_cache_pages_da(), and mpage_da_submit_io()
doesn't have to deal with the case when page doesn't have buffers. We
attach buffers to a page in ->write_begin() and ->page_mkwrite() which
covers all places where a page can become dirty.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 12:55:08 -05:00
Jan Kara 002bd7fa3a ext4: simplify list handling in ext4_do_flush_completed_IO()
The function splices i_completed_io_list to its private list
first.  From that moment on we don't need any lock for working with
io_end structures because all io_end structure on the list are only
our own. So we can remove the other two lists in the function and free
io_end immediately after we are done with it.

CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:49:15 -05:00
Jan Kara 84c17543ab ext4: move work from io_end to inode
It does not make much sense to have struct work in ext4_io_end_t
because we always use it for only one ext4_io_end_t per inode (the
first one in the i_completed_io list). So just move the structure to
inode itself.  This also allows for a small simplification in
processing io_end structures.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:43:46 -05:00
Jan Kara fe089c77f1 ext4: remove __ext4_journalled_writepage() from mpage_da_submit_io()
We don't support delayed allocation in data=journal mode. So checking for it in
mpage_da_submit_io() doesn't make really sence. If we ever decide to extend
delayed allocation support to data=journal mode, adding
__ext4_journalled_writepage() call will be the least of problems we have to
solve. Most likely we'd have to implement separate writepages call anyways
because we don't have transaction credits for writing more than a single page
so mapping of page buffers would have to be done differently.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:38:49 -05:00
Jan Kara 1ae48a6354 ext4: use redirty_page_for_writepage() in ext4_bio_write_page()
When we cannot write a page we should use redirty_page_for_writepage()
instead of plain set_page_dirty(). That tells writeback code we have
problems, redirties only the page (redirtying buffers is not needed),
and updates mm accounting of failed page writes.

Also move clearing of buffer dirty flag after io_submit_add_bh(). At that
moment we are sure buffer will be going to disk.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:32:54 -05:00
Jan Kara 36ade451a5 ext4: Always use ext4_bio_write_page() for writeout
Currently we sometimes used block_write_full_page() and sometimes
ext4_bio_write_page() for writeback (depending on mount options and call
path). Let's always use ext4_bio_write_page() to simplify things a bit.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:30:52 -05:00
Zheng Liu 8bad6fc813 ext4: add punching hole support for non-extent-mapped files
This patch add supports for indirect file support punching hole.  It
is almost the same as ext4_ext_punch_hole.  First, we invalidate all
pages between this hole, and then we try to deallocate all blocks of
this hole.

A recursive function is used to handle deallocation of blocks.  In
this function, it iterates over the entries in inode's i_blocks or
indirect blocks, and try to free the block for each one of them.

After applying this patch, xfstest #255 will not pass w/o extent because
indirect-based file doesn't support unwritten extents.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-28 09:21:37 -05:00
David Teigland d4e0bfec9b GFS2: fix skip unlock condition
The recent commit fb6791d100
included the wrong logic.  The lvbptr check was incorrectly
added after the patch was tested.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-28 09:49:15 +00:00
Trond Myklebust 65436ec0c8 NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
We do need to start the lease recovery thread prior to waiting for the
client initialisation to complete in NFSv4.1.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:41 -05:00
Trond Myklebust 202c312dba NFSv4: Fix NFSv4 trunking discovery
If walking the list in nfs4[01]_walk_client_list fails, then the most
likely explanation is that the server dropped the clientid before we
actually managed to confirm it. As long as our nfs_client is the very
last one in the list to be tested, the caller can be assured that this
is the case when the final return value is NFS4ERR_STALE_CLIENTID.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Tested-by: Ben Greear <greearb@candelatech.com>
2013-01-27 15:51:28 -05:00
Trond Myklebust 4ae19c2dd7 NFSv4: Fix NFSv4 reference counting for trunked sessions
The reference counting in nfs4_init_client assumes wongly that it
is safe for nfs4_discover_server_trunking() to return a pointer to a
nfs_client prior to bumping the reference count.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:15 -05:00
Trond Myklebust dee972b967 NFS: Fix error reporting in nfs_xdev_mount
Currently, nfs_xdev_mount converts all errors from clone_server() to
ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that.
Also ensure that if nfs_fs_mount_common() returns an error, we
don't dprintk(0)...

The regression originated in commit 3d176e3fe4
(NFS: Use nfs_fs_mount_common() for xdev mounts)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.5]
2013-01-27 15:51:15 -05:00
Frederic Weisbecker 6fac4829ce cputime: Use accessors to read task cputime stats
This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-27 19:23:31 +01:00
Eric W. Biederman b3c6761d9b userns: Allow the userns root to mount ramfs.
There is no backing store to ramfs and file creation
rules are the same as for any other filesystem so
it is semantically safe to allow unprivileged users
to mount it.

The memory control group successfully limits how much
memory ramfs can consume on any system that cares about
a user namespace root using ramfs to exhaust memory
the memory control group can be deployed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-01-26 22:22:38 -08:00
Eric W. Biederman ec2aa8e8dd userns: Allow the userns root to mount of devpts
- The context in which devpts is mounted has no effect on the creation
  of ptys as the /dev/ptmx interface has been used by unprivileged
  users for many years.

- Only support unprivileged mounts in combination with the newinstance
  option to ensure that mounting of /dev/pts in a user namespace will
  not allow the options of an existing mount of devpts to be modified.

- Create /dev/pts/ptmx as the root user in the user namespace that
  mounts devpts so that it's permissions to be changed.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-01-26 22:22:21 -08:00
Jan Kara ced55f38d6 xfs: Fix possible use-after-free with AIO
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: xfs@oss.sgi.com
CC: Ben Myers <bpm@sgi.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-26 09:43:58 -06:00
Dave Chinner 3b19034d4f xfs: fix shutdown hang on invalid inode during create
When the new inode verify in xfs_iread() fails, the create
transaction is aborted and a shutdown occurs. The subsequent unmount
then hangs in xfs_wait_buftarg() on a buffer that has an elevated
hold count. Debug showed that it was an AGI buffer getting stuck:

[   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck

The trace of this buffer leading up to the shutdown (trimmed for
brevity) looks like:

xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock

And that is the AGI buffer from cold cache read into memory to
transaction abort. You can see at transaction abort the bli is dirty
and only has a single reference. The item is not pinned, and it's
not in the AIL. Hence the only reference to it is this transaction.

The problem is that the xfs_buf_item_unlock() call is dropping the
last reference to the xfs_buf_log_item attached to the buffer (which
holds a reference to the buffer), but it is not freeing the
xfs_buf_log_item. Hence nothing will ever release the buffer, and
the unmount hangs waiting for this reference to go away.

The fix is simple - xfs_buf_item_unlock needs to detect the last
reference going away in this case and free the xfs_buf_log_item to
release the reference it holds on the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-26 09:34:38 -06:00
Greg Kroah-Hartman 422d26b6ec Merge 3.8-rc5 into driver-core-next
This resolves a gpio driver merge issue pointed out in linux-next.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 21:06:30 -08:00
Greg Kroah-Hartman 9f9cba810f Merge 3.8-rc5 into tty-next
This resolves a number of tty driver merge issues found in linux-next

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 13:27:36 -08:00
Rafael J. Wysocki 0bb8f3d6ae sysfs: Functions for adding/removing symlinks to/from attribute groups
The most convenient way to expose ACPI power resources lists of a
device is to put symbolic links to sysfs directories representing
those resources into special attribute groups in the device's sysfs
directory.  For this purpose, it is necessary to be able to add
symbolic links to attribute groups.

For this reason, add sysfs helper functions for adding/removing
symbolic links to/from attribute groups, sysfs_add_link_to_group()
and sysfs_remove_link_from_group(), respectively.

This change set includes a build fix from David Rientjes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 21:51:13 +01:00
Linus Torvalds d7df025eb4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "It turns out that we had two crc bugs when running fsx-linux in a
  loop.  Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
  all down.  Miao also has a new OOM fix in this v2 pull as well.

  Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
  and resuming a running balance across drives.

  Josef's orphan truncate patch fixes an obscure corruption we'd see
  during xfstests.

  Arne's patches address problems with subvolume quotas.  If the user
  destroys quota groups incorrectly the FS will refuse to mount.

  The rest are smaller fixes and plugs for memory leaks."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
  Btrfs: fix repeated delalloc work allocation
  Btrfs: fix wrong max device number for single profile
  Btrfs: fix missed transaction->aborted check
  Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
  Btrfs: put csums on the right ordered extent
  Btrfs: use right range to find checksum for compressed extents
  Btrfs: fix panic when recovering tree log
  Btrfs: do not allow logged extents to be merged or removed
  Btrfs: fix a regression in balance usage filter
  Btrfs: prevent qgroup destroy when there are still relations
  Btrfs: ignore orphan qgroup relations
  Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
  Btrfs: fix unlock order in btrfs_ioctl_rm_dev
  Btrfs: fix unlock order in btrfs_ioctl_resize
  Btrfs: fix "mutually exclusive op is running" error code
  Btrfs: bring back balance pause/resume logic
  btrfs: update timestamps on truncate()
  btrfs: fix btrfs_cont_expand() freeing IS_ERR em
  Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
  Btrfs: fix off-by-one in lseek
  ...
2013-01-25 10:55:21 -08:00
Chen Gang 03dafb5f59 ext4: fix memory leak when quota options are specified multiple times
When usrjquota or grpjquota mount options are specified several times,
we leak memory storing the names. Free the memory correctly.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-01-24 23:24:58 -05:00
Theodore Ts'o 72ba74508b ext4: release sysfs kobject when failing to enable quotas on mount
In addition, print the error returned from ext4_enable_quotas()

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Cc: stable@vger.kernel.org
2013-01-24 23:24:54 -05:00
Linus Torvalds 66e2d3e8c2 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Two small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
  cifs: fix srcip_matches() for ipv6
2013-01-24 19:15:43 -08:00
Miao Xie 1eafa6c737 Btrfs: fix repeated delalloc work allocation
btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
btrfs_queue_worker for this inode, and then it locks the list, checks the
head of the list again. But because we don't delete the first inode that it
deals with before, it will fetch the same inode. As a result, this function
allocates a huge amount of btrfs_delalloc_work structures, and OOM happens.

Fix this problem by splice this delalloc list.

Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:27 -05:00
Miao Xie c9f01bfe0c Btrfs: fix wrong max device number for single profile
The max device number of single profile is 1, not 0 (0 means 'as many as
possible'). Fix it.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:26 -05:00
Miao Xie 2cba30f172 Btrfs: fix missed transaction->aborted check
First, though the current transaction->aborted check can stop the commit early
and avoid unnecessary operations, it is too early, and some transaction handles
don't end, those handles may set transaction->aborted after the check.

Second, when we commit the transaction, we will wake up some worker threads to
flush the space cache and inode cache. Those threads also allocate some transaction
handles and may set transaction->aborted if some serious error happens.

So we need more check for ->aborted when committing the transaction. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:25 -05:00
Miao Xie 8d25a086eb Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
We may access and update transaction->aborted on the different CPUs without
lock, so we need ACCESS_ONCE() wrapper to prevent the compiler from creating
unsolicited accesses and make sure we can get the right value.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:23 -05:00
Josef Bacik e58dd74bcc Btrfs: put csums on the right ordered extent
I noticed a WARN_ON going off when adding csums because we were going over
the amount of csum bytes that should have been allowed for an ordered
extent.  This is a leftover from when we used to hold the csums privately
for direct io, but now we use the normal ordered sum stuff so we need to
make sure and check if we've moved on to another extent so that the csums
are added to the right extent.  Without this we could end up with csums for
bytenrs that don't have extents to cover them yet.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:22 -05:00
Liu Bo 192000dda2 Btrfs: use right range to find checksum for compressed extents
For compressed extents, the range of checksum is covered by disk length,
and the disk length is different with ram length, so we need to use disk
length instead to get us the right checksum.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:17 -05:00
Josef Bacik b0175117b9 Btrfs: fix panic when recovering tree log
A user reported a BUG_ON(ret) that occured during tree log replay.  Ret was
-EAGAIN, so what I think happened is that we removed an extent that covered
a bitmap entry and an extent entry.  We remove the part from the bitmap and
return -EAGAIN and then search for the next piece we want to remove, which
happens to be an entire extent entry, so we just free the sucker and return.
The problem is ret is still set to -EAGAIN so we trip the BUG_ON().  The
user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
added a WARN_ON() to catch the other possibility.  Thanks,

Reported-by: Jan Steffens <jan.steffens@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:49:49 -05:00
Josef Bacik 201a903894 Btrfs: do not allow logged extents to be merged or removed
We drop the extent map tree lock while we're logging extents, so somebody
could come in and merge another extent into this one and screw up our
logging, or they could even remove us from the list which would keep us from
logging the extent or freeing our ref on it, so we need to make sure to not
clear LOGGING until after the extent is logged, and then we can merge it to
adjacent extents.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:49:48 -05:00
Dave Chinner 4d559a3bcb xfs: limit speculative prealloc near ENOSPC thresholds
There is a window on small filesytsems where specualtive
preallocation can be larger than that ENOSPC throttling thresholds,
resulting in specualtive preallocation trying to reserve more space
than there is space available. This causes immediate ENOSPC to be
triggered, prealloc to be turned off and flushing to occur. One the
next write (i.e. next 4k page), we do exactly the same thing, and so
effective drive into synchronous 4k writes by triggering ENOSPC
flushing on every page while in the window between the prealloc size
and the ENOSPC prealloc throttle threshold.

Fix this by checking to see if the prealloc size would consume all
free space, and throttle it appropriately to avoid premature
ENOSPC...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-24 11:08:55 -06:00
Dave Chinner 10616b806d xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.

In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported.  Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-24 11:06:41 -06:00
Miklos Szeredi fb05f41f5f fuse: cleanup fuse_direct_io()
Fix the following sparse warnings:

fs/fuse/file.c:1216:43: warning: cast removes address space of expression
fs/fuse/file.c:1216:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1216:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1216:43:    got void *<noident>
fs/fuse/file.c:1241:43: warning: cast removes address space of expression
fs/fuse/file.c:1241:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1241:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1241:43:    got void *<noident>
fs/fuse/file.c:1267:43: warning: cast removes address space of expression
fs/fuse/file.c:1267:43: warning: incorrect type in initializer (different address spaces)
fs/fuse/file.c:1267:43:    expected void [noderef] <asn:1>*iov_base
fs/fuse/file.c:1267:43:    got void *<noident>

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:28 +01:00
Maxim Patlasov 5565a9d884 fuse: optimize __fuse_direct_io()
__fuse_direct_io() allocates fuse-requests by calling fuse_get_req(fc, n). The
patch calculates 'n' based on iov[] array. This is useful because allocating
FUSE_MAX_PAGES_PER_REQ page pointers and descriptors for each fuse request
would be waste of memory in case of iov-s of smaller size.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:28 +01:00
Maxim Patlasov 7c190c8b9c fuse: optimize fuse_get_user_pages()
Let fuse_get_user_pages() pack as many iov-s to a single fuse_req as
possible. This is very beneficial in case of iov[] consisting of many
iov-s of relatively small sizes (e.g. PAGE_SIZE).

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:27 +01:00
Maxim Patlasov b98d023a24 fuse: pass iov[] to fuse_get_user_pages()
The patch makes preliminary work for the next patch optimizing scatter-gather
direct IO. The idea is to allow fuse_get_user_pages() to pack as many iov-s
to each fuse request as possible. So, here we only rework all related
call-paths to carry iov[] from fuse_direct_IO() to fuse_get_user_pages().

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:27 +01:00
Maxim Patlasov 85f40aec88 fuse: use req->page_descs[] for argpages cases
Previously, anyone who set flag 'argpages' only filled req->pages[] and set
per-request page_offset. This patch re-works all cases where argpages=1 to
fill req->page_descs[] properly.

Having req->page_descs[] filled properly allows to re-work fuse_copy_pages()
to copy page fragments described by req->page_descs[]. This will be useful
for next patches optimizing direct_IO.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:27 +01:00
Maxim Patlasov b2430d7567 fuse: add per-page descriptor <offset, length> to fuse_req
The ability to save page pointers along with lengths and offsets in fuse_req
will be useful to cover several iovec-s with a single fuse_req.

Per-request page_offset is removed because anybody who need it can use
req->page_descs[0].offset instead.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:27 +01:00
Maxim Patlasov 54b966702d fuse: rework fuse_do_ioctl()
fuse_do_ioctl() already calculates the number of pages it's going to use. It is
stored in 'num_pages' variable. So the patch simply uses it for allocating
fuse_req.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:26 +01:00
Maxim Patlasov d07f09f509 fuse: rework fuse_perform_write()
The patch allocates as many page pointers in fuse_req as needed to cover
interval [pos .. pos+len-1]. Inline helper fuse_wr_pages() is introduced
to hide this cumbersome arithmetic.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:26 +01:00
Maxim Patlasov f8dbdf8182 fuse: rework fuse_readpages()
The patch uses 'nr_pages' argument of fuse_readpages() as heuristics for the
number of page pointers to allocate.

This can be improved further by taking in consideration fc->max_read and gaps
between page indices, but it's not clear whether it's worthy or not.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:26 +01:00
Maxim Patlasov 4d53dc99ba fuse: rework fuse_retrieve()
The patch reworks fuse_retrieve() to allocate only so many page pointers
as needed. The core part of the patch is the following calculation:

	num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT;

(thanks Miklos for formula). All other changes are mostly shuffling lines.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:26 +01:00
Maxim Patlasov b111c8c0e3 fuse: categorize fuse_get_req()
The patch categorizes all fuse_get_req() invocations into two categories:
 - fuse_get_req_nopages(fc) - when caller doesn't care about req->pages
 - fuse_get_req(fc, n) - when caller need n page pointers (n > 0)

Adding fuse_get_req_nopages() helps to avoid numerous fuse_get_req(fc, 0)
scattered over code. Now it's clear from the first glance when a caller need
fuse_req with page pointers.

The patch doesn't make any logic changes. In multi-page case, it silly
allocates array of FUSE_MAX_PAGES_PER_REQ page pointers. This will be amended
by future patches.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Maxim Patlasov 4250c0668e fuse: general infrastructure for pages[] of variable size
The patch removes inline array of FUSE_MAX_PAGES_PER_REQ page pointers from
fuse_req. Instead of that, req->pages may now point either to small inline
array or to an array allocated dynamically.

This essentially means that all callers of fuse_request_alloc[_nofs] should
pass the number of pages needed explicitly.

The patch doesn't make any logic changes.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Anand V. Avati 0b05b18381 fuse: implement NFS-like readdirplus support
This patch implements readdirplus support in FUSE, similar to NFS.
The payload returned in the readdirplus call contains
'fuse_entry_out' structure thereby providing all the necessary inputs
for 'faking' a lookup() operation on the spot.

If the dentry and inode already existed (for e.g. in a re-run of ls -l)
then just the inode attributes timeout and dentry timeout are refreshed.

With a simple client->network->server implementation of a FUSE based
filesystem, the following performance observations were made:

Test: Performing a filesystem crawl over 20,000 files with

sh# time ls -lR /mnt

Without readdirplus:
Run 1: 18.1s
Run 2: 16.0s
Run 3: 16.2s

With readdirplus:
Run 1: 4.1s
Run 2: 3.8s
Run 3: 3.8s

The performance improvement is significant as it avoided 20,000 upcalls
calls (lookup). Cache consistency is no worse than what already is.

Signed-off-by: Anand V. Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
Cong Ding 10b8c7dff5 fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
CC: stable <stable@kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-01-22 23:58:16 -06:00
Linus Torvalds 262060ea46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This contain a bugfix for CUSE and miscellaneous small fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unused variable in fuse_try_move_page()
  fuse: make fuse_file_fallocate() static
  fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig
  cuse: fix uninitialized variable warnings
  cuse: do not register multiple devices with identical names
  cuse: use mutex as registration lock instead of spinlocks
2013-01-22 11:53:19 -08:00
Linus Torvalds 05c2cf35c3 f2fs fixes for v3.8-rc5
o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/fpIAAoJEEAUqH6CSFDS6BAP/jz/JIPoSu4NunWgVH68tzA4
 xx4lsmJQJQG+1241uA9v4MMkcTzxE0QVNTOX3BpUXBhCMc00aPOPWlWjULridLI9
 0vRE6LpG+NntkyKF+M5T1ydGuQoDlEvqoGs6c5p3yaI6PxzbzmBmipsmqXUA8fyu
 280+OWJoAcALEMJiQ8JsHcDmvM9wdQ+BV/j3BNCm4dqBUA4dYPfDzRKUJYfwqiig
 qmVRseJJaekxrQ2lHG/K/WPAXa8aRcV6khP9tv/BPGRMt+I/fli/J4sWEFT6c73B
 +qYmhrf/RLNJ1O13dePlo3URwMu083PL8QN355GAKUJbMaX/UPjEnq0DLbBOkCS2
 KBQI5O1eiFauEE6YU7p7GuvnLeVkukcXSQNVnRrnWzTUA9CMThZZ2mAgb2lz+iEP
 oZWNirRwnxdcTQRXjPyTCtpPTCgJi4GO1WS5s6HeLP8G8Muo4PzDzcEwMX0Mw5ih
 s4n4wpQ+Zp3h53cc/DxdFAK15uM3XYtUfb92kJwqaEG5VmBy6KvliXDRnNXg7WMI
 imCb08c0Fr0M8ZHOd+UCveICcndFj25jkjx8w7PoE1KBbGJkKf+cjpZ3OhOAliux
 sGtH3EZLV6jx1MjV79OpDXQGoVsvktesCbRcaxc7BbqljXYVNiap8QbzjPnmAt6z
 KKN0GU32eolGgK4Zd/iF
 =vJQc
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
 o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups

* tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (22 commits)
  f2fs: use _safe() version of list_for_each
  f2fs: add comments of start_bidx_of_node
  f2fs: avoid issuing small bios due to several dirty node pages
  f2fs: support swapfile
  f2fs: add remap_pages as generic_file_remap_pages
  f2fs: add __init to functions in init_f2fs_fs
  f2fs: fix the debugfs entry creation path
  f2fs: add global mutex_lock to protect f2fs_stat_list
  f2fs: remove the blk_plug usage in f2fs_write_data_pages
  f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
  f2fs: remove redundant call to set_blocksize in f2fs_fill_super
  f2fs: move f2fs_balance_fs to punch_hole
  f2fs: add f2fs_balance_fs in several interfaces
  f2fs: revisit the f2fs_gc flow
  f2fs: check return value during recovery
  f2fs: avoid null dereference in f2fs_acl_from_disk
  f2fs: initialize newly allocated dnode structure
  f2fs: update f2fs partition info about SIT/NAT layout
  f2fs: update f2fs document to reflect SIT/NAT layout correctly
  f2fs: remove unneeded INIT_LIST_HEAD at few places
  ...
2013-01-22 10:33:17 -08:00
Namjae Jeon 99600051b0 udf: add extent cache support in case of file reading
This patch implements extent caching in case of file reading.
While reading a file, currently, UDF reads metadata serially
which takes a lot of time depending on the number of extents present
in the file. Caching last accessd extent improves metadata read time.
Instead of reading file metadata from start, now we read from
the cached extent.

This patch considerably improves the time spent by CPU in kernel mode.
For example, while reading a 10.9 GB file using dd:
Time before applying patch:
11677022208 bytes (10.9GB) copied, 1529.748921 seconds, 7.3MB/s
real    25m 29.85s
user    0m 12.41s
sys     15m 34.75s

Time after applying patch:
11677022208 bytes (10.9GB) copied, 1469.338231 seconds, 7.6MB/s
real    24m 29.44s
user    0m 15.73s
sys     3m 27.61s

[JK: Fix bh refcounting issues, simplify initialization]

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Bonggil Bak <bgbak@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-22 10:48:31 +01:00
Dan Carpenter d8b79b2f94 f2fs: use _safe() version of list_for_each
This is calling list_del() inside a loop which is a problem when we try
move to the next item on the list.  I've converted it to use the _safe
version.  And also, as a cleanup, I've converted it to use
list_for_each_entry instead of list_for_each.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:49:00 +09:00
Jaegeuk Kim 9af45ef5ab f2fs: add comments of start_bidx_of_node
The caller of start_bidx_of_node() should give proper node offsets which
point only direct node blocks. Otherwise, it is a caller's bug.
This patch adds comments to make it clear.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim a7fdffbd3e f2fs: avoid issuing small bios due to several dirty node pages
If some small bios of dirty node pages are supposed to be issued during the
sequential data writes, there-in well-produced consecutive data bios are able
to be split by the small node bios, resulting in performance degradation.
So, let's collect a number of dirty node pages until reaching a threshold.
And, by default, I set the threshold as 2MB, a segment size.

This improves sequential write performance on i5, 512GB SSD (830 w/ SATA2) as
follows.
Before: 231 MB/s -> After: 255 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim c01e54b770 f2fs: support swapfile
This patch adds f2fs_bmap operation to the data address space.
This enables f2fs to support swapfile.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00
Jaegeuk Kim 692bb55d1a f2fs: add remap_pages as generic_file_remap_pages
This was added for all the file systems before.

See the following commit.

commit id: 0b173bc4da

[PATCH] mm: kill vma flag VM_CAN_NONLINEAR

This patch moves actual ptes filling for non-linear file mappings
into special vma operation: ->remap_pages().

File system must implement this method to get non-linear mappings support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support."

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00
Namjae Jeon 6e6093a8f1 f2fs: add __init to functions in init_f2fs_fs
Add __init to functions in init_f2fs_fs for code consistency.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:38 +09:00
Ilya Dryomov a105bb88f4 Btrfs: fix a regression in balance usage filter
Commit 3fed40cc ("Btrfs: cleanup duplicated division functions"), which
was merged into 3.8-rc1, has introduced a regression by removing logic
that was guarding us against bad user input.  Bring it back.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-01-21 20:40:27 -05:00
Chris Mason 83bfccb5c0 Merge branch 'mutex-ops@next-for-chris' of git://github.com/idryomov/btrfs-unstable into linus 2013-01-21 20:39:06 -05:00
Chris Mason daf2c08911 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into linus 2013-01-21 20:26:55 -05:00
Arne Jansen 2cf6870396 Btrfs: prevent qgroup destroy when there are still relations
Currently you can just destroy a qgroup even though it is in use by other qgroups
or has qgroups assigned to it. This patch prevents destruction of qgroups unless
they are completely unused. Otherwise destroy will return EBUSY.

Reported-by: Eric Hopper <hopper@omnifarious.org>
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-01-21 20:18:11 -05:00
Arne Jansen ff24858c65 Btrfs: ignore orphan qgroup relations
If a qgroup that has still assignments is deleted by the user, the corresponding
relations are left in the tree. This leads to an unmountable filesystem.
With this patch, those relations are simple ignored.

Reported-by: Eric Hopper <hopper@omnifarious.org>
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-01-21 20:18:11 -05:00
Kees Cook b7e17a10ed fs/ufs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:06 -08:00
Kees Cook f987c90257 fs/nfsd: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook 03edef0fea fs/logfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Joern Engel <joern@logfs.org>
CC: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook cf98c5e568 fs/jffs2: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: David Woodhouse <dwmw2@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook adae07485e fs/hfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook 462f16a557 fs/efs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook 00f3616b25 fs/cifs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Steve French <sfrench@samba.org>
CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook 38db331b57 fs/btrfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:05 -08:00
Kees Cook d0e09c80f0 fs/bfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Kees Cook 5c8e0226e7 fs/befs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Kees Cook 8dbd5d6df3 fs/afs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Kees Cook 6d7a19fa74 fs/affs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Kees Cook acd4fd07d4 fs/adfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Acked-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Kees Cook c904197533 fs/9p: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:39:04 -08:00
Jan Kara 9734c971aa udf: Write LVID to disk after opening / closing
So far we just marked the buffer as dirty and left writing on flusher thread
but especially on opening that opens possible race window where we could write
other modified fs structures to disk before we mark filesystem as open. So sync
LVID buffer to disk after opening and closing fs.

Reported-by: Steve Nickel <snickel58@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:58 +01:00
Wang Shilong c04e88e271 Ext3: return ENOMEM rather than EIO if sb_getblk fails
It will be better to use ENOMEM rather than EIO, because the only
reason that sb_getblk fails is that allocation fails.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong ab6a773dbc Ext2: return ENOMEM rather than EIO if sb_getblk fails
As the only reason that sb_getblks fails is that allocation fails.
It will be better to use ENOMEM rather than EIO.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong 1b7d76e9b1 Ext3: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return
NULL value,it will be better to use unlikely to check it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong 2b0542a4a0 Ext2: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return
NULL value. It will be better to use unlikely to optimize it.

Signed-off-by: Wang shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Wang Shilong 61f43e6880 Ext3: add necessary check in case IO error happens
As we know io error may happen when the function 'sb_getblk'
is called.Add necessary check for it

The patch also fix a coding style problem.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Wang Shilong 8d8759eb48 Ext2: free memory allocated and forget buffer head when io error happens
Add a necessary check when an io error happens.
If io error happens,free the memory allocated and forget
buffer head.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Jan Kara f56426ae4d ext3: Fix memory leak when quota options are specified multiple times
When usrjquota or grpjquota mount options are specified several times,
we leak memory storing the names. Free the memory correctly.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Guo Chao 306a74920b ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
This macro, initially introduced by ext2 in v0.99.15, does not
have any users from the beginning. It has been removed in later
ext2 version but still remains in the code of ext3, ext4, ocfs2.
Remove this macro there.

Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Mark Fasheh <mfasheh@suse.de>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Nickolai Zeldovich e3e2775ced cifs: fix srcip_matches() for ipv6
srcip_matches() previously had code like this:

  srcip_matches(..., struct sockaddr *rhs) {
    /* ... */
    struct sockaddr_in6 *vaddr6 = (struct sockaddr_in6 *) &rhs;
    return ipv6_addr_equal(..., &vaddr6->sin6_addr);
  }

which interpreted the values on the stack after the 'rhs' pointer as an
ipv6 address.  The correct thing to do is to use 'rhs', not '&rhs'.

Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-01-21 01:37:26 -06:00
Ilya Dryomov 25122d15e2 Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
Operation-specific check (whether subvol is readonly or not) should go
after the mutual exclusiveness check.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-20 16:21:22 +02:00
Ilya Dryomov 4ac20c70b0 Btrfs: fix unlock order in btrfs_ioctl_rm_dev
Fix unlock order in btrfs_ioctl_rm_dev().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-20 16:21:21 +02:00
Ilya Dryomov 18f39c416d Btrfs: fix unlock order in btrfs_ioctl_resize
Fix unlock order in btrfs_ioctl_resize().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-20 16:21:20 +02:00
Ilya Dryomov 2c0c9da02a Btrfs: fix "mutually exclusive op is running" error code
The error code that is returned in response to starting a mutually
exclusive operation when there is one already running got silently
changed from EINVAL to EINPROGRESS by 5ac00add.  Returning EINPROGRESS
to, say, add_dev, when rm_dev is running is misleading.  Furthermore,
the operation itself may want to use EINPROGRESS for other purposes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-20 16:21:18 +02:00
Ilya Dryomov ed0fb78fb6 Btrfs: bring back balance pause/resume logic
Balance pause/resume logic got broken by 5ac00add (went in into 3.8-rc1
as part of dev-replace merge).  Offending commit took a stab at making
mutually exclusive volume operations (add_dev, rm_dev, resize, balance,
replace_dev) not block behind volume_mutex if another such operation is
in progress and instead return an error right away.  Balancing front-end
relied on the blocking behaviour, so the fix is ugly, but short of a
complete rework, it's the best we can do.

Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2013-01-20 16:21:17 +02:00
Joe Millenbach 4f73bc4dd3 tty: Added a CONFIG_TTY option to allow removal of TTY
The option allows you to remove TTY and compile without errors. This
saves space on systems that won't support TTY interfaces anyway.
bloat-o-meter output is below.

The bulk of this patch consists of Kconfig changes adding "depends on
TTY" to various serial devices and similar drivers that require the TTY
layer.  Ideally, these dependencies would occur on a common intermediate
symbol such as SERIO, but most drivers "select SERIO" rather than
"depends on SERIO", and "select" does not respect dependencies.

bloat-o-meter output comparing our previous minimal to new minimal by
removing TTY.  The list is filtered to not show removed entries with awk
'$3 != "-"' as the list was very long.

add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350)
function                                     old     new   delta
chr_dev_init                                 166     170      +4
allow_signal                                  80      82      +2
static.__warned                              143     142      -1
disallow_signal                               63      62      -1
__set_special_pids                            95      94      -1
unregister_console                           126     121      -5
start_kernel                                 546     541      -5
register_console                             593     588      -5
copy_from_user                                45      40      -5
sys_setsid                                   128     120      -8
sys_vhangup                                   32      19     -13
do_exit                                     1543    1526     -17
bitmap_zero                                   60      40     -20
arch_local_irq_save                          137     117     -20
release_task                                 674     652     -22
static.spin_unlock_irqrestore                308     260     -48

Signed-off-by: Joe Millenbach <jmillenbach@gmail.com>
Reviewed-by: Jamey Sharp <jamey@minilop.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-18 16:15:27 -08:00
Ben Myers 003fd6c8be xfs: fix fs/xfs/xfs_log.c:1740:39: error: 'B_TRUE' undeclared
Commit 667a9291c5 "xfs: Remove boolean_t typedef completely." didn't.

Remove a stray B_TRUE that breaks CONFIG_XFS_DEBUG=y.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
2013-01-18 15:11:57 -06:00
Greg Kroah-Hartman ed408f7c0f Merge 3.9-rc4 into driver-core-next
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 19:48:18 -08:00
Brian Foster 9e96fe6df4 xfs: pull up stack_switch check into xfs_bmapi_write
The stack_switch check currently occurs in __xfs_bmapi_allocate,
which means the stack switch only occurs when xfs_bmapi_allocate()
is called in a loop. Pull the check up before the loop in
xfs_bmapi_write() such that the first iteration of the loop has
consistent behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-17 17:53:37 -06:00
Thiago Farina 667a9291c5 xfs: Remove boolean_t typedef completely.
Since we are using C99 we have one builtin defined in include/linux/types.h,
use that instead.

v2: you missed one in fs/xfs/xfs_qm_bhv.c, cleaned up. -bpm

Signed-off-by: Thiago Farina <tfarina@chromium.org>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-17 17:32:57 -06:00
Greg Kroah-Hartman 595e0eb067 Revert "sysfs: Convert print_symbol to %pSR"
This reverts commit 6ad58fa82d as %pSR
isn't in the tree yet.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 13:09:57 -08:00
Sasha Levin 1884bd4b14 debugfs: remove redundant initialization of dentry
We already initialize it to NULL when declaring it, no need to do
that twice.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 13:02:08 -08:00
Joe Perches 6ad58fa82d sysfs: Convert print_symbol to %pSR
Use the new vsprintf extension to avoid any possible
message interleaving.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 12:57:07 -08:00
Bin Wang 6b8fbde418 sysfs: Fixed a trailing white space error
This patch removes the trailing white space in fs/sysfs/mount.c.

Signed-off-by: Bin Wang <wbin00@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 12:54:47 -08:00
Wei Yongjun 8f706111a8 fuse: remove unused variable in fuse_try_move_page()
The variables mapping,index are initialized but never used
otherwise, so remove the unused variables.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:09:59 +01:00
Miklos Szeredi cdadb11cef fuse: make fuse_file_fallocate() static
Fix the following sparse warning:

fs/fuse/file.c:2249:6: warning: symbol 'fuse_file_fallocate' was not declared. Should it be static?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:09:47 +01:00
Robert P. J. Day 807185eb3e fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig
Given that CUSE depends on FUSE, it only makes sense to move its
Kconfig entry into the FUSE Kconfig file.  Also, add a few grammatical
and semantic touchups.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:08:45 +01:00
Miklos Szeredi e2560362cc cuse: fix uninitialized variable warnings
Fix the following compiler warnings:

fs/fuse/cuse.c: In function 'cuse_process_init_reply':
fs/fuse/cuse.c:288:24: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/fuse/cuse.c:272:14: note: 'val' was declared here
fs/fuse/cuse.c:284:10: warning: 'key' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/fuse/cuse.c:272:8: note: 'key' was declared here

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:05:52 +01:00
David Herrmann 30783587b0 cuse: do not register multiple devices with identical names
Sysfs doesn't allow two devices with the same name, but we register a
sysfs entry for each cuse device without checking for name collisions.
This extends the registration to first check whether the name was already
registered.

To avoid race-conditions between the name-check and linking the device, we
need to protect the whole registration with a mutex.

Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:04:57 +01:00
David Herrmann 8ce03fd76d cuse: use mutex as registration lock instead of spinlocks
We need to check for name-collisions during cuse-device registration. To
avoid race-conditions, this needs to be protected during the whole device
registration. Therefore, replace the spinlocks by mutexes first so we can
safely extend the locked regions to include more expensive or sleeping
code paths.

Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-17 13:04:51 +01:00
Philippe De Muyter 4de80273a2 fs/jfs: Fix typo in comment : 'how may' -> 'how many'
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-17 10:18:02 +01:00
Zheng Liu aaddea812c ext4: add tracepoint in punching hole
This patch adds a tracepoint in ext4_punch_hole.

CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-16 20:21:26 -05:00
Linus Torvalds dfdebc2483 xfs: bugfixes for 3.8-rc4
- fix(es) for compound buffers
 - fix for dquot soft timer asserts due to overflow of d_blk_softlimit
 - fix for regression in dir v2 code introduced in commit 20f7e9f3
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ9zKnAAoJENaLyazVq6ZORGcP/RemqCHJEw0a89Y0tLLLAcz/
 Es97kJMESdvi3gX3JTdz3vC8LP21dSCR3k3MvVgucb8RsvGoiLixrmluIRxKb79M
 DEmz9YJ/qxFIpnM9y46VxCYV+/ezxUDEv68wA6T2wJbof26nTLlTj2gAgqjvyWiF
 R1c1OmdCsTfA257UvxfxSVixVnWv7E2io2ZXUGsrBkP4J9OMaMtn00UYOuP1YL8S
 NJ44z9QAzTqVEbAfGeaeV/QVUJzMj/IqWCwF7YKEhfmccO/tPyN0+nMG2DI0Fp5e
 cYGsi4JnaFbqE6Aa/7mu3kv8lYnPe0n3t9d3EwzxOEx+PAvuY8N0EW8Qa4c+805n
 zXFvAroLgP0jYEEuIfEGYIwDPxG0xjor6ztu8e2twcIj6cDHzSpeYaDPnYvWJlwu
 FiupnVu+3FX6mVY1jCealI47nOwzM12R7nXysqF3F6Sf95xGJtG3BoTIKioNqk1g
 dzJGMQvwg/WLvquYb9W/ZNb1T314R23wdYtmI7gWJ74z9IQqWCZBWFYyBhQ8y1Pr
 Vf3LFjzqNqqnYNzoe8Wnn9wKQ57Es7onAo34Y9HZCOkslZsn5nKriNTXNN6Q9Upc
 5RKvj1CbTpKAJYrrhWryI1HtlDKqqtMFdmRQulSu+O9ZJuWZh4XNTu4t3oewt0Ac
 5otZwOdk53V3tGxt3prs
 =gA4q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:

 - fix(es) for compound buffers

 - fix for dquot soft timer asserts due to overflow of d_blk_softlimit

 - fix for regression in dir v2 code introduced in commit 20f7e9f372
   ("xfs: factor dir2 block read operations")

* tag 'for-linus-v3.8-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: recalculate leaf entry pointer after compacting a dir2 block
  xfs: remove int casts from debug dquot soft limit timer asserts
  xfs: fix the multi-segment log buffer format
  xfs: fix segment in xfs_buf_item_format_segment
  xfs: rename bli_format to avoid confusion with bli_formats
  xfs: use b_maps[] for discontiguous buffers
2013-01-16 16:19:54 -08:00
Eric Sandeen aeb4f20a02 xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic
9802182 changed the return value from EWRONGFS (aka EINVAL)
to EFSCORRUPTED which doesn't seem to be handled properly by
the root filesystem probe.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 17:33:53 -06:00
Eric Sandeen 37f13561de xfs: recalculate leaf entry pointer after compacting a dir2 block
Dave Jones hit this assert when doing a compile on recent git, with
CONFIG_XFS_DEBUG enabled:

XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828

Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup)
contained "2" and not the proper offset, and I found that this value was
changed after the memmoves under "Use a stale leaf for our new entry."
in xfs_dir2_block_addname(), i.e.

                        memmove(&blp[mid + 1], &blp[mid],
                                (highstale - mid) * sizeof(*blp));

overwrote it.

What has happened is that the previous call to xfs_dir2_block_compact()
has rearranged things; it changes btp->count as well as the
blp array.  So after we make that call, we must recalculate the
proper pointer to the leaf entries by making another call to
xfs_dir2_block_leaf_p().

Dave provided a metadump image which led to a simple reproducer
(create a particular filename in the affected directory) and this
resolves the testcase as well as the bug on his live system.

Thanks also to dchinner for looking at this one with me.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:08:55 -06:00
Brian Foster ab7eac2200 xfs: remove int casts from debug dquot soft limit timer asserts
The int casts here make it easy to trigger an assert with a large
soft limit. For example, set a >4TB soft limit on an empty volume
to reproduce a (0 > -x) comparison due to an overflow of
d_blk_softlimit.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:08:40 -06:00
Mark Tinguely 91e4bac0b7 xfs: fix the multi-segment log buffer format
Per Dave Chinner suggestion, this patch:
 1) Corrects the detection of whether a multi-segment buffer is
    still tracking data.
 2) Clears all the buffer log formats for a multi-segment buffer.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:08:08 -06:00
Mark Tinguely 2d0e9df579 xfs: fix segment in xfs_buf_item_format_segment
Not every segment in a multi-segment buffer is dirty in a
transaction and they will not be outputted. The assert in
xfs_buf_item_format_segment() that checks for the at least
one chunk of data in the segment to be used is not necessary
true for multi-segmented buffers.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:07:56 -06:00
Mark Tinguely 0f22f9d0cd xfs: rename bli_format to avoid confusion with bli_formats
Rename the bli_format structure to __bli_format to avoid
accidently confusing them with the bli_formats pointer.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:07:37 -06:00
Mark Tinguely d44d9bc68e xfs: use b_maps[] for discontiguous buffers
Commits starting at 77c1a08 introduced a multiple segment support
to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment
buffer in the transaction because it was looking at the single segment
block number rather than the multi-segment b_maps[0].bm.bn. This
results on a recursive buffer lock that can never be satisfied.

This patch:
 1) Changed the remaining b_map accesses to be b_maps[0] accesses.
 2) Renames the single segment b_map structure to __b_map to avoid
    future confusion.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-16 16:07:11 -06:00
Linus Torvalds 31db720643 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 and udf fixes from Jan Kara:
 "One ext3 performance regression fix and one udf regression fix (oops
  on interrupted mount)."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  UDF: Fix a null pointer dereference in udf_sb_free_partitions
  jbd: don't wake kjournald unnecessarily
2013-01-16 10:55:10 -08:00
Kees Cook 1e817fb62c time: create __getnstimeofday for WARNless calls
The pstore RAM backend can get called during resume, and must be defensive
against a suspended time source. Expose getnstimeofday logic that returns
an error instead of a WARN. This can be detected and the timestamp can
be zeroed out.

Reported-by: Doug Anderson <dianders@chromium.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-01-15 18:16:02 -08:00
Akinobu Mita 3d251a5b9e UBIFS: rename random32() to prandom_u32()
Use more preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-01-15 15:45:27 +02:00
Namjae Jeon 4589d25d01 f2fs: fix the debugfs entry creation path
As the "status" debugfs entry will be maintained for entire F2FS filesystem
irrespective of the number of partitions.
So, we can move the initialization to the init part of the f2fs and destroy will
be done from exit part. After making changes, for individual partition mount -
entry creation code will not be executed.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-15 20:19:15 +09:00
majianpeng 66af62ce75 f2fs: add global mutex_lock to protect f2fs_stat_list
There is an race condition between umounting f2fs and reading f2fs/status, which
results in oops.

Fox example:
Thread A			Thread B
umount f2fs 			cat f2fs/status

f2fs_destroy_stats() {		stat_show() {
				 list_for_each_entry_safe(&f2fs_stat_list)
 list_del(&si->stat_list);
 mutex_lock(&si->stat_lock);
 si->sbi = NULL;
 mutex_unlock(&si->stat_lock);
 kfree(sbi->stat_info);
} 				 mutex_lock(&si->stat_lock) <- si is gone.
				 ...
				}

Solution with a global lock: f2fs_stat_mutex:
Thread A			Thread B
umount f2fs 			cat f2fs/status

f2fs_destroy_stats() {		stat_show() {
 mutex_lock(&f2fs_stat_mutex);
 list_del(&si->stat_list);
 mutex_unlock(&f2fs_stat_mutex);
 kfree(sbi->stat_info);		 mutex_lock(&f2fs_stat_mutex);
}				 list_for_each_entry_safe(&f2fs_stat_list)
				 ...
				 mutex_unlock(&f2fs_stat_mutex);
				}

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
[jaegeuk.kim@samsung.com: fix typos, description, and remove the existing lock]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-15 20:18:29 +09:00
Namjae Jeon fa9150a84c f2fs: remove the blk_plug usage in f2fs_write_data_pages
Let's consider the usage of blk_plug in f2fs_write_data_pages().
We can come up with the two issues: lock contention and task awareness.

1. Merging bios prior to grabing "queue lock"
 The f2fs merges consecutive IOs in the file system level before
 submitting any bios, which is similar with the back merge by the
 plugging mechanism in attempt_plug_merge(). Both of them need to acquire
 no queue lock.

2. Merging policy with respect to tasks
 The f2fs merges IOs as much as possible regardless of tasks, while
 blk-plugging is conducted on a basis of tasks. As we can understand
 there are trade-offs, f2fs tries to maximize the write performance with
 well-merged bios.

As a result, if f2fs produces many consecutive but separated bios in
writepages(), it would be good to use blk-plugging since f2fs would be
able to avoid queue lock contention in the block layer by merging them.
But, f2fs merges IOs and submit one bio, which means that there are not
much chances to merge bios by attempt_plug_merge().

However, f2fs has already been used blk_plug by triggering generic_writepages()
in f2fs_write_data_pages().
So to make the overall code consistency, I'd like to remove blk_plug there.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-15 20:18:16 +09:00
Namjae Jeon 1b1baff6e5 UDF: Fix a null pointer dereference in udf_sb_free_partitions
This patch fixes a regression caused by commit bff943af6f "udf: Fix memory
leak when mounting" due to which it was triggering a kernel null point
dereference in case of interrupted mount OR when allocating memory to
sbi->s_partmaps failed in function udf_sb_alloc_partition_maps.

Reported-and-tested-by: James Hogan <james@albanarts.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-14 22:53:47 +01:00
Eric Sandeen 7e2fb2d7e6 jbd: don't wake kjournald unnecessarily
Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
commit has already been requested for commit.

commit d9b0193 "jbd: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-14 22:50:45 +01:00
Linus Torvalds 6d283dba37 vfs: add missing virtual cache flush after editing partial pages
Andrew Morton pointed this out a month ago, and then I completely forgot
about it.

If we read a partial last page of a block device, we will zero out the
end of the page, but since that page can then be mapped into user space,
we should also make sure to flush the cache on architectures that have
virtual caches.  We have the flush_dcache_page() function for this, so
use it.

Now, in practice this really never matters, because nobody sane uses
virtual caches to begin with, and they largely exist on old broken RISC
arhitectures.

And even if you did run on one of those obsolete CPU's, the whole "mmap
and access the last partial page of a block device" behavior probably
doesn't actually exist.  The normal IO functions (read/write) will never
see the zeroed-out part of the page that migth not be coherent in the
cache, because they honor the size of the device.

So I'm marking this for stable (3.7 only), but I'm not sure anybody will
ever care.

Pointed-out-by: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org  # 3.7
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-14 13:17:50 -08:00
Eric Sandeen 3972f2603d btrfs: update timestamps on truncate()
truncate() vs. ftruncate() differ in the VFS; truncate()
doesn't set (ATTR_CTIME | ATTR_MTIME), and it's up to the
fs to do the timestamp updates if the size changes.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
2013-01-14 13:53:37 -05:00
Zach Brown f276795627 btrfs: fix btrfs_cont_expand() freeing IS_ERR em
btrfs_cont_expand() tries to free an IS_ERR em as it gets an error from
btrfs_get_extent() and breaks out of its loop.

An instance of -EEXIST was reported in the wild:

  https://bugzilla.redhat.com/show_bug.cgi?id=874407

I have no idea if that -EEXIST is surprising, or not.  Regardless, this
error handling should be cleaned up to handle other reasonable errors
(ENOMEM, EIO; whatever).

This seemed to be the only buggy freeing of the relatively rare IS_ERR
em so I opted to fix the caller rather than teach free_extent_map() to
use IS_ERR_OR_NULL().

Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
2013-01-14 13:53:23 -05:00
Liu Bo f9e4fb5393 Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
xfstests case 285 complains.

It it because btrfs did not try to find unwritten delalloc
bytes(only dirty pages, not yet writeback) behind prealloc
extents, it ends up finding nothing while we're with SEEK_DATA.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:53:22 -05:00
Liu Bo 1214b53f90 Btrfs: fix off-by-one in lseek
Lock end is inclusive.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:53:22 -05:00
Liu Bo 3268a2468e Btrfs: reset path lock state to zero
We forgot to reset the path lock state to zero after we unlock the path block,
and this can lead to the ASSERT checker in tree unlock API.

Reported-by: Slava Barinov <rayslava@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:53 -05:00
Liu Bo ac5c93005b Btrfs: let allocation start from the right raid type
This'd avoid us empty looping.

Say we have only one disk and the metadata raid type will be defaultly DUP,
and we do not need to start from index=0(RAID10) and get over two empty
loops to index=2(DUP).

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:52 -05:00
Josef Bacik f3fe820c20 Btrfs: add orphan before truncating pagecache
Running xfstests 83 in a loop would sometimes fail the fsck.  This happens
because if we invalidate a page that already has an ordered extent setup for
it we will complete the ordered extent ourselves, assuming that the truncate
will clean everything up.  The problem with this is there is plenty of time
for the truncate to fail after we've done this work.  So to fix this we need
to add the orphan item first to make sure the cleanup gets done properly,
and then we can truncate the pagecache and all that stuff and be safe.  This
fixes the btrfsck failures I was seeing while running 83 in a loop.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:52 -05:00
Josef Bacik 72bcd99d45 Btrfs: set flushing if we're limited flushing
We still need to say we're flushing if we're limit flushing to keep somebody
from coming in and stealing our reservation.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:51 -05:00
Miao Xie 9754767657 Btrfs: fix missing write access release in btrfs_ioctl_resize()
We forget to give up the write access after we find some device operation
is going on. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:51 -05:00
Miao Xie dba60f3f5d Btrfs: fix resize a readonly device
We should not resize a readonly device, fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-14 13:52:49 -05:00
Miao Xie 5c39da5b6c Btrfs: do not delete a subvolume which is in a R/O subvolume
Step to reproduce:
 # mkfs.btrfs <disk>
 # mount <disk> <mnt>
 # btrfs sub create <mnt>/subv0
 # btrfs sub snap <mnt> <mnt>/subv0/snap0
 # change <mnt>/subv0 from R/W to R/O
 # btrfs sub del <mnt>/subv0/snap0

We deleted the snapshot successfully. I think we should not be able to delete
the snapshot since the parent subvolume is R/O.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-01-14 13:52:32 -05:00
Miao Xie d86e56cf7d Btrfs: disable qgroup id 0
Qgroup id 0 is a special number, we should set the id of a qgroup to 0.
Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2013-01-14 13:52:31 -05:00
Lukas Czerner cc975eb460 btrfs: get the device in write mode when deleting it
When we're deleting the device we should get it in write mode since
we're going to re-write the super block magic on that device. And it
should fail if the device is read-only.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-01-14 13:52:31 -05:00
Tsutomu Itoh cfa7a9ccda Btrfs: fix memory leak in name_cache_insert()
We should free name_cache_entry before returning from the
error handling code.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2013-01-14 13:52:30 -05:00
Linus Torvalds 3441f0d26d Driver core fixes for 3.8-rc3
Here are two patches for 3.8-rc3.
 
 One removes the __dev* defines from init.h now that all usages of it are gone
 from your tree.  The other fix is for debugfs's paramater that was using the
 wrong base for the option.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDzjcAACgkQMUfUDdst+ykJVwCcDqiKrO9p0dcH9WXN5aukBWX/
 N8EAoK786v7PjtiVyNOJ/cPUDU8OHUpg
 =U4nL
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg Kroah-Hartman:
 "Here are two patches for 3.8-rc3.

  One removes the __dev* defines from init.h now that all usages of it
  are gone from your tree.  The other fix is for debugfs's paramater
  that was using the wrong base for the option.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: convert gid= argument from decimal, not octal
  Remove __dev* markings from init.h
2013-01-14 09:07:11 -08:00
Namjae Jeon 163799872b f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
In call to f2fs_delete_entry, 'dir' time modification code is put
at two places.
So, remove the redundant code for timing update.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-14 09:43:27 +09:00
Namjae Jeon ff9234ad4e f2fs: remove redundant call to set_blocksize in f2fs_fill_super
Since, f2fs supports only 4KB blocksize, which is set at the beginning in
f2fs_fill_super. So, we do not need to again check this blocksize setting
in such case.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-14 09:41:30 +09:00
Abhijit Pawar a17164e54b fs/xfs remove obsolete simple_strto<foo>
This patch replaces usages of obsolete simple_strtoul with kstrtoint in
xfs_args and suffix_strtoul.

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-13 14:42:07 -06:00
Eric Sandeen d4608632ec xfs: recalculate leaf entry pointer after compacting a dir2 block
Dave Jones hit this assert when doing a compile on recent git, with
CONFIG_XFS_DEBUG enabled:

XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828

Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup)
contained "2" and not the proper offset, and I found that this value was
changed after the memmoves under "Use a stale leaf for our new entry."
in xfs_dir2_block_addname(), i.e.

                        memmove(&blp[mid + 1], &blp[mid],
                                (highstale - mid) * sizeof(*blp));

overwrote it.

What has happened is that the previous call to xfs_dir2_block_compact()
has rearranged things; it changes btp->count as well as the
blp array.  So after we make that call, we must recalculate the
proper pointer to the leaf entries by making another call to
xfs_dir2_block_leaf_p().

Dave provided a metadump image which led to a simple reproducer
(create a particular filename in the affected directory) and this
resolves the testcase as well as the bug on his live system.

Thanks also to dchinner for looking at this one with me.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-13 14:36:17 -06:00
Theodore Ts'o 7f5118629f ext4: trigger the lazy inode table initialization after resize
After we have finished extending the file system, we need to trigger a
the lazy inode table thread to zero out the inode tables.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-13 08:41:45 -05:00
Eryu Guan 15b49132fc ext4: check bh in ext4_read_block_bitmap()
Validate the bh pointer before using it, since
ext4_read_block_bitmap_nowait() might return NULL.

I've seen this in fsfuzz testing.

 EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0
 ...
 Call Trace:
  [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60
  [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80
  [<ffffffff811d0d36>] ? __getblk+0x36/0x70
  [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210
  [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140
  [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0
  [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100
  [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0
  [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230
  [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490
  [<ffffffff811b8cd7>] evict+0xa7/0x1c0
  [<ffffffff811b8ed3>] iput_final+0xe3/0x170
  [<ffffffff811b8f9e>] iput+0x3e/0x50
  [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90
  [<ffffffff81231d0b>] ext4_create+0xeb/0x170
  [<ffffffff811aae9c>] vfs_create+0xac/0xd0
  [<ffffffff811ac845>] lookup_open+0x185/0x1c0
  [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170
  [<ffffffff811acb54>] do_last+0x2d4/0x7a0
  [<ffffffff811af743>] path_openat+0xb3/0x480
  [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0
  [<ffffffff811afc49>] do_filp_open+0x49/0xa0
  [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150
  [<ffffffff8119da28>] do_sys_open+0x108/0x1f0
  [<ffffffff8119db51>] sys_open+0x21/0x30
  [<ffffffff81618959>] system_call_fastpath+0x16/0x1b

Also fix comment for ext4_read_block_bitmap_nowait()

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-01-12 16:33:25 -05:00
Wang Shilong aebf02430d ext4: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return NULL
value,it will be better to use 'unlikely' to optimize it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-01-12 16:28:47 -05:00
Theodore Ts'o 860d21e2c5 ext4: return ENOMEM if sb_getblk() fails
The only reason for sb_getblk() failing is if it can't allocate the
buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
make sure that the file system is marked as being inconsistent if
sb_getblk() fails.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-01-12 16:19:36 -05:00
Xi Wang 6d92d4f6a7 fs/exec.c: work around icc miscompilation
The tricky problem is this check:

	if (i++ >= max)

icc (mis)optimizes this check as:

	if (++i > max)

The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).

This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior.  This
optimization effectively reverts the previous commit 362e6663ef
("exec.c, compat.c: fix count(), compat_count() bounds checking") that
tries to fix the check.

This patch simply moves ++ after the check.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
Kees Cook d9777b8de4 fs/xfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Ben Myers <bpm@sgi.com>
CC: Alex Elder <elder@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ben Myers <bpm@sgi.com>
2013-01-11 11:39:04 -08:00
Kees Cook f11cb2271f fs/nilfs2: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2013-01-11 11:39:04 -08:00
Kees Cook 336d6d0323 fs/ecryptfs: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Tyler Hicks <tyhicks@canonical.com>
CC: Dustin Kirkland <dustin.kirkland@gazzang.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tyler Hicks <tyhicks@canonical.com>
2013-01-11 11:39:04 -08:00
Kees Cook 1b6a78a522 fs/ceph: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Sage Weil <sage@inktank.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Sage Weil <sage@inktank.com>
2013-01-11 11:39:04 -08:00
Seiji Aguchi 9f244e9cfd pstore: Avoid deadlock in panic and emergency-restart path
[Issue]

When pstore is in panic and emergency-restart paths, it may be blocked
in those paths because it simply takes spin_lock.

This is an example scenario which pstore may hang up in a panic path:

 - cpuA grabs psinfo->buf_lock
 - cpuB panics and calls smp_send_stop
 - smp_send_stop sends IRQ to cpuA
 - after 1 second, cpuB gives up on cpuA and sends an NMI instead
 - cpuA is now in an NMI handler while still holding buf_lock
 - cpuB is deadlocked

This case may happen if a firmware has a bug and
cpuA is stuck talking with it more than one second.

Also, this is a similar scenario in an emergency-restart path:

 - cpuA grabs psinfo->buf_lock and stucks in a firmware
 - cpuB kicks emergency-restart via either sysrq-b or hangcheck timer.
   And then, cpuB is deadlocked by taking psinfo->buf_lock again.

[Solution]

This patch avoids the deadlocking issues in both panic and emergency_restart
paths by introducing a function, is_non_blocking_path(), to check if a cpu
can be blocked in current path.

With this patch, pstore is not blocked even if another cpu has
taken a spin_lock, in those paths by changing from spin_lock_irqsave
to spin_trylock_irqsave.

In addition, according to a comment of emergency_restart() in kernel/sys.c,
spin_lock shouldn't be taken in an emergency_restart path to avoid
deadlock. This patch fits the comment below.

<snip>
/**
 *      emergency_restart - reboot the system
 *
 *      Without shutting down any hardware or taking any locks
 *      reboot the system.  This is called when we know we are in
 *      trouble so this is our best effort to reboot.  This is
 *      safe to call in interrupt context.
 */
void emergency_restart(void)
<snip>

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-01-11 10:20:50 -08:00
Dave Reisner f1688e0431 debugfs: convert gid= argument from decimal, not octal
This patch technically breaks userspace, but I suspect that anyone who
actually used this flag would have encountered this brokenness, declared
it lunacy, and already sent a patch.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-11 05:56:01 -08:00
Jaegeuk Kim 9eaeba7013 f2fs: move f2fs_balance_fs to punch_hole
The f2fs_fallocate() has two operations: punch_hole and expand_size.

Only in the case of punch_hole, dirty node pages can be produced, so let's
trigger f2fs_balance_fs() in this case only.
Furthermore, let's trigger it at every data truncation routine.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-11 15:09:23 +09:00
Jaegeuk Kim 7d82db8316 f2fs: add f2fs_balance_fs in several interfaces
The f2fs_balance_fs() is to check the number of free sections and decide whether
it needs to conduct cleaning or not. If there are not enough free sections, the
cleaning job should be started.

In order to control an amount of free sections even under high utilization, f2fs
should call f2fs_balance_fs at all the VFS interfaces that are able to produce
dirty pages.
This patch adds the function calls in the missing interfaces as follows.

1. f2fs_setxattr()
The f2fs_setxattr() produces dirty node pages so that we should call
f2fs_balance_fs() either likewise doing in other VFS interfaces such as
f2fs_lookup(), f2fs_mkdir(), and so on.

2. f2fs_sync_file()
We should guarantee serving free sections for syncing metadata during fsync.
Previously, there is no space check before triggering checkpoint and
sync_node_pages.
Therefore, if a bunch of fsync calls are triggered under 100% of FS utilization,
f2fs is able to be faced with no free sections, resulting in BUG_ON().

3. f2fs_sync_fs()
Before calling write_checkpoint(), we should guarantee that there are minimum
free sections.

4. f2fs_write_inode()
f2fs_write_inode() is also able to produce dirty node pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-11 15:09:17 +09:00
Randy Dunlap 254adaa465 seq_file: fix new kernel-doc warnings
Fix kernel-doc warnings in fs/seq_file.c:

  Warning(fs/seq_file.c:304): No description found for parameter 'whence'
  Warning(fs/seq_file.c:304): Excess function parameter 'origin' description in 'seq_lseek'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10 14:35:24 -08:00
Jaegeuk Kim 408e937561 f2fs: revisit the f2fs_gc flow
I'd like to revisit the f2fs_gc flow and rewrite as follows.

1. In practical, the nGC parameter of f2fs_gc is meaningless. So, let's
  remove it.
2. Background GC marks victim blocks as dirty one at a time.
3. Foreground GC should do cleaning job until acquiring enough free
  sections. Afterwards, it needs to do checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-10 07:42:59 +09:00
Wang Sheng-Hui 210b907acf btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
In the big loop, cur_trans will be set fs_info->running_transaction
before it's used. And after kmem_cache_free it and goto loop, it will
be setup again. No need to setup it immediately after freed.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-09 11:48:33 +01:00
Masanari Iida 8a168ca707 treewide: Fix typo in various drivers
Correct spelling typo in printk within various drivers.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-09 11:43:32 +01:00
Liu Bo 2c016dc2cb btrfs: fix comment typos
Convert 'hepler' to 'helper'.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-01-09 11:40:52 +01:00
Linus Torvalds 5c33d9b248 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) New sysctl ndisc_notify needs some documentation, from Hanns
    Frederic Sowa.

 2) Netfilter REJECT target doesn't set transport header of SKB
    correctly, from Mukund Jampala.

 3) Forcedeth driver needs to check for DMA mapping failures, from Larry
    Finger.

 4) brcmsmac driver can't use usleep_range while holding locks, use
    udelay instead.  From Niels Ole Salscheider.

 5) Fix unregister of netlink bridge multicast database handlers, from
    Vlad Yasevich and Rami Rosen.

 6) Fix checksum calculations in netfilter's ipv6 network prefix
    translation module.

 7) Fix high order page allocation failures in netfilter xt_recent, from
    Eric Dumazet.

 8) mac802154 needs to use netif_rx_ni() instead of netif_rx() because
    mac802154_process_data() can execute in process rather than
    interrupt context.  From Alexander Aring.

 9) Fix splice handling of MSG_SENDPAGE_NOTLAST, otherwise we elide one
    tcp_push() too many.  From Eric Dumazet and Willy Tarreau.

10) Fix skb->truesize tracking in XEN netfront driver, from Ian
    Campbell.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  xen/netfront: improve truesize tracking
  ipv4: fix NULL checking in devinet_ioctl()
  tcp: fix MSG_SENDPAGE_NOTLAST logic
  net/ipv4/ipconfig: really display the BOOTP/DHCP server's address.
  ip-sysctl: fix spelling errors
  mac802154: fix NOHZ local_softirq_pending 08 warning
  ipv6: document ndisc_notify in networking/ip-sysctl.txt
  ath9k: Fix Kconfig for ATH9K_HTC
  netfilter: xt_recent: avoid high order page allocations
  netfilter: fix missing dependencies for the NOTRACK target
  netfilter: ip6t_NPT: fix IPv6 NTP checksum calculation
  bridge: add empty br_mdb_init() and br_mdb_uninit() definitions.
  vxlan: allow live mac address change
  bridge: Correctly unregister MDB rtnetlink handlers
  brcmfmac: fix parsing rsn ie for ap mode.
  brcmsmac: add copyright information for Canonical
  rtlwifi: rtl8723ae: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192se: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192de: Fix warning for unchecked pci_map_single() call
  rtlwifi: rtl8192ce: Fix warning for unchecked pci_map_single() call
  ...
2013-01-08 07:31:49 -08:00
Linus Torvalds 127aa93066 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Misc small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Don't let read only caching for mandatory byte-range locked files
  CIFS: Fix write after setting a read lock for read oplock files
  Revert "CIFS: Fix write after setting a read lock for read oplock files"
  cifs: adjust sequence number downward after signing NT_CANCEL request
  cifs: move check for NULL socket into smb_send_rqst
2013-01-07 13:21:55 -08:00
David Teigland f117228346 dlm: avoid scanning unchanged toss lists
Keep track of whether a toss list contains any
shrinkable rsbs.  If not, dlm_scand can avoid
scanning the list for rsbs to shrink.  Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).

Signed-off-by: David Teigland <teigland@redhat.com>
2013-01-07 12:02:49 -06:00
Linus Torvalds f77637206d Bug fixes, including two regressions introduced in v3.8. The most
serious of these regressions is a buffer cache leak.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQ6rvpAAoJENNvdpvBGATwRrIP/1bokaspEpaHVKFAqgJzRNew
 cPdINNqc85m5MmVGmvPq4EMx7+86Z459sZAKafaXV/qVR/m7vmtfIiWi8bWPZUkv
 MrblMSPAQHERypMWA8eJkNfkyw39HJb4GMAIgh1TBUXO2ocb46cGpl0Fcum0twT4
 pv2C2JqNcLtIsekJsrqmvdqrNW+bMoMJZtzjFwHuIknmbo7eSFtgV17EFlcfWhJ7
 CZoJwvk2s/dnS6l4icwKZBNbKnap6oR9SptoymvH+ATPIEO4qJisRbID6XVhTZZ8
 3lsOWfGlGIVEy6wsQKwfeWdCzAyAjyQza7eeMdVkqm97YynFSUD0pMZ9tQ1FpQ9E
 JGAWshLFyA8+atY/JZ8xGQisY2R57WoJd2m0Bf3ockhB963iu+vnIxZ4BaKW3K9l
 WMRqkouA1ijaeeUNHV4FulJ0cG6ioIYTF6nM/jGTJXhF8ZXjJHb4ZiwjjWS8fZv1
 Ooe6GkHz29txi+hOET0vtwDUqkihGFlfNDbZ48/JjlS4sCy5ntcwt321Tn7olbo5
 O72k+oQLtMLJshrZwTuSQZZtiv/9638gtNGC6EUy7p5LTmo5CgaueEh2qhXweVGR
 f8X25RjNWREvE78JiJw6SzYfNeaLO6I0f+Hs4z8PLVc8K02IO0tK8mw8LNgvFHRF
 xCFhD2w916Dsh9cwyiRT
 =pnxL
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 regression fixes from Ted Ts'o:
 "Bug fixes, including two regressions introduced in v3.8.  The most
  serious of these regressions is a buffer cache leak."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: remove duplicate call to ext4_bread() in ext4_init_new_dir()
  ext4: release buffer in failed path in dx_probe()
  ext4: fix configuration dependencies for ext4 ACLs and security labels
2013-01-07 09:22:50 -08:00
Linus Torvalds 4c9014f2ca NFS client bugfixes for Linux 3.8
- Fix a permissions problem when opening NFSv4 files that only have the
   exec bit set.
 - Fix a couple of typos in pNFS (inverted logic), and the mount parsing
   (missing pointer dereference).
 - Work around a series of deadlock issues due to workqueues using
   struct work_struct pointer address comparisons in the re-entrancy
   tests. Ensure that we don't free struct work_struct prematurely if
   our work function involves waiting for completion of other work
   items (e.g. by calling rpc_shutdown_client).
 - Revert the part of commit 168e4b3 that is causing unnecessary warnings
   to be issued in the nfsd callback code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ6uzfAAoJEGcL54qWCgDyRvIP/06F4xZChZxH3prNz2cU2EDJ
 2RO3lWoZf8aBk2+bhU5Fm9cuUcCm1raoRFsdWoJC/dnlXL+A7ZeuYDvXycNfTJW3
 vCIRvZTU8TAxwR9szNkPhRIB09FAacioP/K0q6pBfwx8my5NC1yIpOMDVmND+40Z
 oWm7ICip4vblxNVQMYp6/JrDcc7LCDcOG5j0EKO5aSxRE0Ki2TYqN0nk20v9caDe
 43jOA2HWGPQ+Zg3sty2o728u57RB70OBqRJ//2pzW2fEwSut+pf5pC1MoVJvtgST
 Utwbt/tMEFAuvo0WGHqjaYHoAezcLkjSvYdrh7Vz6WJ0gsonCB8V8UvgKsFt0XSM
 YWtXa/PzkDfXDVNwkzpjmDCUDJwzAiTllQP5cJzAhsz7GG0xzGypyqmIMcJ6UGxy
 sADGXAhpJR8sWn7oxh4znMh+JlxW3MLA+jSgkrlzGM7wMinncnp5RSQIlY2/8qhZ
 GiO6b6wiUYYHbZtV456S4M+PwA0kGqkSDP/PlrquXAW8jPFCBlKLf4+SFK2kqs25
 yyJQXD0QycbtfbCGxaFJkt1qUEgIGkTpU8UnyxHKccIXhR4ZDi/IZ3OQSLexE+8V
 L8HZXeT32+fGgPz479bplXJ2VnLQo6VXOrA5ofylqesWW5klDPMTz7KIEJSyyHwG
 kiSLyjhDx/2vAG+Nwusr
 =yxcq
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Fix a permissions problem when opening NFSv4 files that only have the
   exec bit set.

 - Fix a couple of typos in pNFS (inverted logic), and the mount parsing
   (missing pointer dereference).

 - Work around a series of deadlock issues due to workqueues using
   struct work_struct pointer address comparisons in the re-entrancy
   tests.  Ensure that we don't free struct work_struct prematurely if
   our work function involves waiting for completion of other work items
   (e.g. by calling rpc_shutdown_client).

 - Revert the part of commit 168e4b3 that is causing unnecessary
   warnings to be issued in the nfsd callback code.

* tag 'nfs-for-3.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: avoid dereferencing null pointer in initiate_bulk_draining
  SUNRPC: Partial revert of commit 168e4b39d1
  NFS: Ensure that we free the rpc_task after read and write cleanups are done
  SUNRPC: Ensure that we free the rpc_task after cleanups are done
  nfs: fix null checking in nfs_get_option_str()
  pnfs: Increase the refcount when LAYOUTGET fails the first time
  NFS: Fix access to suid/sgid executables
2013-01-07 08:36:45 -08:00
Eric Dumazet ae62ca7b03 tcp: fix MSG_SENDPAGE_NOTLAST logic
commit 35f9c09fe9 (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)

Many thanks to Willy for providing very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 20:58:13 -08:00
Guo Chao fef0ebdb22 ext4: remove duplicate call to ext4_bread() in ext4_init_new_dir()
This fixes a buffer cache leak when creating a directory, introduced
in commit a774f9c20.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
2013-01-06 23:40:25 -05:00
Guo Chao 0ecaef0644 ext4: release buffer in failed path in dx_probe()
If checksum fails, we should also release the buffer
read from previous iteration.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>-
Cc: stable@vger.kernel.org
--
 fs/ext4/namei.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2013-01-06 23:38:47 -05:00
Valerie Aurora 96465efee1 ext4: fix configuration dependencies for ext4 ACLs and security labels
Commit "ext4: Remove CONFIG_EXT4_FS_XATTR" removed the configuration
dependencies for ext4 xattrs from the ext4 ACLs and security labels
configuration options, but did not replace them with a dependency on
ext4 itself.  Add back the dependency on ext4 so the options only show
up if ext4 is enabled.

Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
2013-01-06 23:38:44 -05:00
Nickolai Zeldovich ecf0eb9edb nfs: avoid dereferencing null pointer in initiate_bulk_draining
Fix an inverted null pointer check in initiate_bulk_draining().

Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-01-05 14:26:51 -05:00