Commit Graph

34219 Commits

Author SHA1 Message Date
Christoph Hellwig 854ff5caab exportfs: BUG_ON in crazy corner case
This would indicate a nasty bug in the dcache and has never triggered in
the past 10 years as far as I know.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:35 -05:00
J. Bruce Fields 13a2c3be03 dcache: fix outdated DCACHE_NEED_LOOKUP comment
The DCACHE_NEED_LOOKUP case referred to here was removed with
39e3c9553f "vfs: remove
DCACHE_NEED_LOOKUP".

There are only four real_lookup() callers and all of them pass in an
unhashed dentry just returned from d_alloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:34 -05:00
J. Bruce Fields f80de2cde1 dcache: don't clear DCACHE_DISCONNECTED too early
DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:34 -05:00
J. Bruce Fields e1a24bb0aa dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries
I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885ab "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption the DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:33 -05:00
J. Bruce Fields 7632e465fe dcache: use IS_ROOT to decide where dentry is hashed
Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:33 -05:00
Al Viro b19f133674 ocfs2: get rid of impossible checks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
Al Viro fbad2bd132 qnx4: i_sb is never NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
J. Bruce Fields 950ee9566a exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
J. Bruce Fields b7a6ec52dd vfs: split out vfs_getattr_nosec
The filehandle lookup code wants this version of getattr.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:31 -05:00
Al Viro 5a3cd99285 iget/iget5: don't bother with ->i_lock until we find a match
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:31 -05:00
David Howells b18825a7c8 VFS: Put a small type field into struct dentry::d_flags
Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

	Miss (negative dentry)
	Directory
	"Automount" directory (defective - no i_op->lookup())
	Symlink
	Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

	d_set_type(dentry, type)
	d_is_directory(dentry)
	d_is_autodir(dentry)
	d_is_symlink(dentry)
	d_is_file(dentry)
	d_is_negative(dentry)
	d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:30 -05:00
Al Viro afabada957 elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo)
we can't get to do_coredump() if that condition isn't satisfied...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:30 -05:00
Al Viro ec57941e03 constify do_coredump() argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:29 -05:00
Al Viro ce39596048 constify copy_siginfo_to_user{,32}()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:29 -05:00
Al Viro 078d8e624c ... and kill anon_inode_getfile_private()
it's a seriously misguided API, now fortunately without users.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:28 -05:00
Benjamin LaHaise 71ad7490c1 rework aio migrate pages to use aio fs
Don't abuse anon_inodes.c to host private files needed by aio;
we can bloody well declare a mini-fs of our own instead of
patching up what anon_inodes can create for us.

Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:28 -05:00
Al Viro 6987843ff7 take anon inode allocation to libfs.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:27 -05:00
Al Viro 22a8cb8248 new helper: dump_align()
dump_skip to given alignment...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:27 -05:00
Al Viro 9b56d54380 dump_skip(): dump_seek() replacement taking coredump_params
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:26 -05:00
Al Viro 2507a4fbd4 make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:26 -05:00
Al Viro 1ad67015e6 binfmt_elf: count notes towards coredump limit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:25 -05:00
Al Viro 43a5d548eb aout: switch to dump_emit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:25 -05:00
Al Viro cdc3d5627d switch elf_coredump_extra_notes_write() to dump_emit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:24 -05:00
Al Viro e6c1baa9b5 convert the rest of binfmt_elf_fdpic to dump_emit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:24 -05:00
Al Viro 13046ece96 binfmt_elf: convert writing actual dump pages to dump_emit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:24 -05:00
Al Viro aa3e7eaf0a switch elf_core_write_extra_data() to dump_emit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:23 -05:00
Al Viro 506f21c556 switch elf_core_write_extra_phdrs() to dump_emit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:23 -05:00
Al Viro ecc8c7725e new helper: dump_emit()
dump_write() analog, takes core_dump_params instead of file,
keeps track of the amount written in cprm->written and checks for
cprm->limit.  Start using it in binfmt_elf.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:22 -05:00
Al Viro 11d100d9a2 coda_revalidate_inode(): switch to passing inode...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:21 -05:00
Al Viro b61625d245 fold __d_shrink() into its only remaining caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:21 -05:00
Al Viro eee5cc2702 get rid of s_files and files_lock
The only thing we need it for is alt-sysrq-r (emergency remount r/o)
and these days we can do just as well without going through the
list of files.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:20 -05:00
Al Viro 8b61e74ffc get rid of {lock,unlock}_rcu_walk()
those have become aliases for rcu_read_{lock,unlock}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:20 -05:00
Al Viro 48a066e72d RCU'd vfsmounts
* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode.  We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock.  It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:19 -05:00
Al Viro 42c326082d switch shrink_dcache_for_umount() to use of d_walk()
we have too many iterators in fs/dcache.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:06 -05:00
Kent Overstreet 6678d83f18 block: Consolidate duplicated bio_trim() implementations
Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
we should know better than this.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 09:02:31 -07:00
Christoph Lameter 170d800af8 block: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	this_cpu_inc(y)

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:58 -07:00
Mikulas Patocka 8077c0d983 bdi: test bdi_init failure
There were two places where return value from bdi_init was not tested.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08 08:59:44 -07:00
Theodore Ts'o dd1f723bf5 ext4: use prandom_u32() instead of get_random_bytes()
Many of the uses of get_random_bytes() do not actually need
cryptographically secure random numbers.  Replace those uses with a
call to prandom_u32(), which is faster and which doesn't consume
entropy from the /dev/random driver.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-11-08 00:14:53 -05:00
Chao Yu 1d15bd2034 f2fs: fix memory leak after kobject init failed in fill_super
If we failed to init&add kobject when fill_super, stats info and proc object of
f2fs will not be released.
We should free them before we finish fill_super.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-08 14:10:29 +09:00
Changman Lee fb51b5ef9c f2fs: cleanup waiting routine for writeback pages in cp
use genernal method supported by kernel

 o changes from v1
   If any waiter exists at end io, wake up it.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-08 14:10:29 +09:00
Eric Sandeen f275411440 ext4: remove unreachable code after ext4_can_extents_be_merged()
Commit ec22ba8e ("ext4: disable merging of uninitialized extents")
ensured that if either extent under consideration is uninit, we
decline to merge, and ext4_can_extents_be_merged() returns false.

So there is no need for the caller to then test whether the
extent under consideration is unitialized; if it were, we
wouldn't have gotten that far.

The comments were also inaccurate; ext4_can_extents_be_merged()
no longer XORs the states, it fails if *either* is uninit.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-11-07 22:22:08 -05:00
Linus Torvalds 8efdf2b759 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French:
 "Includes a couple of fixes, plus changes to make multiplex identifiers
  easier to read and correlate with network traces, and a set of
  enhancements for SMB3 dialect.  Also adds support for per-file
  compression for both cifs and smb2/smb3 ("chattr +c filename).

  Should have at least one other merge request ready by next week with
  some new SMB3 security features and copy offload support"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  Query network adapter info at mount time for debugging
  Fix unused variable warning when CIFS POSIX disabled
  Allow setting per-file compression via CIFS protocol
  Query File System Alignment
  Query device characteristics at mount time from server on SMB2/3 not just on cifs mounts
  cifs: Send a logoff request before removing a smb session
  cifs: Make big endian multiplex ID sequences monotonic on the wire
  cifs: Remove redundant multiplex identifier check from check_smb_hdr()
  Query file system attributes from server on SMB2, not just cifs, mounts
  Allow setting per-file compression via SMB2/3
  Fix corrupt SMB2 ioctl requests
2013-11-08 06:01:47 +09:00
Linus Torvalds c224b76b56 NFS client updates for Linux 3.13
Highlights include:
 
 - Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
   - Detect TCP connection breakage through the "keepalive" mechanism
 - Add client side support for NFSv4.x migration (Chuck Lever)
 - Add support for multiple security flavour arguments to the "sec=" mount
   option (Dros Adamson)
 - fs-cache bugfixes from David Howells:
   - Fix an issue whereby caching can be enabled on a file that is open for
     writing
 - More NFSv4 open code stable bugfixes
 - Various Labeled NFS (selinux) bugfixes, including one stable fix
 - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSe8TEAAoJEGcL54qWCgDydu0QAJVtVhfwlUKm/HZ4oAy0Q5T8
 rJOWupqGnwyqTNLIRTlNegFSwMY+bABbkihXzSoj641o5zRb200KePlNxknzzlu1
 Q715035LDeEC1jrrHHeztTa9uWxAZ9B6gstMzilJYbV72VRYuWA6Q5LstXwQy/jN
 ViSldrGJ4sRZUe6wpNLPBRDBfOMWOtZdyRqqqjm71ZHJJnaqQWLBvThTG4MsLlpg
 j/khi5189MxJWePTKI9zGZdnXZAZ0ar1tAi1QWDNv044EwsS3LZZIko+YdBh6LZx
 9IBwk6TqOXFY0jxPDsIZtTfWPf4pjewRrPINMkjlZl3TJEf97sIlavZ7gWqvVIz5
 eXzFGy7D2XBgub8TGcmZM/7keHY/sqghz7lXZ8FulXlVem52r/95NiQ9tu8l8hq3
 Ab0FUnjtXeuaDFPBCHlKb3zmCMGFF89VqtpCj2plCPvfcGgJvXJqddWBRisQw9St
 UgD1PQWRFGtkrHv5EcQkd5boVdRNjAVAC9PaCWNpOpSVDjJyuUE+v/k75+ZwDcG8
 afAFMJSbCwRxW+cFlLAsQTfQztzuWTTOOVQvJDxfyYulcWshyIruhiYItRDfJqRp
 RynuVzrBERzUs5wsefnBbC218C/WSlOrodPbsZvdhKolvRx1RNtWT29ilZ6+p2tH
 4378ZRLtQvm9RXBnAkRc
 =gflJ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Changes to the RPC socket code to allow NFSv4 to turn off
     timeout+retry:
      * Detect TCP connection breakage through the "keepalive" mechanism
   - Add client side support for NFSv4.x migration (Chuck Lever)
   - Add support for multiple security flavour arguments to the "sec="
     mount option (Dros Adamson)
   - fs-cache bugfixes from David Howells:
     * Fix an issue whereby caching can be enabled on a file that is
       open for writing
   - More NFSv4 open code stable bugfixes
   - Various Labeled NFS (selinux) bugfixes, including one stable fix
   - Fix buffer overflow checking in the RPCSEC_GSS upcall encoding"

* tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
  NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
  NFSv4: Sanity check the server reply in _nfs4_server_capabilities
  NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
  nfs: set security label when revalidating inode
  NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
  NFS: Fix a missing initialisation when reading the SELinux label
  nfs: fix oops when trying to set SELinux label
  nfs: fix inverted test for delegation in nfs4_reclaim_open_state
  SUNRPC: Cleanup xs_destroy()
  SUNRPC: close a rare race in xs_tcp_setup_socket.
  SUNRPC: remove duplicated include from clnt.c
  nfs: use IS_ROOT not DCACHE_DISCONNECTED
  SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
  SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
  SUNRPC: remove an unnecessary if statement
  nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
  nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
  nfs: Remove useless 'error' assignment
  sunrpc: comment typo fix
  SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
  ...
2013-11-08 05:57:46 +09:00
Rob Herring b5480950c6 Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
Linus Torvalds a1212d278c Revert "sysfs: drop kobj_ns_type handling"
This reverts commit cb26a31157.

It mysteriously causes NetworkManager to not find the wireless device
for me.  As far as I can tell, Tejun *meant* for this commit to not make
any semantic changes, but there clearly are some.  So revert it, taking
into account some of the calling convention changes that happened in
this area in subsequent commits.

Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-07 20:47:28 +09:00
Linus Torvalds 0324e74534 Driver Core / sysfs patches for 3.13-rc1
Here's the big driver core / sysfs update for 3.13-rc1.
 
 There's lots of dev_groups updates for different subsystems, as they all
 get slowly migrated over to the safe versions of the attribute groups
 (removing userspace races with the creation of the sysfs files.)  Also
 in here are some kobject updates, devres expansions, and the first round
 of Tejun's sysfs reworking to enable it to be used by other subsystems
 as a backend for an in-kernel filesystem.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJ6xAMACgkQMUfUDdst+yk1kQCfcHXhfnrvFZ5J/mDP509IzhNS
 ddEAoLEWoivtBppNsgrWqXpD1vi4UMsE
 =JmVW
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core / sysfs patches from Greg KH:
 "Here's the big driver core / sysfs update for 3.13-rc1.

  There's lots of dev_groups updates for different subsystems, as they
  all get slowly migrated over to the safe versions of the attribute
  groups (removing userspace races with the creation of the sysfs
  files.) Also in here are some kobject updates, devres expansions, and
  the first round of Tejun's sysfs reworking to enable it to be used by
  other subsystems as a backend for an in-kernel filesystem.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits)
  sysfs: rename sysfs_assoc_lock and explain what it's about
  sysfs: use generic_file_llseek() for sysfs_file_operations
  sysfs: return correct error code on unimplemented mmap()
  mdio_bus: convert bus code to use dev_groups
  device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name
  sysfs: separate out dup filename warning into a separate function
  sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
  sysfs: remove unused sysfs_get_dentry() prototype
  sysfs: honor bin_attr.attr.ignore_lockdep
  sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
  devres: restore zeroing behavior of devres_alloc()
  sysfs: fix sysfs_write_file for bin file
  input: gameport: convert bus code to use dev_groups
  input: serio: remove bus usage of dev_attrs
  input: serio: use DEVICE_ATTR_RO()
  i2o: convert bus code to use dev_groups
  memstick: convert bus code to use dev_groups
  tifm: convert bus code to use dev_groups
  virtio: convert bus code to use dev_groups
  ipack: convert bus code to use dev_groups
  ...
2013-11-07 11:42:15 +09:00
Gu Zheng 359d992bcd xfs: simplify kmem_{zone_}zalloc
Introduce flag KM_ZERO which is used to alloc zeroed entry, and convert
kmem_{zone_}zalloc to call kmem_{zone_}alloc() with KM_ZERO directly,
in order to avoid the setting to zero step. 
And following Dave's suggestion, make kmem_{zone_}zalloc static inline
into kmem.h as they're now just a simple wrapper.

V2:
  Make kmem_{zone_}zalloc static inline into kmem.h as Dave suggested.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-06 16:31:27 -06:00
Dave Chinner d123031a56 xfs: add tracepoints to AGF/AGI read operations
To help track down AGI/AGF lock ordering issues, I added these
tracepoints to tell us when an AGI or AGF is read and locked.  With
these we can now determine if the lock ordering goes wrong from
tracing captures.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-06 12:42:52 -06:00
Dave Chinner 750b9c9066 xfs: trace AIL manipulations
I debugging a log tail issue on a RHEL6 kernel, I added these trace
points to trace log items being added, moved and removed in the AIL
and how that affected the log tail LSN that was written to the log.
They were very helpful in that they immediately identified the cause
of the problem being seen. Hence I'd like to always have them
available for use.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-06 12:41:51 -06:00
John Stultz 1ca7d67cf5 seqcount: Add lockdep functionality to seqcount/seqlock structures
Currently seqlocks and seqcounts don't support lockdep.

After running across a seqcount related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.

Since seqcounts are used in the vdso gettimeofday code, I've provided
non-lockdep accessors for those needs.

I've also handled one case where there were nested seqlock writers
and there may be more edge cases.

Comments and feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:26 +01:00
Chao Yu 3b03f72445 f2fs: avoid to use a NULL point in destroy_segment_manager
A NULL point should avoid to be used in destroy_segment_manager after allocating
memory fail for f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-06 16:37:44 +09:00
Ingo Molnar c90423d1de Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move
Conflicts:
	kernel/Makefile

There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 07:50:37 +01:00
Richard Guy Briggs 9410d228a4 audit: call audit_bprm() only once to add AUDIT_EXECVE information
Move the audit_bprm() call from search_binary_handler() to exec_binprm().  This
allows us to get rid of the mm member of struct audit_aux_data_execve since
bprm->mm will equal current->mm.

This also mitigates the issue that ->argc could be modified by the
load_binary() call in search_binary_handler().

audit_bprm() was being called to add an AUDIT_EXECVE record to the audit
context every time search_binary_handler() was recursively called.  Only one
reference is necessary.

Reported-by: Oleg Nesterov <onestero@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
---
This patch is against 3.11, but was developed on Oleg's post-3.11 patches that
introduce exec_binprm().
2013-11-05 11:15:03 -05:00
Jeff Layton 14e972b451 audit: add child record before the create to handle case where create fails
Historically, when a syscall that creates a dentry fails, you get an audit
record that looks something like this (when trying to create a file named
"new" in "/tmp/tmp.SxiLnCcv63"):

    type=PATH msg=audit(1366128956.279:965): item=0 name="/tmp/tmp.SxiLnCcv63/new" inode=2138308 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023

This record makes no sense since it's associating the inode information for
"/tmp/tmp.SxiLnCcv63" with the path "/tmp/tmp.SxiLnCcv63/new". The recent
patch I posted to fix the audit_inode call in do_last fixes this, by making it
look more like this:

    type=PATH msg=audit(1366128765.989:13875): item=0 name="/tmp/tmp.DJ1O8V3e4f/" inode=141 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023

While this is more correct, if the creation of the file fails, then we
have no record of the filename that the user tried to create.

This patch adds a call to audit_inode_child to may_create. This creates
an AUDIT_TYPE_CHILD_CREATE record that will sit in place until the
create succeeds. When and if the create does succeed, then this record
will be updated with the correct inode info from the create.

This fixes what was broken in commit bfcec708.
Commit 79f6530c should also be backported to stable v3.7+.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:44 -05:00
Eric Paris 81407c84ac audit: allow unsetting the loginuid (with priv)
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid.
This would allow a child of that task to set their loginuid without
CAP_AUDIT_CONTROL.  Thus when launching a new login daemon, a
priviledged helper would be able to unset the loginuid and then the
daemon, which may be malicious user facing, do not need priv to function
correctly.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-11-05 11:08:09 -05:00
Jan Kara 7ba3ec5749 ext2: Fix fs corruption in ext2_get_xip_mem()
Commit 8e3dffc651 "Ext2: mark inode dirty after the function
dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block()
called from ext2_get_xip_mem(). That function called ext2_get_block()
mistakenly asking it to map 0 blocks while 1 was intended. Before the
above mentioned commit things worked out fine by luck but after that commit
we started returning that we allocated 0 blocks while we in fact
allocated 1 block and thus allocation was looping until all blocks in
the filesystem were exhausted.

Fix the problem by properly asking for one block and also add assertion
in ext2_get_blocks() to catch similar problems.

Reported-and-tested-by: Andiry Xu <andiry.xu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-11-05 11:26:47 +01:00
Maxim Patlasov ce128de626 fuse: writepages: protect secondary requests from fuse file release
All async fuse requests must be supplied with extra reference to a fuse
file.  This is necessary to ensure that the fuse file is not released until
all in-flight requests are completed.  Fuse secondary writeback requests
must obey this rule as well.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-11-05 10:11:29 +01:00
Maxim Patlasov 41b6e41fc6 fuse: writepages: update bdi writeout when deleting secondary request
BDI_WRITTEN counter is used to estimate bdi bandwidth.  It must be
incremented every time as bdi ends page writeback.  No matter whether it
was fulfilled by actual write or by discarding the request (e.g. due to
shrunk i_size).

Note that even before writepages patches, the case "Got truncated off
completely" was handled in fuse_send_writepage() by calling
fuse_writepage_finish() which updated BDI_WRITTEN unconditionally.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-11-05 10:11:28 +01:00
Maxim Patlasov 6eaf4782eb fuse: writepages: crop secondary requests
If writeback happens while fuse is in FUSE_NOWRITE condition, the request
will be queued but not processed immediately (see fuse_flush_writepages()).
Until FUSE_NOWRITE becomes relaxed, more writebacks can happen.  They will
be queued as "secondary" requests to that first ("primary") request.

Existing implementation crops only primary request.  This is not correct
because a subsequent extending write(2) may increase i_size and then
secondary requests won't be cropped properly.  The result would be stale
data written to the server to a file offset where zeros must be.

Similar problem may happen if secondary requests are attached to an
in-flight request that was already cropped.

The patch solves the issue by cropping all secondary requests in
fuse_writepage_end().  Thanks to Miklos for idea.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-11-05 10:11:27 +01:00
Maxim Patlasov f6011081f5 fuse: writepages: roll back changes if request not found
fuse_writepage_in_flight() returns false if it fails to find request with
given index in fi->writepages.  Then the caller proceeds with populating
data->orig_pages[] and incrementing req->num_pages.  Hence,
fuse_writepage_in_flight() must revert changes it made in request before
returning false.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-11-05 10:11:26 +01:00
J. Bruce Fields b78800baee Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"
This reverts commit 7ebe40f203.  We forgot
the nfs4_put_delegation call in fs/nfsd/nfs4callback.c which should not
be unhashing the stateid.  This lead to warnings from the idr code when
we tried to removed id's twice.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-04 17:46:50 -05:00
Trond Myklebust fab99ebe39 NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
We already check for nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL)
in nfs4_label_alloc()
We check the minor version in _nfs4_server_capabilities before setting
NFS_CAP_SECURITY_LABEL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:52 -05:00
Trond Myklebust b944dba31d NFSv4: Sanity check the server reply in _nfs4_server_capabilities
We don't want to be setting capabilities and/or requesting attributes
that are not appropriate for the NFSv4 minor version.

- Ensure that we clear the NFS_CAP_SECURITY_LABEL capability when appropriate
- Ensure that we limit the attribute bitmasks to the mounted_on_fileid
  attribute and less for NFSv4.0
- Ensure that we limit the attribute bitmasks to suppattr_exclcreat and
  less for NFSv4.1
- Ensure that we limit it to change_sec_label or less for NFSv4.2

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:52 -05:00
Trond Myklebust d204c5d2b8 NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
Currently, if the server is doing NFSv4.2 and supports labeled NFS, then
our on-the-wire READDIR request ends up asking for the label information,
which is then ignored unless we're doing readdirplus.
This patch ensures that READDIR doesn't ask the server for label information
at all unless the readdir->bitmask contains the FATTR4_WORD2_SECURITY_LABEL
attribute, and the readdir->plus flag is set.

While we're at it, optimise away the 3rd bitmap field if it is zero.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:51 -05:00
Jeff Layton 3da580aab9 nfs: set security label when revalidating inode
Currently, we fetch the security label when revalidating an inode's
attributes, but don't apply it. This is in contrast to the readdir()
codepath where we do apply label changes.

Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-04 16:42:38 -05:00
Dave Chinner 273203699f xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
Removing an inode from the namespace involves removing the directory
entry and dropping the link count on the inode. Removing the
directory entry can result in locking an AGF (directory blocks were
freed) and removing a link count can result in placing the inode on
an unlinked list which results in locking an AGI.

The big problem here is that we have an ordering constraint on AGF
and AGI locking - inode allocation locks the AGI, then can allocate
a new extent for new inodes, locking the AGF after the AGI.
Similarly, freeing the inode removes the inode from the unlinked
list, requiring that we lock the AGI first, and then freeing the
inode can result in an inode chunk being freed and hence freeing
disk space requiring that we lock an AGF.

Hence the ordering that is imposed by other parts of the code is AGI
before AGF. This means we cannot remove the directory entry before
we drop the inode reference count and put it on the unlinked list as
this results in a lock order of AGF then AGI, and this can deadlock
against inode allocation and freeing. Therefore we must drop the
link counts before we remove the directory entry.

This is still safe from a transactional point of view - it is not
until we get to xfs_bmap_finish() that we have the possibility of
multiple transactions in this operation. Hence as long as we remove
the directory entry and drop the link count in the first transaction
of the remove operation, there are no transactional constraints on
the ordering here.

Change the ordering of the operations in the xfs_remove() function
to align the ordering of AGI and AGF locking to match that of the
rest of the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-11-04 13:18:48 -06:00
Benjamin LaHaise 13fd8a5dc3 Merge branch 'aio-fix' of http://evilpiepirate.org/git/linux-bcache 2013-11-04 13:45:25 -05:00
Eric Sandeen da0169b3b9 ext4: remove unreachable code in ext4_can_extents_be_merged()
Commit ec22ba8e ("ext4: disable merging of uninitialized extents")
ensured that if either extent under consideration is uninit, we
decline to merge, and immediately return.

But right after that test, we test again for an uninit
extent; we can never hit this.  So just remove the impossible
test and associated variable.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-11-04 09:58:26 -05:00
Dan Carpenter 18da65e7d3 quota: info leak in quota_getquota()
The if_dqblk struct has a 4 byte hole at the end of the struct so
uninitialized stack information is leaked to user space.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-11-04 14:16:55 +01:00
Steven Whitehouse 2147dbfd05 GFS2: Use generic list_lru for quota
By using the generic list_lru code, we can now separate the
per sb quota list locking from the lru locking. The lru
lock is made into the inner-most lock.

As a result of this new lock order, we may occasionally see
items on the per-sb quota list which are "dead" so that the
two places where we traverse that list are updated to take
account of that.

As a result of this patch, the gfs2 quota shrinker is now
NUMA zone aware, and we are also laying the foundations for
further improvments in due course.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
Cc: Dave Chinner <dchinner@redhat.com>
2013-11-04 11:17:49 +00:00
Steven Whitehouse 7d80823e1d GFS2: Rename quota qd_lru_lock qd_lock
This is a straight forward rename which is in preparation for
introducing the generic list_lru infrastructure in the
following patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
2013-11-04 11:17:36 +00:00
Steven Whitehouse 9b9f039d57 GFS2: Use reflink for quota data cache
This patch adds reflink support to the quota data cache. It
looks a bit strange because we still don't have a sensible
split in the lookup by id and the lru list. That is coming in
later patches though.

The intent here is just to swap the current ref count for
reflinks in all cases with as little as possible other change.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>
2013-11-04 11:17:07 +00:00
Chao Yu 4bf08ff6f9 f2fs: remove unnecessary TestClearPageError when wait pages writeback
In wait_on_node_pages_writeback we will test and clear error flag for all
pages in radix tree, but not necessary.
So we only do this for pages belong to the specified inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-04 12:24:01 +09:00
Steve French c481e9feee Query network adapter info at mount time for debugging
When CONFIG_CIFS_STATS2 enabled query adapter info for debugging

It is easy now in SMB3 to query the information about the server's
network interfaces (and at least Windows 8 and above do this, if not
other clients) there are some useful pieces of information you can get
including:

- all of the network interfaces that the server advertises (not just
the one you are mounting over), and with SMB3 supporting multichannel
this helps with more than just failover (also aggregating multiple
sockets under one mount)

- whether the adapter supports RSS (useful to know if you want to
estimate whether setting up two or more socket connections to the same
address is going to be faster due to RSS offload in the adapter)

- whether the server supports RDMA

- whether the server has IPv6 interfaces (if you connected over IPv4
but prefer IPv6 e.g.)

- what the link speed is (you might want to reconnect over a higher
speed interface if available)

(Of course we could also rerequest this on every mount cheaplly to the
same server, as Windows apparently does, so we can update the adapter
info on new mounts, and also on every reconnect if the network
interface drops temporarily - so we don't have to rely on info from
the first mount to this server)

It is trivial to request this information - and certainly will be useful
when we get to the point of doing multichannel (and eventually RDMA),
but some of this (linkspeed etc.) info may help for debugging in
the meantime.  Enable this request when CONFIG_CIFS_STATS2 is on
(only for smb3 mounts since it is an SMB3 or later ioctl).

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:53:45 -05:00
Steve French f10d9ba405 Fix unused variable warning when CIFS POSIX disabled
Fix unused variable warning when CONFIG_CIFS_POSIX disabled.

   fs/cifs/ioctl.c: In function 'cifs_ioctl':
>> fs/cifs/ioctl.c:40:8: warning: unused variable 'ExtAttrMask' [-Wunused-variable]
     __u64 ExtAttrMask = 0;
           ^
Pointed out by 0-DAY kernel build testing backend

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:48 -05:00
Steve French c7f508a99b Allow setting per-file compression via CIFS protocol
An earlier patch allowed setting the per-file compression flag

"chattr +c filename"

on an smb2 or smb3 mount, and also allowed lsattr to return
whether a file on a cifs, or smb2/smb3 mount was compressed.

This patch extends the ability to set the per-file
compression flag to the cifs protocol, which uses a somewhat
different IOCTL mechanism than SMB2, although the payload
(the flags stored in the compression_state) are the same.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:44 -05:00
Steven French af6a12ea8d Query File System Alignment
In SMB3 it is now possible to query the file system
alignment info, and the preferred (for performance)
sector size and whether the underlying disk
has no seek penalty (like SSD).

Query this information at mount time for SMB3,
and make it visible in /proc/fs/cifs/DebugData
for debugging purposes.

This alignment information and preferred sector
size info will be helpful for the copy offload
patches to setup the right chunks in the CopyChunk
requests.   Presumably the knowledge that the
underlying disk is SSD could also help us
make better readahead and writebehind
decisions (something to look at in the future).

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:41 -05:00
Steven French 2167114c6e Query device characteristics at mount time from server on SMB2/3 not just on cifs mounts
Currently SMB2 and SMB3 mounts do not query the device information at mount time
from the server as is done for cifs.  These can be useful for debugging.
This is a minor patch, that extends the previous one (which added ability to
query file system attributes at mount time - this returns the device
characteristics - also via in /proc/fs/cifs/DebugData)

Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:38 -05:00
Shirish Pargaonkar 7f48558e64 cifs: Send a logoff request before removing a smb session
Send a smb session logoff request before removing smb session off of the list.
On a signed smb session, remvoing a session off of the list before sending
a logoff request results in server returning an error for lack of
smb signature.

Never seen an error during smb logoff, so as per MS-SMB2 3.2.5.1,
not sure how an error during logoff should be retried. So for now,
if a server returns an error to a logoff request, log the error and
remove the session off of the list.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:52:35 -05:00
Tim Gardner 3d378d3fd8 cifs: Make big endian multiplex ID sequences monotonic on the wire
The multiplex identifier (MID) in the SMB header is only
ever used by the client, in conjunction with PID, to match responses
from the server. As such, the endianess of the MID is not important.
However, When tracing packet sequences on the wire, protocol analyzers
such as wireshark display MID as little endian. It is much more informative
for the on-the-wire MID sequences to match debug information emitted by the
CIFS driver.  Therefore, one should write and read MID in the SMB header
assuming it is always little endian.

Observed from wireshark during the protocol negotiation
and session setup:

        Multiplex ID: 256
        Multiplex ID: 256
        Multiplex ID: 512
        Multiplex ID: 512
        Multiplex ID: 768
        Multiplex ID: 768

After this patch on-the-wire MID values begin at 1 and increase monotonically.

Introduce get_next_mid64() for the internal consumers that use the full 64 bit
multiplex identifier.

Introduce the helpers get_mid() and compare_mid() to make the endian
translation clear.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-11-02 12:51:53 -05:00
Tejun Heo 0cae60f914 sysfs: rename sysfs_assoc_lock and explain what it's about
sysfs_assoc_lock is an odd piece of locking.  In general, whoever owns
a kobject is responsible for synchronizing sysfs operations and sysfs
proper assumes that, for example, removal won't race with any other
operation; however, this doesn't work for symlinking because an entity
performing symlink doesn't usually own the target kobject and thus has
no control over its removal.

sysfs_assoc_lock synchronizes symlink operations against kobj->sd
disassociation so that symlink code doesn't end up dereferencing
already freed sysfs_dirent by racing with removal of the target
kobject.

This is quite obscure and the generic name of the lock and lack of
comments make it difficult to understand its role.  Let's rename it to
sysfs_symlink_target_lock and add comments explaining what's going on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 12:13:37 -07:00
Tejun Heo 044e3bc333 sysfs: use generic_file_llseek() for sysfs_file_operations
13c589d5b0 ("sysfs: use seq_file when reading regular files")
converted regular sysfs files to use seq_file.  The commit substituted
generic_file_llseek() with seq_lseek() for llseek implementation.

Before the change, all regular sysfs files were allowed to seek to any
position in [0, PAGE_SIZE] as the file size is always PAGE_SIZE and
generic_file_llseek() allows any seeking inside the range under file
size; however, seq_lseek()'s behavior is different.  It traverses the
output by repeatedly invoking ->show() until it reaches the target
offset or traversal indicates EOF.  As seq_files are fully dynamic and
may not end at all, it doesn't support seeking from the end
(SEEK_END).

Apparently, there are userland tools which uses SEEK_END to discover
the buffer size to use and the switch to seq_lseek() disturbs them as
SEEK_END fails with -EINVAL.

The only benefits of using seq_lseek() instead of
generic_file_llseek() are

* Early failure.  If traversing to certain file position should fail,
  seq_lseek() will report such failures on lseek(2) instead of the
  following read/write operations.

* EOF detection.  While SEEK_END is not supported, SEEK_SET/CUR +
  large offset can be used to detect eof - eof at the time of the seek
  anyway as the file size may change dynamically.

Both aren't necessary for sysfs or prospect kernfs users.  Revert to
genefic_file_llseek() and preserve the original behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: https://lkml.kernel.org/r/20131031114358.GA5551@osiris
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-01 12:13:37 -07:00
J. Bruce Fields 3378b7f40d nfsd4: fix discarded security labels on setattr
Security labels in setattr calls are currently ignored because we forget
to set label->len.

Cc: stable@vger.kernel.org
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-11-01 13:40:46 -04:00
Trond Myklebust fcb63a9bd8 NFS: Fix a missing initialisation when reading the SELinux label
Ensure that _nfs4_do_get_security_label() also initialises the
SEQUENCE call correctly, by having it call into nfs4_call_sync().

Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01 12:42:25 -04:00
Jeff Layton 12207f69b3 nfs: fix oops when trying to set SELinux label
Chao reported the following oops when testing labeled NFS:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
PGD 277bbd067 PUD 2777ea067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache sg coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul iTCO_wdt glue_helper ablk_helper cryptd iTCO_vendor_support bnx2 pcspkr serio_raw i7core_edac cdc_ether microcode usbnet edac_core mii lpc_ich i2c_i801 mfd_core shpchp ioatdma dca acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm ata_piix libata megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 4 PID: 25657 Comm: chcon Not tainted 3.10.0-33.el7.x86_64 #1
Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784     , BIOS -[D6E150CUS-1.11]- 02/08/2011
task: ffff880178397220 ti: ffff8801595d2000 task.ti: ffff8801595d2000
RIP: 0010:[<ffffffffa0568703>]  [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
RSP: 0018:ffff8801595d3888  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8801595d3b30 RCX: 0000000000000b4c
RDX: ffff8801595d3b30 RSI: ffff8801595d38e0 RDI: ffff880278b6ec00
RBP: ffff8801595d38c8 R08: ffff8801595d3b30 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801595d38e0
R13: ffff880277a4a780 R14: ffffffffa05686c0 R15: ffff8802765f206c
FS:  00007f2c68486800(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000027651a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 ffff880277865800 ffff880278b6ec00 ffff880277a4a780
 ffff8801595d3948 ffffffffa02ad926 ffff8801595d3b30 ffff8802765f206c
Call Trace:
 [<ffffffffa02ad926>] rpcauth_wrap_req+0x86/0xd0 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02a1ecb>] call_transmit+0x18b/0x290 [sunrpc]
 [<ffffffffa02a1d40>] ? call_connect+0xb0/0xb0 [sunrpc]
 [<ffffffffa02aae14>] __rpc_execute+0x84/0x400 [sunrpc]
 [<ffffffffa02ac40e>] rpc_execute+0x5e/0xa0 [sunrpc]
 [<ffffffffa02a2ea0>] rpc_run_task+0x70/0x90 [sunrpc]
 [<ffffffffa02a2f03>] rpc_call_sync+0x43/0xa0 [sunrpc]
 [<ffffffffa055284d>] _nfs4_do_set_security_label+0x11d/0x170 [nfsv4]
 [<ffffffffa0558861>] nfs4_set_security_label.isra.69+0xf1/0x1d0 [nfsv4]
 [<ffffffff815fca8b>] ? avc_alloc_node+0x24/0x125
 [<ffffffff815fcd2f>] ? avc_compute_av+0x1a3/0x1b5
 [<ffffffffa055897b>] nfs4_xattr_set_nfs4_label+0x3b/0x50 [nfsv4]
 [<ffffffff811bc772>] generic_setxattr+0x62/0x80
 [<ffffffff811bcfc3>] __vfs_setxattr_noperm+0x63/0x1b0
 [<ffffffff811bd1c5>] vfs_setxattr+0xb5/0xc0
 [<ffffffff811bd2fe>] setxattr+0x12e/0x1c0
 [<ffffffff811a4d22>] ? final_putname+0x22/0x50
 [<ffffffff811a4f2b>] ? putname+0x2b/0x40
 [<ffffffff811aa1cf>] ? user_path_at_empty+0x5f/0x90
 [<ffffffff8119bc29>] ? __sb_start_write+0x49/0x100
 [<ffffffff811bd66f>] SyS_lsetxattr+0x8f/0xd0
 [<ffffffff8160cf99>] system_call_fastpath+0x16/0x1b
Code: 48 8b 02 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 d0 00 00 00 00 48 c7 45 d8 00 00 00 00 48 c7 45 e0 00 00 00 00 <48> 8b 00 48 8b 00 48 85 c0 0f 84 ae 00 00 00 48 8b 80 b8 03 00
RIP  [<ffffffffa0568703>] nfs4_xdr_enc_setattr+0x43/0x110 [nfsv4]
 RSP <ffff8801595d3888>
CR2: 0000000000000000

The problem is that _nfs4_do_set_security_label calls rpc_call_sync()
directly which fails to do any setup of the SEQUENCE call. Have it use
nfs4_call_sync() instead which does the right thing. While we're at it
change the name of "args" to "arg" to better match the pattern in
_nfs4_do_setattr.

Reported-by: Chao Ye <cye@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-11-01 12:41:39 -04:00
Ingo Molnar fb10d5b7ef Merge branch 'linus' into sched/core
Resolve cherry-picking conflicts:

Conflicts:
	mm/huge_memory.c
	mm/memory.c
	mm/mprotect.c

See this upstream merge commit for more details:

  52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-01 08:24:41 +01:00
Theodore Ts'o dcb9917ba0 ext4: avoid bh leak in retry path of ext4_expand_extra_isize_ea()
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-10-31 23:00:24 -04:00
Linus Torvalds 358eec1824 vfs: decrapify dput(), fix cache behavior under normal load
We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway.  This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.

This finishes off some of the details of all the scalability patches
merged during the merge window.

Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-31 15:43:02 -07:00
Jie Liu bb86d21cba xfs: fix the extent count when allocating an new indirection array entry
At xfs_iext_add(), if extent(s) are being appended to the last page in
the indirection array and the new extent(s) don't fit in the page, the
number of extents(erp->er_extcount) in a new allocated entry should be
the minimum value between count and XFS_LINEAR_EXTS, instead of count.

For now, there is no existing test case can demonstrates a problem with
the er_extcount being set incorrectly here, but it obviously like a bug.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-31 16:43:19 -05:00
Jan Kara 1ce0aa802c jbd: Revert "jbd: remove dependency on __GFP_NOFAIL"
This reverts commit 05713082ab. The idea
to remove __GFP_NOFAIL was opposed by Andrew Morton. Although mm guys do
want to get rid of __GFP_NOFAIL users, opencoding the allocation retry
is even worse.

See emails following
http://www.gossamer-threads.com/lists/linux/kernel/1809153#1809153

Signed-off-by: Jan Kara <jack@suse.cz>
2013-10-31 20:37:15 +01:00
Jeff Layton 1acd1c301f nfs: fix inverted test for delegation in nfs4_reclaim_open_state
commit 6686390bab (NFS: remove incorrect "Lock reclaim failed!"
warning.) added a test for a delegation before checking to see if any
reclaimed locks failed. The test however is backward and is only doing
that check when a delegation is held instead of when one isn't.

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Fixes: 6686390bab6a: NFS: remove incorrect "Lock reclaim failed!" warning.
Cc: stable@vger.kernel.org # 3.12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 13:24:07 -04:00
Darrick J. Wong 2746f7a170 ext4: don't count free clusters from a corrupt block group
A bg that's been flagged "corrupt" by definition has no free blocks,
so that the allocator won't be tempted to use the damaged bg.
Therefore, we shouldn't count the clusters in the damaged group when
calculating free counts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-10-31 11:46:31 -04:00
Jaegeuk Kim cfe58f9dcd f2fs: avoid to wait all the node blocks during fsync
Previously, f2fs_sync_file() waits for all the node blocks to be written.
But, we don't need to do that, but wait only the inode-related node blocks.

This patch adds wait_on_node_pages_writeback() in which waits inode-related
node blocks that are on writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-31 16:01:03 +09:00
Anna Schumaker 8217d146ab NFSD: Add support for NFS v4.2 operation checking
The server does allow NFS over v4.2, even if it doesn't add any new
operations yet.

I also switch to using constants to represent the last operation for
each minor version since this makes the code cleaner and easier to
understand at a quick glance.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-30 18:25:04 -04:00
Eric Sandeen 10e6e65dfc xfs: be more forgiving of a v4 secondary sb w/ junk in v5 fields
Today, if xfs_sb_read_verify encounters a v4 superblock
with junk past v4 fields which includes data in sb_crc,
it will be treated as a failing checksum and a significant
corruption.

There are known prior bugs which leave junk at the end
of the V4 superblock; we don't need to actually fail the
verification in this case if other checks pan out ok.

So if this is a secondary superblock, and the primary
superblock doesn't indicate that this is a V5 filesystem,
don't treat this as an actual checksum failure.

We should probably check the garbage condition as
we do in xfs_repair, and possibly warn about it
or self-heal, but that's a different scope of work.

Stable folks: This can go back to v3.10, which is what
introduced the sb CRC checking that is tripped up by old,
stale, incorrect V4 superblocks w/ unzeroed bits.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 16:38:29 -05:00
Geyslan G. Bem 643f7c4e56 xfs: fix possible NULL dereference in xlog_verify_iclog
In xlog_verify_iclog a debug check of the incore log buffers prints an
error if icptr is null and then goes on to dereference the pointer
regardless.  Convert this to an assert so that the intention is clear.
This was reported by Coverty.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2013-10-30 16:01:00 -05:00
Denis Efremov 5bf1f439c8 xfs:xfs_dir2_node.c: pointer use before check for null
ASSERT on args takes place after args dereference.
This assertion is redundant since we are going to panic anyway.

Found by Linux Driver Verification project (linuxtesting.org) -
PVS-Studio analyzer.

Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 15:53:14 -05:00
Dave Chinner ad22c7a043 xfs: prevent stack overflows from page cache allocation
Page cache allocation doesn't always go through ->begin_write and
hence we don't always get the opportunity to set the allocation
context to GFP_NOFS. Failing to do this means we open up the direct
relcaim stack to recurse into the filesystem and consume a
significant amount of stack.

On RHEL6.4 kernels we are seeing ra_submit() and
generic_file_splice_read() from an nfsd context recursing into the
filesystem via the inode cache shrinker and evicting inodes. This is
causing truncation to be run (e.g EOF block freeing) and causing
bmap btree block merges and free space btree block splits to occur.
These btree manipulations are occurring with the call chain already
30 functions deep and hence there is not enough stack space to
complete such operations.

To avoid these specific overruns, we need to prevent the page cache
allocation from recursing via direct reclaim. We can do that because
the allocation functions take the allocation context from that which
is stored in the mapping for the inode. We don't set that right now,
so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
when we initialise an inode, set the mapping gfp mask appropriately.

This makes the use of AOP_FLAG_NOFS redundant from other parts of
the XFS IO path, so get rid of it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 15:44:51 -05:00
Dave Chinner 632b89e82b xfs: fix static and extern sparse warnings
The kbuild test robot indicated that there were some new sparse
warnings in fs/xfs/xfs_dquot_buf.c. Actually, there were a lot more
that is wasn't warning about, so fix them all up.

Reported-by: kbuild test robot
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:59:56 -05:00
Dave Chinner a629362105 xfs: validity check the directory block leaf entry count
The directory block format verifier fails to check that the leaf
entry count is in a valid range, and so if it is corrupted then it
can lead to derefencing a pointer outside the block buffer. While we
can't exactly validate the count without first walking the directory
block, we can ensure the count lands in the valid area within the
directory block and hence avoid out-of-block references.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:57:14 -05:00
Dave Chinner b01ef655d8 xfs: make dir2 ftype offset pointers explicit
Rather than hiding the ftype field size accounting inside the dirent
padding for the ".." and first entry offset functions for v2
directory formats, add explicit functions that calculate it
correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:52:38 -05:00
Dave Chinner 1c9a5b2e30 xfs: convert directory vector functions to constants
Many of the vectorised function calls now take no parameters and
return a constant value. There is no reason for these to be vectored
functions, so convert them to constants

Binary sizes:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
 789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
 789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
 791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7
 791701   96802    1096  889599   d92ff fs/xfs/xfs.o.p8
 791205   96802    1096  889103   d91cf fs/xfs/xfs.o.p9

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:49:18 -05:00
Dave Chinner 24dd0f546c xfs: convert directory vector functions to constants
Next step in the vectorisation process is the directory free block
encode/decode operations. There are relatively few of these, though
there are quite a number of calls to them.

Binary sizes:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
 789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
 789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
 791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7
 791701   96802    1096  889599   d92ff fs/xfs/xfs.o.p8

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:48:41 -05:00
Dave Chinner 01ba43b873 xfs: vectorise encoding/decoding directory headers
Conversion from on-disk structures to in-core header structures
currently relies on magic number checks. If the magic number is
wrong, but one of the supported values, we do the wrong thing with
the encode/decode operation. Split these functions so that there are
discrete operations for the specific directory format we are
handling.

In doing this, move all the header encode/decode functions to
xfs_da_format.c as they are directly manipulating the on-disk
format. It should be noted that all the growth in binary size is
from xfs_da_format.c - the rest of the code actaully shrinks.

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
 789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
 789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
 791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:47:22 -05:00
Dave Chinner 4bceb18f15 xfs: vectorise DA btree operations
The remaining non-vectorised code for the directory structure is the
node format blocks. This is shared with the attribute tree, and so
is slightly more complex to vectorise.

Introduce a "non-directory" directory ops structure that is attached
to all non-directory inodes so that attribute operations can be
vectorised for all inodes.

Once we do this, we can vectorise all the da btree operations.
Because this patch adds more infrastructure than it removes the
binary size does not decrease:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
 789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
 789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:43:28 -05:00
Dave Chinner 4141956ae0 xfs: vectorise directory leaf operations
Next step in the vectorisation process is the leaf block
encode/decode operations. Most of the operations on leaves are
handled by the data block vectors, so there are relatively few of
them here.

Because of all the shuffling of code and having to pass more state
to some functions, this patch doesn't directly reduce the size of
the binary. It does open up many more opportunities for factoring
and optimisation, however.

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
 789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:39:43 -05:00
Dave Chinner 2ca9877410 xfs: vectorise directory data operations part 2
Convert the rest of the directory data block encode/decode
operations to vector format.

This further reduces the size of the built binary:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
 789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:39:31 -05:00
Dave Chinner 9d23fc8575 xfs: vectorise directory data operations
Following from the initial patches to vectorise the shortform
directory encode/decode operations, convert half the data block
operations to use the vector. The rest will be done in a second
patch.

This further reduces the size of the built binary:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
 789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:39:14 -05:00
Dave Chinner 4740175e75 xfs: vectorise remaining shortform dir2 ops
Following from the initial patch to introduce the directory
operations vector, convert the rest of the shortform directory
operations to use vectored ops rather than superblock feature
checks. This further reduces the size of the built binary:

   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
 792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:38:59 -05:00
Dave Chinner 32c5483a8a xfs: abstract the differences in dir2/dir3 via an ops vector
Lots of the dir code now goes through switches to determine what is
the correct on-disk format to parse. It generally involves a
"xfs_sbversion_hasfoo" check, deferencing the superblock version and
feature fields and hence touching several cache lines per operation
in the process. Some operations do multiple checks because they nest
conditional operations and they don't pass the information in a
direct fashion between each other.

Hence, add an ops vector to the xfs_inode structure that is
configured when the inode is initialised to point to all the correct
decode and encoding operations.  This will significantly reduce the
branchiness and cacheline footprint of the directory object decoding
and encoding.

This is the first patch in a series of conversion patches. It will
introduce the ops structure, the setup of it and add the first
operation to the vector. Subsequent patches will convert directory
ops one at a time to keep the changes simple and obvious.

Just this patch shows the benefit of such an approach on code size.
Just converting the two shortform dir operations as this patch does
decreases the built binary size by ~1500 bytes:

$ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1
   text    data     bss     dec     hex filename
 794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
 792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
$

That's a significant decrease in the instruction cache footprint of
the directory code for such a simple change, and indicates that this
approach is definitely worth pursuing further.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-30 13:37:38 -05:00
Vladimir Zapolskiy 1c1365e374 sysfs: return correct error code on unimplemented mmap()
Both POSIX.1-2008 and Linux Programmer's Manual have a dedicated return
error code for a case, when a file doesn't support mmap(), it's ENODEV.

This change replaces overloaded EINVAL with ENODEV in a situation
described above for sysfs binary files.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-30 10:21:39 -07:00
Lukas Czerner 8f9ff18920 ext4: fix FITRIM in no journal mode
When using FITRIM ioctl on a file system without journal it will
only trim the block group once, no matter how many times you invoke
FITRIM ioctl and how many block you release from the block group.

It is because we only clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT in journal
callback. Fix this by clearing the bit in no journal mode as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Jorge Fábregas <jorge.fabregas@gmail.com>
2013-10-30 11:10:52 -04:00
Azat Khuzhin 5ba052fe33 ext4: drop set but otherwise unused variable from ext4_add_dirent_to_inline()
Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-30 10:53:10 -04:00
J. Bruce Fields e50a26dc78 nfsd4: nfsd_shutdown_net needs state lock
A comment claims the caller should take it, but that's not being done.
Note we don't want it around the cancel_delayed_work_sync since that may
wait on work which holds the client lock.

Reported-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-30 10:35:59 -04:00
Rafael J. Wysocki 59612d1879 Revert "select: use freezable blocking call"
This reverts commit 9745cdb36d (select: use freezable blocking call)
that triggers problems during resume from suspend to RAM on Paul Bolle's
32-bit x86 machines.  Paul says:

  Ever since I tried running (release candidates of) v3.11 on the two
  working i686s I still have lying around I ran into issues on resuming
  from suspend. Reverting 9745cdb36d (select: use freezable blocking
  call) resolves those issues.

  Resuming from suspend on i686 on (release candidates of) v3.11 and
  later triggers issues like:

  traps: systemd[1] general protection ip:b738e490 sp:bf882fc0 error:0 in libc-2.16.so[b731c000+1b0000]

  and

  traps: rtkit-daemon[552] general protection ip:804d6e5 sp:b6cb32f0 error:0 in rtkit-daemon[8048000+d000]

  Once I hit the systemd error I can only get out of the mess that the
  system is at that point by power cycling it.

Since we are reverting another freezer-related change causing similar
problems to happen, this one should be reverted as well.

References: https://lkml.org/lkml/2013/10/29/583
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Fixes: 9745cdb36d (select: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:28:35 +01:00
Rafael J. Wysocki c511851de1 Revert "epoll: use freezable blocking call"
This reverts commit 1c441e9212 (epoll: use freezable blocking call)
which is reported to cause user space memory corruption to happen
after suspend to RAM.

Since it appears to be extremely difficult to root cause this
problem, it is best to revert the offending commit and try to address
the original issue in a better way later.

References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
Reported-by: Natrio <natrio@list.ru>
Reported-by: Jeff Pohlmeyer <yetanothergeek@gmail.com>
Bisected-by: Leo Wolf <jclw@ymail.com>
Fixes: 1c441e9212 (epoll: use freezable blocking call)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
2013-10-30 15:27:53 +01:00
Anna Schumaker e1a90ebd8b NFSD: Combine decode operations for v4 and v4.1
We were using a different array of function pointers to represent each
minor version.  This makes adding a new minor version tedious, since it
needs a step to copy, paste and modify a new version of the same
functions.

This patch combines the v4 and v4.1 arrays into a single instance and
will check minor version support inside each decoder function.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-30 10:04:08 -04:00
BoxiLiu 48ffdab1c1 ext4: change ext4_read_inline_dir() to return 0 on success
In ext4_read_inline_dir(), if there is inline data, the successful
return value is the return value of ext4_read_inline_data().  Howewer,
this is used by ext4_readdir(), and while it seems harmless to return
a positive value on success, it's inconsistent, since historically
we've always return 0 on success.

Signed-off-by: BoxiLiu <lewis.liulei@huawei.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Tao Ma <boyu.mt@taobao.com>
2013-10-30 08:07:20 -04:00
Ming Lei bbf023c74d ext4: pair trace_ext4_writepages & trace_ext4_writepages_result
Pair the two trace events to make troubeshooting writepages
easier, and it should be more convinient to write a simple script
to parse the traces.

Cc: linux-ext4@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-30 07:27:16 -04:00
Chao Yu 44c60bf2b9 f2fs: check all ones or zeros bitmap with bitops for better mount performance
Previously, check_block_count check valid_map with bit data type in common
scenario that sit has all ones or zeros bitmap, it makes low mount performance.
So let's check the special bitmap with integer data type instead of the bit one.

v1-->v2:
 o use find_next_{zero_}bit_le for better performance and readable as Jaegeuk
   suggested.
 o use neat logogram in comment as Gu Zheng suggested.
 o search continuous ones or zeros for better performance when checking mixed
   bitmap.

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Shu Tan <shu.tan@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-30 12:23:23 +09:00
Fan Li 9a47938b22 f2fs: change the method of calculating the number summary blocks
npages_for_summary_flush uses (SUMMARY_SIZE + 1) as the size of a f2fs_summary
while its actual size is  SUMMARY_SIZE. So the result sometimes is bigger than
actual number by one, which causes checkpoint can't be written into disk
contiguously, and sometimes summary blocks can't be compacted like they should.
Besides, when writing summary blocks into pages, if remain space in a page
isn't big enough for one f2fs_summary, it will be left unused, current code
seems not to take it into account.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-30 12:17:58 +09:00
Tejun Heo d1c1459e45 sysfs: separate out dup filename warning into a separate function
Separate out sysfs_warn_dup() out of sysfs_add_one().  This will help
separating out the core sysfs functionalities into kernfs so that it
can be used by non-sysfs users too.

This doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo 7eed6ecb07 sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
Most removal related logic is implemented in fs/sysfs/dir.c.  Move
sysfs_hash_and_remove() to fs/sysfs/dir.c so that __sysfs_remove()
doesn't have to be public.

This is pure relocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo baa97cb507 sysfs: remove unused sysfs_get_dentry() prototype
sysfs_get_dentry() has been gone for years now.  Remove the left-over
prototype.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo 672f76a81a sysfs: honor bin_attr.attr.ignore_lockdep
ignore_lockdep is currently honored only for regular files.  There's
no reason to ignore it for bin files.  Update sysfs_ignore_lockdep()
so that bin_attr.attr.ignore_lockdep works too.

While this doesn't have any in-kernel user, this unifies the behaviors
between regular and bin files and will help later changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:07 -07:00
Tejun Heo 56b3f3b884 sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
3124eb1679 ("sysfs: merge regular and bin file handling") folded bin
file handling into regular file handling.  Among other things, bin
file now shares the same open path including sysfs_open_dirent
association using sysfs_dirent->s_attr.open.  This is buggy because
->s_bin_attr lives in the same union and doesn't have the field.  This
bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active
field at the conflicting position.  It does have a field "buffers" but
it isn't used anymore.

This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that
the bin_attr is accessed through ->s_attr.bin_attr which lives with
->s_attr.attr in an anonymous union.  The code paths already assume
bin_attr contains attr as the first element, so this doesn't add any
more assumptions while making it explicit that the two types are
handled together.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-29 15:12:06 -07:00
J. Bruce Fields 6f6cc3205c nfsd: -EINVAL on invalid anonuid/gid instead of silent failure
If we're going to refuse to accept these it would be polite of us to at
least say so....

This introduces a slight complication since we need to grandfather in
exportfs's ill-advised use of -1 uid and gid on its test_export.

If it turns out there are other users passing down -1 we may need to
do something else.

Best might be to drop the checks entirely, but I'm not sure if other
parts of the kernel might assume that a task can't run as uid or gid -1.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-29 17:46:14 -04:00
J. Bruce Fields 427d6c6646 nfsd: return better errors to exportfs
Someone noticed exportfs happily accepted exports that would later be
rejected when mountd tried to give them to the kernel.  Fix this.

This is a regression from 4c1e1b34d5
"nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids".

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reported-by: Yin.JianHong <jiyin@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-29 17:45:30 -04:00
J. Bruce Fields 49e7372063 nfsd: fh_update should error out in unexpected cases
The reporter saw a NULL dereference when a filesystem's ->mknod returned
success but left the dentry negative, and then nfsd tried to dereference
d_inode (in this case because the CREATE was followed by a GETATTR in
the same nfsv4 compound).

fh_update already checks for this and another broken case, but for some
reason it returns success and leaves nfsd trying to soldier on.  If it
failed we'd avoid the crash.  There's only so much we can do with a
buggy filesystem, but it's easy enough to bail out here, so let's do
that.

Reported-by: Antti Tönkyrä <daedalus@pingtimeout.net>
Tested-by: Antti Tönkyrä <daedalus@pingtimeout.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-29 17:43:52 -04:00
Benny Halevy 956c4fee44 nfsd4: need to destroy revoked delegations in destroy_client
[use list_splice_init]
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
[bfields: no need for recall_lock here]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-29 12:00:48 -04:00
Benny Halevy 01a87d91fc nfsd: no need to unhash_stid before free
idr_remove is about to be called before kmem_cache_free so unhashing it
is redundant

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-29 10:03:27 -04:00
Chao Yu cc3de6a3ac f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
During xattr updating, free size should be corrected to remainder free size
+ old entry size.
It can avoid ENOSPC error when we update old entry with the same size new
entry at fully filled xattr.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:56:08 +09:00
Jaegeuk Kim 5d56b6718a f2fs: add an option to avoid unnecessary BUG_ONs
If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:44:38 +09:00
Jaegeuk Kim 3b218e3a21 f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
This config will support an option to remove so many BUG_ONs that degrade
the performance potentially.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-29 15:43:01 +09:00
Trond Myklebust c698dbf9fe Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next
Pull fs-cache fixes from David Howells:

Can you pull these commits to fix an issue with NFS whereby caching can be
enabled on a file that is open for writing by subsequently opening it for
reading.  This can be made to crash by opening it for writing again if you're
quick enough.

The gist of the patchset is that the cookie should be acquired at inode
creation only and subsequently enabled and disabled as appropriate (which
dispenses with the backing objects when they're not needed).

The extra synchronisation that NFS does can then be dispensed with as it is
thenceforth managed by FS-Cache.

Could you send these on to Linus?

This likely will need fixing also in CIFS and 9P also once the FS-Cache
changes are upstream.  AFS and Ceph are probably safe.

* 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()
  FS-Cache: Provide the ability to enable/disable cookies
  FS-Cache: Add use/unuse/wake cookie wrappers
2013-10-28 19:36:46 -04:00
J. Bruce Fields a3f432bfd0 nfs: use IS_ROOT not DCACHE_DISCONNECTED
This check was added by Al Viro with
d9e80b7de9 "nfs d_revalidate() is too
trigger-happy with d_drop()", with the explanation that we don't want to
remove the root of a disconnected tree, which will still be included on
the s_anon list.

But DCACHE_DISCONNECTED does *not* actually identify dentries that are
disconnected from the dentry tree or hashed on s_anon.  IS_ROOT() is the
way to do that.

Also add a comment from Al's commit to remind us why this check is
there.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 19:19:19 -04:00
Geyslan G. Bem 6706246b22 nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:56 -04:00
Geyslan G. Bem 54bcfa6682 nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
Use 'PTR_ERR_OR_ZERO()' rather than 'IS_ERR(...) ? PTR_ERR(...) : 0'.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:55 -04:00
Geyslan G. Bem 4f5829d726 nfs: Remove useless 'error' assignment
the 'error' variable was been assigned twice in vain.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:55 -04:00
Benny Halevy 7ebe40f203 nfsd: remove_stid can be incorporated into nfs4_put_delegation
All calls to nfs4_put_delegation are preceded with remove_stid.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-28 15:58:32 -04:00
Benny Halevy 5d7dab83e3 nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid
In the out_free: path, the newly allocated stid must be removed rather
than unhashed so it can never be found.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-28 15:58:20 -04:00
Benny Halevy 9857df815f nfsd: nfs4_free_stid
Make it symmetric to nfs4_alloc_stid.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-28 15:43:06 -04:00
Weston Andros Adamson 4d4b69dd84 NFS: add support for multiple sec= mount options
This patch adds support for multiple security options which can be
specified using a colon-delimited list of security flavors (the same
syntax as nfsd's exports file).

This is useful, for instance, when NFSv4.x mounts cross SECINFO
boundaries. With this patch a user can use "sec=krb5i,krb5p"
to mount a remote filesystem using krb5i, but can still cross
into krb5p-only exports.

New mounts will try all security options before failing.  NFSv4.x
SECINFO results will be compared against the sec= flavors to
find the first flavor in both lists or if no match is found will
return -EPERM.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:38:02 -04:00
Weston Andros Adamson 5837f6dfcb NFS: stop using NFS_MOUNT_SECFLAVOUR server flag
Since the parsed sec= flavor is now stored in nfs_server->auth_info,
we no longer need an nfs_server flag to determine if a sec= option was
used.

This flag has not been completely removed because it is still needed for
the (old but still supported) non-text parsed mount options ABI
compatability.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:37:56 -04:00
Weston Andros Adamson 0f5f49b8b3 NFS: cache parsed auth_info in nfs_server
Cache the auth_info structure in nfs_server and pass these values to submounts.

This lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:37:43 -04:00
Weston Andros Adamson a3f73c27af NFS: separate passed security flavs from selected
When filling parsed_mount_data, store the parsed sec= mount option in
the new struct nfs_auth_info and the chosen flavor in selected_flavor.

This patch lays the groundwork for supporting multiple sec= options.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:36:58 -04:00
Weston Andros Adamson 47fd88e6b7 NFSv4: make nfs_find_best_sec static
It's not used outside of nfs4namespace.c anymore.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:33:34 -04:00
Chuck Lever 0625c2dd6a NFS: Fix possible endless state recovery wait
In nfs4_wait_clnt_recover(), hold a reference to the clp being
waited on.  The state manager can reduce clp->cl_count to 1, in
which case the nfs_put_client() in nfs4_run_state_manager() can
free *clp before wait_on_bit() returns and allows
nfs4_wait_clnt_recover() to run again.

The behavior at that point is non-deterministic.  If the waited-on
bit still happens to be zero, wait_on_bit() will wake the waiter as
expected.  If the bit is set again (say, if the memory was poisoned
when freed) wait_on_bit() can leave the waiter asleep.

This is a narrow fix which ensures the safety of accessing *clp in
nfs4_wait_clnt_recover(), but does not address the continued use
of a possibly freed *clp after nfs4_wait_clnt_recover() returns
(see nfs_end_delegation_return(), for example).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:55 -04:00
Chuck Lever cd3fadece2 NFS: Set EXCHGID4_FLAG_SUPP_MOVED_MIGR
Broadly speaking, v4.1 migration is untested.  There are no servers
in the wild that support NFSv4.1 migration.  However, as server
implementations become available, we do want to enable testing by
developers, while leaving it disabled for environments for which
broken migration support would be an unpleasant surprise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:25 -04:00
Chuck Lever d1c2331e75 NFS: Handle SEQ4_STATUS_LEASE_MOVED
With the advent of NFSv4 sessions in NFSv4.1 and following, a "lease
moved" condition is reported differently than it is in NFSv4.0.

NFSv4 minor version 0 servers return an error status code,
NFS4ERR_LEASE_MOVED, to signal that a lease has moved.  This error
causes the whole compound operation to fail.  Normal compounds
against this server continue to fail until the client performs
migration recovery on the migrated share.

Minor version 1 and later servers assert a bit flag in the reply to
a compound's SEQUENCE operation to signal LEASE_MOVED.  This is not
a fatal condition: operations against this server continue normally.
The server asserts this flag until the client performs migration
recovery on the migrated share.

Note that servers MUST NOT return NFS4ERR_LEASE_MOVED to NFSv4
clients not using NFSv4.0.

After the server asserts any of the sr_status_flags in the SEQUENCE
operation in a typical compound, our client initiates standard lease
recovery.  For NFSv4.1+, a stand-alone SEQUENCE operation is
performed to discover what recovery is needed.

If SEQ4_STATUS_LEASE_MOVED is asserted in this stand-alone SEQUENCE
operation, our client attempts to discover which FSIDs have been
migrated, and then performs migration recovery on each.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:31:07 -04:00
Chuck Lever f8aba1e8d5 NFS: Handle NFS4ERR_LEASE_MOVED during async RENEW
With NFSv4 minor version 0, the asynchronous lease RENEW
heartbeat can return NFS4ERR_LEASE_MOVED.  Error recovery logic for
async RENEW is a separate code path from the generic NFS proc paths,
so it must be updated to handle NFS4ERR_LEASE_MOVED as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:52 -04:00
Chuck Lever 60ea681299 NFS: Migration support for RELEASE_LOCKOWNER
Currently the Linux NFS client ignores the operation status code for
the RELEASE_LOCKOWNER operation.  Like NFSv3's UMNT operation,
RELEASE_LOCKOWNER is a courtesy to help servers manage their
resources, and the outcome is not consequential for the client.

During a migration, a server may report NFS4ERR_LEASE_MOVED, in
which case the client really should retry, since typically
LEASE_MOVED has nothing to do with the current operation, but does
prevent it from going forward.

Also, it's important for a client to respond as soon as possible to
a moved lease condition, since the client's lease could expire on
the destination without further action by the client.

NFS4ERR_DELAY is not included in the list of valid status codes for
RELEASE_LOCKOWNER in RFC 3530bis.  However, rfc3530-migration-update
does permit migration-capable servers to return DELAY to clients,
but only in the context of an ongoing migration.  In this case the
server has frozen lock state in preparation for migration, and a
client retry would help the destination server purge unneeded state
once migration recovery is complete.

Interestly, NFS4ERR_MOVED is not valid for RELEASE_LOCKOWNER, even
though lock owners can be migrated with Transparent State Migration.

Note that RFC 3530bis section 9.5 includes RELEASE_LOCKOWNER in the
list of operations that renew a client's lease on the server if they
succeed.  Now that our client pays attention to the operation's
status code, we can note that renewal appropriately.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:46 -04:00
Chuck Lever 8ef2f8d46a NFS: Implement support for NFS4ERR_LEASE_MOVED
Trigger lease-moved recovery when a request returns
NFS4ERR_LEASE_MOVED.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:27 -04:00
Chuck Lever b7f7a66e42 NFS: Support NFS4ERR_LEASE_MOVED recovery in state manager
A migration on the FSID in play for the current NFS operation
is reported via the error status code NFS4ERR_MOVED.

"Lease moved" means that a migration has occurred on some other
FSID than the one for the current operation.  It's a signal that
the client should take action immediately to handle a migration
that it may not have noticed otherwise.  This is so that the
client's lease does not expire unnoticed on the destination server.

In NFSv4.0, a moved lease is reported with the NFS4ERR_LEASE_MOVED
error status code.

To recover from NFS4ERR_LEASE_MOVED, check each FSID for that server
to see if it is still present.  Invoke nfs4_try_migration() if the
FSID is no longer present on the server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:21 -04:00
Chuck Lever 44c9993384 NFS: Add method to detect whether an FSID is still on the server
Introduce a mechanism for probing a server to determine if an FSID
is present or absent.

The on-the-wire compound is different between minor version 0 and 1.
Minor version 0 appends a RENEW operation to identify which client
ID is probing.  Minor version 1 has a SEQUENCE operation in the
compound which effectively carries the same information.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:30:03 -04:00
Chuck Lever 352297b917 NFS: Handle NFS4ERR_MOVED during delegation recall
When a server returns NFS4ERR_MOVED during a delegation recall,
trigger the new migration recovery logic in the state manager.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:30 -04:00
Chuck Lever 519ae255d4 NFS: Add migration recovery callouts in nfs4proc.c
When a server returns NFS4ERR_MOVED, trigger the new migration
recovery logic in the state manager.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:23 -04:00
Chuck Lever 9f51a78e3a NFS: Rename "stateid_invalid" label
I'm going to use this exit label also for migration recovery
failures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:25:10 -04:00
Chuck Lever f1478c13c0 NFS: Re-use exit code in nfs4_async_handle_error()
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:55 -04:00
Chuck Lever c9fdeb280b NFS: Add basic migration support to state manager thread
Migration recovery and state recovery must be serialized, so handle
both in the state manager thread.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:40 -04:00
Chuck Lever ce6cda1845 NFS: Add a super_block backpointer to the nfs_server struct
NFS_SB() returns the pointer to an nfs_server struct, given a
pointer to a super_block.  But we have no way to go back the other
way.

Add a super_block backpointer field so that, given an nfs_server
struct, it is easy to get to the filesystem's root dentry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:26 -04:00
Chuck Lever b03d735b4c NFS: Add method to retrieve fs_locations during migration recovery
The nfs4_proc_fs_locations() function is invoked during referral
processing to perform a GETATTR(fs_locations) on an object's parent
directory in order to discover the target of the referral.  It
performs a LOOKUP in the compound, so the client needs to know the
parent's file handle a priori.

Unfortunately this function is not adequate for handling migration
recovery.  We need to probe fs_locations information on an FSID, but
there's no parent directory available for many operations that
can return NFS4ERR_MOVED.

Another subtlety: recovering from NFS4ERR_LEASE_MOVED is a process
of walking over a list of known FSIDs that reside on the server, and
probing whether they have migrated.  Once the server has detected
that the client has probed all migrated file systems, it stops
returning NFS4ERR_LEASE_MOVED.

A minor version zero server needs to know what client ID is
requesting fs_locations information so it can clear the flag that
forces it to continue returning NFS4ERR_LEASE_MOVED.  This flag is
set per client ID and per FSID.  However, the client ID is not an
argument of either the PUTFH or GETATTR operations.  Later minor
versions have client ID information embedded in the compound's
SEQUENCE operation.

Therefore, by convention, minor version zero clients send a RENEW
operation in the same compound as the GETATTR(fs_locations), since
RENEW's one argument is a clientid4.  This allows a minor version
zero server to identify correctly the client that is probing for a
migration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:24:00 -04:00
Chuck Lever 9e6ee76dfb NFS: Export _nfs_display_fhandle()
Allow code in nfsv4.ko to use _nfs_display_fhandle().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:35 -04:00
Chuck Lever ec011fe847 NFS: Introduce a vector of migration recovery ops
The differences between minor version 0 and minor version 1
migration will be abstracted by the addition of a set of migration
recovery ops.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:17 -04:00
Chuck Lever 800c06a5bf NFS: Add functions to swap transports during migration recovery
Introduce functions that can walk through an array of returned
fs_locations information and connect a transport to one of the
destination servers listed therein.

Note that NFS minor version 1 introduces "fs_locations_info" which
extends the locations array sorting criteria available to clients.
This is not supported yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:23:07 -04:00
Chuck Lever 32e62b7c3e NFS: Add nfs4_update_server
New function nfs4_update_server() moves an nfs_server to a different
nfs_client.  This is done as part of migration recovery.

Though it may be appealing to think of them as the same thing,
migration recovery is not the same as following a referral.

For a referral, the client has not descended into the file system
yet: it has no nfs_server, no super block, no inodes or open state.
It is enough to simply instantiate the nfs_server and super block,
and perform a referral mount.

For a migration, however, we have all of those things already, and
they have to be moved to a different nfs_client.  No local namespace
changes are needed here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:22:29 -04:00
Weston Andros Adamson d2bfda2e7a NFSv4: don't reprocess cached open CLAIM_PREVIOUS
Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state
and can safely skip being reprocessed, but must still call update_open_stateid
to make sure that all active fmodes are recovered.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference
Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 15:10:56 -04:00
Trond Myklebust d49f042aee NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state
Currently, if the call to nfs_refresh_inode fails, then we end up leaking
a reference count, due to the call to nfs4_get_open_state.
While we're at it, replace nfs4_get_open_state with a simple call to
atomic_inc(); there is no need to do a full lookup of the struct nfs_state
since it is passed as an argument in the struct nfs4_opendata, and
is already assigned to the variable 'state'.

Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing
Cc: stable@vger.kernel.org # 3.7.x
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:57:12 -04:00
Weston Andros Adamson a43ec98b72 NFSv4: don't fail on missing fattr in open recover
This is an unneeded check that could cause the client to fail to recover
opens.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:54:03 -04:00
Weston Andros Adamson f494a6071d NFSv4: fix NULL dereference in open recover
_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached
open CLAIM_PREVIOUS, but this can happen. An example is when there are
RDWR openers and RDONLY openers on a delegation stateid. The recovery
path will first try an open CLAIM_PREVIOUS for the RDWR openers, this
marks the delegation as not needing RECLAIM anymore, so the open
CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc.

The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state
returning PTR_ERR(rpc_status) when !rpc_done. When the open is
cached, rpc_done == 0 and rpc_status == 0, thus
_nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected
by callers of nfs4_opendata_to_nfs4_state().

This can be reproduced easily by opening the same file two times on an
NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then
sleeping for a long time.  While the files are held open, kick off state
recovery and this NULL dereference will be hit every time.

An example OOPS:

[   65.003602] BUG: unable to handle kernel NULL pointer dereference at 00000000
00000030
[   65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0
[   65.008075] Oops: 0000 [#1] SMP
[   65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache
snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd
_seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pc
lmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio
_raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw
_vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi dr
m mptscsih mptbase i2c_core
[   65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19
.x86_64 #1
[   65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 07/31/2013
[   65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000
[   65.023414] RIP: 0010:[<ffffffffa037d6ee>]  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.025079] RSP: 0018:ffff88007b907d10  EFLAGS: 00010246
[   65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000
[   65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000
[   65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[   65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001
[   65.032527] FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
[   65.033981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0
[   65.036568] Stack:
[   65.037011]  0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220
[   65.038472]  ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80
[   65.039935]  ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40
[   65.041468] Call Trace:
[   65.042050]  [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4]
[   65.043209]  [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4]
[   65.044529]  [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4]
[   65.045730]  [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4]
[   65.046905]  [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4]
[   65.048071]  [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4]
[   65.049436]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[   65.050686]  [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4]
[   65.051943]  [<ffffffff81088640>] kthread+0xc0/0xd0
[   65.052831]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[   65.054697]  [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0
[   65.056396]  [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40
[   65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44
[   65.065225] RIP  [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4]
[   65.067175]  RSP <ffff88007b907d10>
[   65.068570] CR2: 0000000000000030
[   65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]---

Cc: <stable@vger.kernel.org> #3.7+
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:53:32 -04:00
Trond Myklebust 83c78eb042 NFSv4.1: Don't change the security label as part of open reclaim.
The current caching model calls for the security label to be set on
first lookup and/or on any subsequent label changes. There is no
need to do it as part of an open reclaim.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:50:38 -04:00
Jeff Layton 1966903f8e nfs: fix handling of invalid mount options in nfs_remount
nfs_parse_mount_options returns 0 on error, not -errno.

Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:35:07 -04:00
Jeff Layton 57acc40d73 nfs: reject version and minorversion changes on remount attempts
Reported-by: Eric Doutreleau <edoutreleau@genoscope.cns.fr>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:30:23 -04:00
Andy Adamson 3660cd4322 NFSv4 Remove zeroing state kern warnings
As of commit 5d422301f9 we no longer zero the
state.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 14:28:53 -04:00
Tim Gardner 944d6f1a5b cifs: Remove redundant multiplex identifier check from check_smb_hdr()
The only call site for check_smb_header() assigns 'mid' from the SMB
packet, which is then checked again in check_smb_header(). This seems
like redundant redundancy.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <timg@tpi.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:31:36 -05:00
Steve French 34f626406c Query file system attributes from server on SMB2, not just cifs, mounts
Currently SMB2 and SMB3 mounts do not query the file system attributes
from the server at mount time as is done for cifs.  These can be useful for debugging.

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:22:55 -05:00
Steve French 64a5cfa6db Allow setting per-file compression via SMB2/3
Allow cifs/smb2/smb3 to return whether or not a file is compressed
via lsattr, and allow SMB2/SMB3 to set the per-file compression
flag ("chattr +c filename" on an smb3 mount).

Windows users often set the compressed flag (it can be
done from the desktop and file manager).  David Disseldorp
has patches to Samba server to support this (at least on btrfs)
which are complementary to this

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:22:31 -05:00
Steve French 7ff8d45c9d Fix corrupt SMB2 ioctl requests
We were off by one calculating the length of ioctls in some cases
because the protocol specification for SMB2 ioctl includes a mininum
one byte payload but not all SMB2 ioctl requests actually have
a data buffer to send. We were also not zeroing out the
return buffer (in case of error this is helpful).

Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-28 09:21:36 -05:00
Jaegeuk Kim 2ed2d5b33c f2fs: fix a deadlock during init_acl procedure
The deadlock is found through the following scenario.

sys_mkdir()
 -> f2fs_add_link()
  -> __f2fs_add_link()
   -> init_inode_metadata()
     : lock_page(inode);
    -> f2fs_init_acl()
     -> f2fs_set_acl()
      -> f2fs_setxattr(..., NULL)
       : This NULL page incurs a deadlock at update_inode_page().

So, likewise f2fs_init_security(), this patch adds a parameter to transfer the
locked inode page to f2fs_setxattr().

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:39:09 +09:00
Jaegeuk Kim b8b60e1a65 f2fs: clean up acl flow for better readability
This patch cleans up a couple of acl codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:38:21 +09:00
Changman Lee 4625d6aac2 f2fs: remove unnecessary segment bitmap updates
Only one dirty type is set in __locate_dirty_segment and we can know
dirty type of segment. So we don't need to check other dirty types.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-28 13:38:16 +09:00
Huang Shijie e104f1e9da jffs2: do not support the MLC nand
We should not support the MLC nand for jffs2. So if the nand type is
MLC, we quit immediatly.

Signed-off-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2013-10-27 16:27:07 -07:00
Christoph Hellwig cce6de908e nfsd: fix Kconfig syntax
The description text for CONFIG_NFSD_V4_SECURITY_LABEL has an unpaired
quote sign which breaks syntax highlighting for the nfsd Kconfig file.
Remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-10-26 15:37:26 -04:00
Mats Kärrman 58a4e23703 UBIFS: correct data corruption range
With power-cut emulation, it is possible that sometimes no data at all is
corrupted and that confusing messages are printed due to errors in the
computation of data corruption range.

[1] The start of the range should be [0..len-1], not [0..len].
[2] The end of the range should always be at least 1 greater than the start.

Signed-off-by: Mats Karrman <mats.karrman@tritech.se>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-10-26 11:33:38 +01:00
Wei Yongjun 7203db97b7 UBIFS: fix return code
Fix to return -ENOMEM in the kmalloc() and d_make_root() error handling
case instead of 0, as done elsewhere in those functions.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-10-26 11:11:59 +01:00
Linus Torvalds f55ac56d5e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes (try two) from Al Viro:
 "nfsd performance regression fix + seq_file lseek(2) fix"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  seq_file: always update file->f_pos in seq_lseek()
  nfsd regression since delayed fput()
2013-10-25 18:16:47 +01:00
Gu Zheng 05e16745c0 seq_file: always update file->f_pos in seq_lseek()
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41

As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:

    char str1[32] = { 0 };
    char str2[32] = { 0 };
    int poffset = 10;
    int count = 20;

    /*open any seq file*/
    int fd = open("/proc/modules", O_RDONLY);

    pread(fd, str1, count, poffset);
    printf("pread:%s\n", str1);

    /*seek to where m->read_pos is*/
    lseek(fd, poffset+count, SEEK_SET);

    /*supposed to read from poffset+count, but this read from position 0*/
    read(fd, str2, count);
    printf("read:%s\n", str2);

out put:
pread:
 ck_netbios_ns 12665
read:
 nf_conntrack_netbios

/proc/modules:
nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000

So we always update file->f_pos to offset in seq_lseek() to fix this issue.

Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-25 10:46:40 -04:00
Jaegeuk Kim e943a10d94 f2fs: add tracepoint for vm_page_mkwrite
This patch adds a tracepoint for f2fs_vm_page_mkwrite.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:40 +09:00
Jaegeuk Kim 26c6b88799 f2fs: add tracepoint for set_page_dirty
This patch adds a tracepoint for set_page_dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:40 +09:00
Chao Yu e8d61a7488 f2fs: remove redundant set_page_dirty from write_compacted_summaries
Previously, set_page_dirty is called every time after writting one summary info
into compacted summary page,
To avoid redundant set_page_dirty, we only call set_page_dirty before release
page.

Signed-off-by: Yu Chao <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:39 +09:00
Jaegeuk Kim ea91e9b043 f2fs: add reclaiming control by sysfs
This patch adds a control method in sysfs to reclaim prefree segments.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:39 +09:00
Jaegeuk Kim 4660f9c0fe f2fs: introduce f2fs_balance_fs_bg for some background jobs
This patch merges some background jobs into this new function.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:38 +09:00
Jaegeuk Kim 81eb8d6e28 f2fs: reclaim prefree segments periodically
Previously, f2fs postpones reclaiming prefree segments into free segments
as much as possible.
However, if user writes and deletes a bunch of data without any sync or fsync
calls, some flash storages can suffer from garbage collections.

So, this patch adds the reclaiming codes to f2fs_write_node_pages and background
GC thread.

If there are a lot of prefree segments, let's do checkpoint so that f2fs
submits discard commands for the prefree regions to the flash storage.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:37 +09:00
Haicheng Li aabe51364f f2fs: use bool for booleans
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:37 +09:00
Jaegeuk Kim dcdfff6527 f2fs: clean up several status-related operations
This patch cleans up improper definitions that update some status information.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:08 +09:00
Linus Torvalds 88829dfe4b Two important fixes
- Fix long standing memory leak in the (rarely used) public key support
 - Fix large file corruption on 32 bit architectures
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCgAGBQJSaX/HAAoJENaSAD2qAscKdpQQAI6Rvsv5y/Gj+8/9rCUnNYhw
 8YWYkOko2+cyGl6ro+nIm2nmKOuaGrjijvubOjOAe4WkMzS0EyJjku/9NT3S6KzC
 SqHC0ZeZf0jaFC9zUkUN69RY9m96Ak94HAagXO3Qm39DCSj8xijxODOVnVzkEs2x
 ylOU8OgRbD/AIDzmLxgHaOtuAmQ0GNvbVoYK6ZErVmOMENU2/67iH3OsyGD4OFpr
 Oaq1i8m7rxPmwv3QNSGhXSK6EScqs2jgM4aPWx3aG+OhYv6sGWkL8jJgPS/uSUBc
 ttD1Ou/d9yyvZPDFd9wmiHhenbCVbEdl6JAIS8zKv4NkSQ3V7AVWwAoe6JMfbREo
 U+Om7FwGLgKlZ/19+IxBMGTITuOjUkKq97vJMiYbXuWzdrZSflv5GiGGKbxchmnA
 CnfYaN1HYVcpLsbXoDTBomML7VTtbifgmY0diUJ2aJ1eTg86Gs1DXjhnuLF70Jjd
 dfuYfOKkJguuRfZ50yrpWfEQ0iOudXI1v+PrramLof33lNKWI8XeKjgDxyUrAjOZ
 UjFT639EXIRzYDIOCPZicQKdNO3BRziKi1cSnXQQp9cNTMs6/FIxK2zrQmjgqvww
 Hwj+M6czLs45lbfjQIxi3FlEAYYdXBQwrEiAu4cmt9j1bxIZnwIa7Fu0bXSxphfD
 dUo0GN7CkF45BkNvotFX
 =74EV
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
 "Two important fixes
   - Fix long standing memory leak in the (rarely used) public key
     support
   - Fix large file corruption on 32 bit architectures"

* tag 'ecryptfs-3.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: fix 32 bit corruption issue
  ecryptfs: Fix memory leakage in keystore.c
2013-10-25 07:32:01 +01:00
Ming Lei b9c0622516 sysfs: fix sysfs_write_file for bin file
Before patch(sysfs: prepare path write for unified regular / bin
file handling), when size of bin file is zero, writting still can
continue, but this patch changes the behaviour.

The worse thing is that firmware loader is broken by this patch,
and user space application can't write to firmware bin file any more
because both firmware loader and drivers can't know at advance how
large the firmware file is and have to set its initialized size as
zero.

This patch fixes the problem and keeps behaviour of writting to bin
as before.

Reported-by: Lothar Waßmann <LW@karo-electronics.de>
Tested-by: Lothar Waßmann <LW@karo-electronics.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-25 05:46:27 +01:00
Al Viro dd3e2c55a4 fuse: rcu-delay freeing fuse_conn
makes ->permission() and ->d_revalidate() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:45:13 -04:00
Al Viro 1dcddd4abd ncpfs: rcu-delay unload_nls() and freeing ncp_server
makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:28 -04:00
Al Viro cac45b062c fat: rcu-delay unloading nls and freeing sbi
makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:28 -04:00
Al Viro 2e32cf5ef2 cifs: rcu-delay unload_nls() and freeing sbi
makes ->d_hash(), ->d_compare() and ->permission() safety in RCU mode
independent from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:27 -04:00
Al Viro baa40671d3 autofs4: make freeing sbi rcu-delayed
makes ->d_managed() safety in RCU mode independent from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:27 -04:00
Al Viro 2d1d9b5b5c adfs: delayed freeing of sbi
makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:27 -04:00
Al Viro 30687e0a47 hpfs: make freeing sbi and codetables rcu-delayed
makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:26 -04:00
Al Viro e2fec7c355 make freeing super_block rcu-delayed
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:43:26 -04:00
Miklos Szeredi b70a80e7a1 vfs: introduce d_instantiate_no_diralias()
...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty.  Previously fuse used a
private mutex for this purpose, which can now go away.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-10-24 23:41:37 -04:00
Al Viro 94e92a6e77 move taking vfsmount_lock down into prepend_path()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:35:01 -04:00
Al Viro 474279dc0f split __lookup_mnt() in two functions
Instead of passing the direction as argument (and checking it on every
step through the hash chain), just have separate __lookup_mnt() and
__lookup_mnt_last().  And use the standard iterators...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:35:00 -04:00
Al Viro 7eb5e88269 uninline destroy_super(), consolidate alloc_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:35:00 -04:00
Al Viro 966c1f75f8 isofs: don't pass dentry to isofs_hash{i,}_common()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:59 -04:00
Al Viro 719ea2fbb5 new helpers: lock_mount_hash/unlock_mount_hash
aka br_write_{lock,unlock} of vfsmount_lock.  Inlines in fs/mount.h,
vfsmount_lock extern moved over there as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:59 -04:00
Al Viro aab407fc5c don't bother with vfsmount_lock in mounts_poll()
wake_up_interruptible/poll_wait provide sufficient barriers;
just use ACCESS_ONCE() to fetch ns->event and that's it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:59 -04:00
Al Viro aba809cf09 namespace.c: get rid of mnt_ghosts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:58 -04:00
Al Viro 9559f68915 fold dup_mnt_ns() into its only surviving caller
should've been done 6 years ago...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:58 -04:00
Al Viro f6b742d869 mnt_set_expiry() doesn't need vfsmount_lock
->mnt_expire is protected by namespace_sem

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:57 -04:00
Al Viro 22a7919299 finish_automount() doesn't need vfsmount_lock for removal from expiry list
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:57 -04:00
Al Viro 085e83ff0c fs/namespace.c: bury long-dead define
MNT_WRITER_UNDERFLOW_LIMIT has been missed 4 years ago when it became unused.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:57 -04:00
Al Viro 649a795aff fold mntfree() into mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:56 -04:00
Al Viro 6339dab869 do_remount(): pull touch_mnt_namespace() up
... and don't bother with dropping and regaining vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:56 -04:00
Al Viro aa7a574d0c dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock
mnt_list is protected by namespace_sem, not vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:55 -04:00
Al Viro 44bb4385ce fs_is_visible only needs namespace_sem held shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:55 -04:00
Al Viro 59aa0da8e2 initialize namespace_sem statically
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:54 -04:00
Al Viro 72c2d53192 file->f_op is never NULL...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:54 -04:00
Al Viro e84f9e57b9 consolidate the reassignments of ->f_op in ->open() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:53 -04:00
Al Viro 7b00ed6fe6 put_mnt_ns(): use drop_collected_mounts()
... rather than open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:52 -04:00
Al Viro 84eb3532b5 ncpfs: switch to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:52 -04:00
Al Viro 4cb2a01d8c ubifs: switch to %pd
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:51 -04:00
Al Viro a6a9f18f0a nfsd: switch to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:51 -04:00
Al Viro 6de1472f1a nfs: use %p[dD] instead of open-coded (and often racy) equivalents
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:50 -04:00
Al Viro 48bc06e74b befs: split symlink iops in two - for short and long symlinks resp.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:50 -04:00
Al Viro 87dc800be2 new helper: kfree_put_link()
duplicated to hell and back...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:49 -04:00
Al Viro 12f3887222 libfs: get exports to definitions of objects being exported...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:49 -04:00
Al Viro cbe9c08524 ecryptfs: ->lower_path.dentry is never NULL
... on anything found via ->d_fsdata

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:48 -04:00
Al Viro 92dd123033 ecryptfs: get rid of ecryptfs_set_dentry_lower{,_mnt}
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:48 -04:00
Al Viro 2edbfbf1c1 ecryptfs: don't leave RCU pathwalk immediately
If the underlying dentry doesn't have ->d_revalidate(), there's no need to
force dropping out of RCU mode.  All we need for that is to make freeing
ecryptfs_dentry_info RCU-delayed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:48 -04:00
Al Viro 3a93e17cf6 ecryptfs: check DCACHE_OP_REVALIDATE instead of ->d_op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:47 -04:00
Al Viro ceaec15d49 9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:47 -04:00
Colin Ian King 43b7c6c6a4 eCryptfs: fix 32 bit corruption issue
Shifting page->index on 32 bit systems was overflowing, causing
data corruption of > 4GB files. Fix this by casting it first.

https://launchpad.net/bugs/1243636

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Lars Duesing <lars.duesing@camelotsweb.de>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2013-10-24 12:36:30 -07:00
Dave Chinner c963c6193a xfs: split xfs_rtalloc.c for userspace sanity
xfs_rtalloc.c is partially shared with userspace. Split the file up
into two parts - one that is kernel private and the other which is
wholly shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 17:16:32 -05:00
Dave Chinner a4fbe6ab1e xfs: decouple inode and bmap btree header files
Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 16:28:49 -05:00
Dave Chinner 239880ef64 xfs: decouple log and transaction headers
xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 16:17:44 -05:00
Dave Chinner d420e5c810 xfs: remove unused transaction callback variables
We don't do callbacks at transaction commit time, no do we have any
infrastructure to set up or run such callbacks, so remove the
variables and typedefs for these operations. If we ever need to add
callbacks, we can reintroduce the variables at that time.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 14:30:51 -05:00
Dave Chinner 9aede1d81b xfs: split dquot buffer operations out
Parts of userspace want to be able to read and modify dquot buffers
(e.g. xfs_db) so we need to split out the reading and writing of
these buffers so it is easy to shared code with libxfs in userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 14:28:35 -05:00
Dave Chinner 5706278758 xfs: unify directory/attribute format definitions
The on-disk format definitions for the directory and attribute
structures are spread across 3 header files right now, only one of
which is dedicated to defining on-disk structures and their
manipulation (xfs_dir2_format.h). Pull all the format definitions
into a single header file - xfs_da_format.h - and switch all the
code over to point at that.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 14:21:40 -05:00
Dave Chinner 70a9883c5f xfs: create a shared header file for format-related information
All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file that we centralise these
related definitions. Start by moving the bffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-23 14:11:30 -05:00
wang.bo116@zte.com.cn e71d1a59e7 UBIFS: remove unnecessary code in ubifs_garbage_collect
In ubifs_garbage_collect,local variable "space_before" calculate twice. In
fact, at the beginning of the loop, there is no need to calculate this
variable. Calculate it before call "ubifs_garbage_collect_leb" is enough. This
patch just remove the unnecessary calculate code.

Signed-off-by: wang bo <wang.bo116@zte.com.cn>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2013-10-22 13:34:27 +01:00
Gu Zheng 7bd59381c8 f2fs: introduce f2fs_kmem_cache_alloc to hide the unfailed, kmem cache allocation
Introduce the unfailed version of kmem_cache_alloc named f2fs_kmem_cache_alloc
to hide the retry routine and make the code a bit cleaner.

v2:
   Fix the wrong use of 'retry' tag pointed out by Gao feng.
   Use more neat code to remove redundant tag suggested by Haicheng Li.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-22 20:16:02 +09:00
Randy Dunlap 69c88dc7d9 vfs: fix new kernel-doc warnings
Move kernel-doc notation to immediately before its function to eliminate
kernel-doc warnings introduced by commit db14fc3abc ("vfs: add
d_walk()")

  Warning(fs/dcache.c:1343): No description found for parameter 'data'
  Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
  Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22 12:02:40 +01:00
Randy Dunlap 606d6fe3ff fs/namei.c: fix new kernel-doc warning
Add @path parameter to fix kernel-doc warning.
Also fix a spello/typo.

  Warning(fs/namei.c:2304): No description found for parameter 'path'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22 12:02:40 +01:00
Haicheng Li 435f2a1b58 f2fs: no need to check other dirty_segmap when the seg has been found
Because one dirty seg can only be mapped to one dirty_type. Otherwise, it's a bug.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: modify a comment related to this patch]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-22 19:57:31 +09:00
Haicheng Li cffbfa6648 f2fs: use true and false for boolean value
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-22 19:49:39 +09:00
Linus Torvalds d24fec3991 Just a patch to fix an oops in an error path.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJSZVtaAAoJEDaohF61QIxkQwoP/2uqO2kg0b0ndR2pyCeUIu6a
 uMZ5/dC1DZ8CEVPLudu5Cb6mdS646rUEv4MjfZx6z7tJBWv0QpesiSnZN0vDlP3i
 Mj8iA/JckzbZv734Y7RQzpVfN+k/BOG/8YMrEQY3c9loD9yOzqGazOF6OK38O1E8
 CLQ2HeX0sigCdlYQOe9Lx8D0QiRlx91Yx8GH41wzAy5HGIWlJ2TxFLPf0upS1OPl
 PzH0G5mnS6apUndIxobk/z8w5q40+x2MWXG8aXNZflro7h4gp9L5DyfzaO/1dZV0
 WgS9zbjAOJKx8N0eAA1Z0PyNJ2i2/BLlpsw/6asm5CwEqMp134TCvv53oaihaIK/
 0P9Z4auXXuqKAc3Ok31HhGnWUwEhcY9TYRNqnH6dYGcg0YfQAWRpGdHPK7yFf85g
 MoTcgCqrcI9V4bxdECCdGTA798FOocuo2ShMeABJ73Zl97W3c0e91cAA2dPJ0N8+
 LaqmdP0cb0T5pJjbdQ2uDgQOK2JkoKQgkeilHHndRYT6cM+R4BFKTlft3ga/0ZLn
 GVubFNrL/T6rHVmK7014GvvX5NgsRzWd2yK01NYZGQFe/aOs0Eb86ed2R08X/+lh
 q9lmrvHZ6ATU9XvQsFMynnOLBWEMcPCC5rBEilUS70GIz8GENoG58XcBf4d2adiB
 5cDZlF5/v2BBDUt8vjK5
 =bh3d
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy

Pull jfs bugfix from David Kleikamp:
 "Just a patch to fix an oops in an error path"

* tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix error path in ialloc
2013-10-22 09:01:11 +01:00
Christoph Hellwig 865e9446b4 xfs: fold xfs_change_file_space into xfs_ioc_space
Now that only one caller of xfs_change_file_space is left it can be merged
into said caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21 16:57:03 -05:00
Christoph Hellwig 83aee9e4c2 xfs: simplify the fallocate path
Call xfs_alloc_file_space or xfs_free_file_space directly from
xfs_file_fallocate instead of going through xfs_change_file_space.

This simplified the code by removing the unessecary marshalling of the
arguments into an xfs_flock64_t structure and allows removing checks that
are already done in the VFS code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21 16:56:21 -05:00
Christoph Hellwig 5f8aca8b43 xfs: always hold the iolock when calling xfs_change_file_space
Currently fallocate always holds the iolock when calling into
xfs_change_file_space, while the ioctl path lets some of the lower level
functions take it, but leave it out in others.

This patch makes sure the ioctl path also always holds the iolock and
thus introduces consistent locking for the preallocation operations while
simplifying the code and allowing to kill the now unused XFS_ATTR_NOLOCK
flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21 16:54:22 -05:00
Christoph Hellwig 001a3e7370 xfs: remove the unused XFS_ATTR_NONBLOCK flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21 16:53:11 -05:00
Christoph Hellwig 76ca4c238c xfs: always take the iolock around xfs_setattr_size
There is no reason to conditionally take the iolock inside xfs_setattr_size
when we can let the caller handle it unconditionally, which just incrases
the lock hold time for the case where it was previously taken internally
by a few instructions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-21 16:51:33 -05:00
Al Viro c7314d74fc nfsd regression since delayed fput()
Background: nfsd v[23] had throughput regression since delayed fput
went in; every read or write ends up doing fput() and we get a pair
of extra context switches out of that (plus quite a bit of work
in queue_work itselfi, apparently).  Use of schedule_delayed_work()
gives it a chance to accumulate a bit before we do __fput() on all
of them.  I'm not too happy about that solution, but... on at least
one real-world setup it reverts about 10% throughput loss we got from
switch to delayed fput.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-20 08:44:39 -04:00
Greg Kroah-Hartman a7204d72db Merge 3.12-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19 13:05:38 -07:00
Linus Torvalds bdeeab62a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
  regression in our initial rc1 pull.  When doing nocow writes we were
  sometimes starting a transaction with locks held"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: release path before starting transaction in can_nocow_extent
2013-10-18 16:46:21 -07:00
Peter A. Felvegi 444996027e udf: fix for pathetic mount times in case of invalid file system
The UDF driver was not strict enough about checking the IDs in the
VSDs when mounting, which resulted in reading through all the sectors
of the block device in some unfortunate cases. Eg, trying to mount my
uninitialized 200G SSD partition (all 0xFF bytes) took ~350 minutes to
fail, because the code expected some of the valid IDs or a zero byte.
During this, the mount couldn't be killed, sync from the cmdline
blocked, and the machine froze into the shutdown. Valid filesystems
(extX, btrfs, ntfs) were rejected by the mere accident of having a
zero byte at just the right place in some of their sectors, close
enough to the beginning not to generate excess I/O. The fix adds a
hard limit on the VSD sector offset, adds the two missing VSD IDs, and
stops scanning when encountering an invalid ID. Also replaced the
magic number 32768 with a more meaningful #define, and supressed the
bogus message about failing to read the first sector if no UDF fs was
detected.

Signed-off-by: Peter A. Felvegi <petschy@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-10-18 22:39:07 +02:00
Josef Bacik 1bda19eb73 Btrfs: release path before starting transaction in can_nocow_extent
We can't be holding tree locks while we try to start a transaction, we will
deadlock.  Thanks,

Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-18 12:43:40 -04:00
Linus Torvalds 04919afb85 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Five small cifs fixes (includes fixes for: unmount hang, 2 security
  related, symlink, large file writes)"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: ntstatus_to_dos_map[] is not terminated
  cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
  cifs: Fix inability to write files >2GB to SMB2/3 shares
  cifs: Avoid umount hangs with smb2 when server is unresponsive
  do not treat non-symlink reparse points as valid symlinks
2013-10-17 18:49:21 -07:00
Theodore Ts'o efbed4dc58 ext4: add ratelimiting to ext4 messages
In the case of a storage device that suddenly disappears, or in the
case of significant file system corruption, this can result in a huge
flood of messages being sent to the console.  This can overflow the
file system containing /var/log/messages, or if a serial console is
configured, this can slow down the system so much that a hardware
watchdog can end up triggering forcing a system reboot.

Google-Bug-Id: 7258357

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-17 21:11:01 -04:00
Jaegeuk Kim 87a9bd2656 f2fs: avoid to write during the recovery
This patch enhances the recovery routine not to write any data/node/meta until
its completion.
If any writes are sent to the disk, it could contaminate the written history
that will be used for further recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18 09:44:14 +09:00
Gu Zheng e234088758 f2fs: avoid wait if IO end up when do_checkpoint for better performance
Previously, do_checkpoint() will call congestion_wait() for waiting the pages
(previous submitted node/meta/data pages) to be written back.
Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and
no additional wake up mechanism was introduced if IO ends up before regular period costed.
Yuan Zhong found there is a situation that after the pages have been written back,
but the checkpoint thread still wait for congestion_wait to exit.

So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes
if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up.

Thanks to Yuan Zhong's pre work about this problem.

Reported-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18 09:44:14 +09:00
Gu Zheng 9076a75f8e f2fs: introduce function read_raw_super_block()
Introduce function read_raw_super_block() to hide reading raw super block and
the retry routine if the first sb is invalid.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18 09:44:13 +09:00
Jaegeuk Kim b1838f8952 f2fs: fix the starvation problem on cp_rwsem
This patch removes the logic previously introduced to address the starvation
on cp_rwsem.

One potential there-in bug is that we should cover the wait.list with spin_lock,
but the previous code broke this rule.

And, actually current rwsem handles this starvation issue reasonably, so that we
didn't need to do this before neither.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18 09:44:13 +09:00
Jaegeuk Kim 3d1e38073b f2fs: fix to store and retrieve i_rdev correctly
When storing i_rdev, we should check its file type.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-18 09:43:38 +09:00
Ming Lei aeac589a74 ext4: fix performance regression in ext4_writepages
Commit 4e7ea81db5(ext4: restructure writeback path) introduces another
performance regression on random write:

- one more page may be added to ext4 extent in
  mpage_prepare_extent_to_map, and will be submitted for I/O so
  nr_to_write will become -1 before 'done' is set

- the worse thing is that dirty pages may still be retrieved from page
  cache after nr_to_write becomes negative, so lots of small chunks
  can be submitted to block device when page writeback is catching up
  with write path, and performance is hurted.

On one arm A15 board with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM:
2GB, SATA controller: 3.0Gbps), this patch can improve below test's
result from 157MB/sec to 174MB/sec(>10%):

	dd if=/dev/zero of=./z.img bs=8K count=512K

The above test is actually prototype of block write in bonnie++
utility.

This patch makes sure no more pages than nr_to_write can be added to
extent for mapping, so that nr_to_write won't become negative.

Cc: linux-ext4@vger.kernel.org
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-17 18:56:16 -04:00
Eric Sandeen 59e5a0e821 xfs: don't break from growfs ag update loop on error
When xfs_growfs_data_private() is updating backup superblocks,
it bails out on the first error encountered, whether reading or
writing:

* If we get an error writing out the alternate superblocks,
* just issue a warning and continue.  The real work is
* already done and committed.

This can cause a problem later during repair, because repair
looks at all superblocks, and picks the most prevalent one
as correct.  If we bail out early in the backup superblock
loop, we can end up with more "bad" matching superblocks than
good, and a post-growfs repair may revert the filesystem to
the old geometry.

With the combination of superblock verifiers and old bugs,
we're more likely to encounter read errors due to verification.

And perhaps even worse, we don't even properly write any of the
newly-added superblocks in the new AGs.

Even with this change, growfs will still say:

  xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
  data blocks changed from 319815680 to 335216640

which might be confusing to the user, but it at least communicates
that something has gone wrong, and dmesg will probably highlight
the need for an xfs_repair.

And this is still best-effort; if verifiers fail on more than
half the backup supers, they may still "win" - but that's probably
best left to repair to more gracefully handle by doing its own
strict verification as part of the backup super "voting."

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com> 
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17 13:31:42 -05:00
Eric Sandeen 31625f28ad xfs: don't emit corruption noise on fs probes
If we get EWRONGFS due to probing of non-xfs filesystems,
there's no need to issue the scary corruption error and backtrace.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17 13:31:25 -05:00
Eric Sandeen 08e96e1a3c xfs: remove newlines from strings passed to __xfs_printk
__xfs_printk adds its own "\n".  Having it in the original string
leads to unintentional blank lines from these messages.

Most format strings have no newline, but a few do, leading to
i.e.:

[ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119911] 
[ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05
[ 7347.119919] 

Fix them all.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17 13:30:29 -05:00
Dave Chinner 2c6e24ce1a xfs: prevent deadlock trying to cover an active log
Recent analysis of a deadlocked XFS filesystem from a kernel
crash dump indicated that the filesystem was stuck waiting for log
space. The short story of the hang on the RHEL6 kernel is this:

	- the tail of the log is pinned by an inode
	- the inode has been pushed by the xfsaild
	- the inode has been flushed to it's backing buffer and is
	  currently flush locked and hence waiting for backing
	  buffer IO to complete and remove it from the AIL
	- the backing buffer is marked for write - it is on the
	  delayed write queue
	- the inode buffer has been modified directly and logged
	  recently due to unlinked inode list modification
	- the backing buffer is pinned in memory as it is in the
	  active CIL context.
	- the xfsbufd won't start buffer writeback because it is
	  pinned
	- xfssyncd won't force the log because it sees the log as
	  needing to be covered and hence wants to issue a dummy
	  transaction to move the log covering state machine along.

Hence there is no trigger to force the CIL to the log and hence
unpin the inode buffer and therefore complete the inode IO, remove
it from the AIL and hence move the tail of the log along, allowing
transactions to start again.

Mainline kernels also have the same deadlock, though the signature
is slightly different - the inode buffer never reaches the delayed
write lists because xfs_buf_item_push() sees that it is pinned and
hence never adds it to the delayed write list that the xfsaild
flushes.

There are two possible solutions here. The first is to simply force
the log before trying to cover the log and so ensure that the CIL is
emptied before we try to reserve space for the dummy transaction in
the xfs_log_worker(). While this might work most of the time, it is
still racy and is no guarantee that we don't get stuck in
xfs_trans_reserve waiting for log space to come free. Hence it's not
the best way to solve the problem.

The second solution is to modify xfs_log_need_covered() to be aware
of the CIL. We only should be attempting to cover the log if there
is no current activity in the log - covering the log is the process
of ensuring that the head and tail in the log on disk are identical
(i.e. the log is clean and at idle). Hence, by definition, if there
are items in the CIL then the log is not at idle and so we don't
need to attempt to cover it.

When we don't need to cover the log because it is active or idle, we
issue a log force from xfs_log_worker() - if the log is idle, then
this does nothing.  However, if the log is active due to there being
items in the CIL, it will force the items in the CIL to the log and
unpin them.

In the case of the above deadlock scenario, instead of
xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to
cover the log, it will instead force the log, thereby unpinning the
inode buffer, allowing IO to be issued and complete and hence
removing the inode that was pinning the tail of the log from the
AIL. At that point, everything will start moving along again. i.e.
the xfs_log_worker turns back into a watchdog that can alleviate
deadlocks based around pinned items that prevent the tail of the log
from being moved...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-17 10:56:17 -05:00
Linus Torvalds 056cdce0d3 Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  mm: revert mremap pud_free anti-fix
  mm: fix BUG in __split_huge_page_pmd
  swap: fix set_blocksize race during swapon/swapoff
  procfs: call default get_unmapped_area on MMU-present architectures
  procfs: fix unintended truncation of returned mapped address
  writeback: fix negative bdi max pause
  percpu_refcount: export symbols
  fs: buffer: move allocation failure loop into the allocator
  mm: memcg: handle non-error OOM situations more gracefully
  tools/testing/selftests: fix uninitialized variable
  block/partitions/efi.c: treat size mismatch as a warning, not an error
  mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
  mm/zswap: bugfix: memory leak when re-swapon
  mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
  mm: migration: do not lose soft dirty bit if page is in migration state
  gcov: MAINTAINERS: Add an entry for gcov
  mm/hugetlb.c: correct missing private flag clearing
  mm/vmscan.c: don't forget to free shrinker->nr_deferred
  ipc/sem.c: synchronize semop and semctl with IPC_RMID
  ipc: update locking scheme comments
  ...
2013-10-16 21:36:03 -07:00
HATAYAMA Daisuke fad1a86e25 procfs: call default get_unmapped_area on MMU-present architectures
Commit c4fe244857 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.

To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e.  the one in actual file
operation in the procfs file, is not defined.

Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
HATAYAMA Daisuke 2cbe3b0af8 procfs: fix unintended truncation of returned mapped address
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64.  This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.

Fixes a regression added in commit c4fe244857 ("sparc: fix PCI device
proc file mmap(2)").

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Johannes Weiner 84235de394 fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.

The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all.  Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap.  This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.

Use __GFP_NOFAIL allocations for buffers for now.  This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly.  It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.

Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:53 -07:00
Cyrill Gorcunov e9cdd6e771 mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean).  The pte_soft_dirty helper
should be called on present pte only.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-16 21:35:52 -07:00
Linus Torvalds 0056019da4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull tmpfile fix from Al Viro:
 "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it
  up, Miklos has caught it"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ext[34]: fix double put in tmpfile
2013-10-16 17:18:18 -07:00
Geyslan G. Bem 3edc8376c0 ecryptfs: Fix memory leakage in keystore.c
In 'decrypt_pki_encrypted_session_key' function:

Initializes 'payload' pointer and releases it on exit.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org # v2.6.28+
2013-10-16 15:18:01 -07:00
Bart Van Assche a97f4a66d8 dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY
When dlm_release_lockspace(ls, 1) is invoked on a busy system
immediately after the last dlm_unlock() AST has finished it can occur
that lkb_idr_is_local() is invoked for the unlocked LKB since removal
from ls_lkbidr only occurs after the AST has returned. If that happens
dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing
the lockspace. Fix this race condition by changing lkb_idr_is_local()
such that it only returns true for LKB's that have not yet been
unlocked.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-10-16 10:32:42 -05:00
Eric Sandeen 2046fd1873 ext3: Count journal as bsddf overhead in ext3_statfs
ext4 counts journal space as bsddf overhead, but ext3 does not.

For some reason when I patched ext4 I thought I should leave
ext3 alone, but frankly it makes more sense to fix it, I think.

Otherwise we get inconsistent behavior from ext3 under ext3.ko,
and ext3 under ext4.ko, which is not at all desirable...

This is testable by xfstests shared/289, though it will need
modification because it currently special-cases ext3.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-10-16 14:29:17 +02:00
Jan Kara 7534e854b9 ext4: fixup kerndoc annotation of mpage_map_and_submit_extent()
Document give_up_on_write argument of mpage_map_and_submit_extent().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-16 08:26:08 -04:00
Jan Kara 78371a45df ext4: fix assertion in ext4_add_complete_io()
It doesn't make sense to require io_end->handle when we are in
nojournal mode. So update the assertion accordingly to avoid false
warnings from ext4_add_complete_io().

Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-10-16 08:25:11 -04:00
Miklos Szeredi 43ae9e3fc7 ext[34]: fix double put in tmpfile
d_tmpfile() already swallowed the inode ref.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-15 12:14:06 -04:00
Steven Whitehouse e66cf16109 GFS2: Use lockref for glocks
Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. This intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjuction
with the spinlock to replace the previous atomic counter.

As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.

The conversion is relatively straight forward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.

A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.

The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-15 15:18:08 +01:00
Tim Gardner 0c26606cbe cifs: ntstatus_to_dos_map[] is not terminated
Functions that walk the ntstatus_to_dos_map[] array could
run off the end. For example, ntstatus_to_dos() loops
while ntstatus_to_dos_map[].ntstatus is not 0. Granted,
this is mostly theoretical, but could be used as a DOS attack
if the error code in the SMB header is bogus.

[Might consider adding to stable, as this patch is low risk - Steve]

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-14 12:14:01 -05:00
Benjamin Herrenschmidt d723a92dd4 sysfs/bin: Fix size handling overflow for bin_attribute
While looking at the code, I noticed that bin_attribute read() and write()
ops copy the inode size into an int for futher comparisons.

Some bin_attributes can be fairly large. For example, pci creates some for
BARs set to the BAR size and giant BARs are around the corner, so this is
going to break something somewhere eventually.

Let's use the right type.

[adjust for seqfile conversions, only needed for bin_read() - gkh]

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14 10:07:19 -07:00
Tejun Heo 785a162d14 sysfs: make sysfs_file_ops() follow ignore_lockdep flag
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced
sysfs_file_ops() which determines the associated file operation of a
given sysfs_dirent.  As file ops access should be protected by an
active reference, the new function includes a lockdep assertion on the
sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep
flag into account and the lockdep assertion trips spuriously for files
which opt out from active reference lockdep checking.

# cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60()
 Modules linked in:
 CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000
  ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60
  ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50
 Call Trace:
  [<ffffffff81ca0131>] dump_stack+0x4e/0x82
  [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0
  [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60
  [<ffffffff8125a274>] sysfs_open_file+0x54/0x300
  [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280
  [<ffffffff811df820>] finish_open+0x30/0x40
  [<ffffffff811f0623>] do_last+0x503/0xd90
  [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0
  [<ffffffff811f23ba>] do_filp_open+0x3a/0x90
  [<ffffffff811e09a9>] do_sys_open+0x129/0x220
  [<ffffffff811e0abe>] SyS_open+0x1e/0x20
  [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b
 ---[ end trace aa48096b111dafdb ]---

Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and
move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep
assertion if sysfs_ignore_lockdep() is true.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-14 08:40:39 -07:00
Michael Witten a26a8746ce Docs: Kconfig: `devlopers' -> `developers'
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:48:34 +02:00
Linus Torvalds 9d05746e7b vfs: allow O_PATH file descriptors for fstatfs()
Olga reported that file descriptors opened with O_PATH do not work with
fstatfs(), found during further development of ksh93's thread support.

There is no reason to not allow O_PATH file descriptors here (fstatfs is
very much a path operation), so use "fdget_raw()".  See commit
55815f7014 ("vfs: make O_PATH file descriptors usable for 'fstat()'")
for a very similar issue reported for fstat() by the same team.

Reported-and-tested-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org	# O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-12 13:12:31 -07:00
Linus Torvalds be5090da4a A bug fix and performance regression fix for ext4.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSWZeNAAoJENNvdpvBGATwx6kP/2mVlKlNBXVfGUmLVP3Xb68v
 4JhBzlC3ra3TRqVkw6C6kx4fbdq0cW/mDkecmYg2s+aDnswG/94/+yRdU4kQkyne
 iqN22ZYA7CumZgJvR0Z2ptWksDRpv8H5twgdVbPtad6/2cKmjseUraPo7YZhjDCe
 O9eRCXyVII305soAddZzZUgWOWCSWpdTW5zBitKaGq5x/K//rY9UlPVSuAo+9KPZ
 vyBiKJ1R6fDbtyH7JhCdXydMPKzlAPmyqYBQGLyq2GsRsXDp/VljGci6QN0iuZ5k
 lZsxFg8q0P6/R4Pjr3DDtE0tUbPXEyMxuquh/m4b3pAXRoMMCynyLP2zy7Gc7ec0
 ek2ty+sVG06JjseqigHSmS/a+PdZgDY5xEMKhaK4X38lxRPb7apNktolXxxEt6eU
 OPZsuvma1g+lbkkCdRO5FVwMllb7cuPhuZPGyxZvmP+ON59oT5QOVsDC+55WnHNs
 Ib11PCTN93Mwhrm1YPNWVV+gWG50eLZQYJam6H4mE4knaXnba6htEhYrdNczoFH4
 lcHaJzCDJLnYVRRbKXKdLSSnyz1X9cYJBP9g5ks1iNy7/JreF7WoIAOWvZWCp432
 7NC0IOmV4Q4itiCTcSh85rGlsXU8ZA7wK5HILhp9qZmNkw30OMvihNoWoTFiWTJR
 mVCkm+isBbqMP0nhV5km
 =ZwlW
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "A bug fix and performance regression fix for ext4"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix memory leak in xattr
  ext4: fix performance regression in writeback of random writes
2013-10-12 12:55:15 -07:00
Linus Torvalds d64dab903f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We've got more bug fixes in my for-linus branch:

  One of these fixes another corner of the compression oops from last
  time.  Miao nailed down some problems with concurrent snapshot
  deletion and drive balancing.

  I kept out one of his patches for more testing, but these are all
  stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix oops caused by the space balance and dead roots
  Btrfs: insert orphan roots into fs radix tree
  Btrfs: limit delalloc pages outside of find_delalloc_range
  Btrfs: use right root when checking for hash collision
2013-10-12 12:54:24 -07:00
Dave Jones 6e4ea8e33b ext4: fix memory leak in xattr
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentionally return from the function without having freed these
allocations.  If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.

Spotted with Coverity.

[ Fixed by tytso to set is and bs to NULL after freeing these
  pointers, in case in the retry loop we later end up triggering an
  error causing a jump to cleanup, at which point we could have a double
  free bug. -- Ted ]

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
2013-10-12 14:39:49 -04:00
Kent Overstreet e34ecee2ae aio: Fix a trinity splat
aio kiocb refcounting was broken - it was relying on keeping track of
the number of available ring buffer entries, which it needs to do
anyways; then at shutdown time it'd wait for completions to be delivered
until the # of available ring buffer entries equalled what it was
initialized to.

Problem with  that is that the ring buffer is mapped writable into
userspace, so userspace could futz with the head and tail pointers to
cause the kernel to see extra completions, and cause free_ioctx() to
return while there were still outstanding kiocbs. Which would be bad.

Fix is just to directly refcount the kiocbs - which is more
straightforward, and with the new percpu refcounting code doesn't cost
us any cacheline bouncing which was the whole point of the original
scheme.

Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't
subtracting from aio_nr if ioctx_add_table() failed.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-10-10 19:31:47 -07:00
Miao Xie c00869f1ae Btrfs: fix oops caused by the space balance and dead roots
When doing space balance and subvolume destroy at the same time, we met
the following oops:

kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
 [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
 [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
 [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
 [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
 [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
 [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
 [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
 [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
 [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
 [<ffffffff81122f53>] ? path_openat+0x234/0x4db
 [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b

It is because we returned the error number if the reference of the root was 0
when doing space relocation. It was not right here, because though the root
was dead(refs == 0), but the space it held still need be relocated, or we
could not remove the block group. So in this case, we should return the root
no matter it is dead or not.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:31:02 -04:00
Miao Xie 14927d9546 Btrfs: insert orphan roots into fs radix tree
Now we don't drop all the deleted snapshots/subvolumes before the space
balance. It means we have to relocate the space which is held by the dead
snapshots/subvolumes. So we must into them into fs radix tree, or we would
forget to commit the change of them when doing transaction commit, and it
would corrupt the metadata.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:30:53 -04:00
Josef Bacik 7bf811a595 Btrfs: limit delalloc pages outside of find_delalloc_range
Liu fixed part of this problem and unfortunately I steered him in slightly the
wrong direction and so didn't completely fix the problem.  The problem is we
limit the size of the delalloc range we are looking for to max bytes and then we
try to lock that range.  If we fail to lock the pages in that range we will
shrink the max bytes to a single page and re loop.  However if our first page is
inside of the delalloc range then we will end up limiting the end of the range
to a period before our first page.  This is illustrated below

[0 -------- delalloc range --------- 256mb]
                                  [page]

So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
and then we will notice that delalloc_start < *start and adjust it up, but not
adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
way we don't end up with this confusion.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:27:56 -04:00
Josef Bacik 4871c1588f Btrfs: use right root when checking for hash collision
btrfs_rename was using the root of the old dir instead of the root of the new
dir when checking for a hash collision, so if you tried to move a file into a
subvol it would freak out because it would see the file you are trying to move
in its current root.  This fixes the bug where this would fail

btrfs subvol create test1
btrfs subvol create test2
mv test1 test2.

Thanks to Chris Murphy for catching this,

Cc: stable@vger.kernel.org
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-10 21:27:45 -04:00
Rob Herring 32df8dca50 of: remove HAVE_ARCH_DEVTREE_FIXUPS
HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
but it is only used for /proc/device-teee and sparc does not enable
/proc/device-tree. So this option is redundant. Remove the option and
always enable it. This has the side effect of fixing /proc/device-tree
on arches such as arm64 which failed to define this option.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
2013-10-09 20:04:08 -05:00
Rik van Riel 82727018b0 sched/numa: Call task_numa_free() from do_execve()
It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.

This has the potential to break numa grouping, and is trivial
to fix.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:00 +02:00
Mel Gorman e29cf08b05 sched/numa: Report a NUMA task group ID
It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:47:49 +02:00
Brian Foster 74564fb48c xfs: clean up xfs_inactive() error handling, kill VN_INACTIVE_[NO]CACHE
The xfs_inactive() return value is meaningless. Turn xfs_inactive()
into a void function and clean up the error handling appropriately.
Kill the VN_INACTIVE_[NO]CACHE directives as they are not relevant
to Linux.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 17:20:41 -05:00
Brian Foster 88877d2b97 xfs: push down inactive transaction mgmt for ifree
Push the inode free work performed during xfs_inactive() down into
a new xfs_inactive_ifree() helper. This clears xfs_inactive() from
all inode locking and transaction management more directly
associated with freeing the inode xattrs, extents and the inode
itself.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 17:15:01 -05:00
Brian Foster f7be2d7f59 xfs: push down inactive transaction mgmt for truncate
Create the new xfs_inactive_truncate() function to handle the
truncate portion of xfs_inactive(). Push the locking and
transaction management into the new function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 15:32:11 -05:00
Brian Foster 36b21dde6e xfs: push down inactive transaction mgmt for remote symlinks
Push down the transaction management for remote symlinks from
xfs_inactive() down to xfs_inactive_symlink_rmt(). The latter is
cleaned up to avoid transaction management intended for the
calling context (i.e., trans duplication, reservation, item
attachment).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 14:53:02 -05:00
Mark Tinguely 2900a579ab xfs: add the inode directory type support to XFS_IOC_FSGEOM
Add the inode type directory type support to XFS_IOC_FSGEOM
so that xfs_repair/xfs_info knows if the superblock v4 filesystem
enabled the feature.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-10-08 14:28:09 -05:00
Jaegeuk Kim ccaaca2591 f2fs: fix writing incorrect orphan blocks
Previously, there was a erroneous scenario like below.
thread 1:                       thread 2:
 f2fs_unlink
  - acquire_orphan_inode
    : sbi->n_orphans++           write_checkpoint
                                 - block_operations
                                  : f2fs_lock_all
                                 - do_checkpoint
                                  : write orphan blocks with sbi->n_orphans
                                 - unblock_operations
  - f2fs_lock_op
  - release_orphan_inode
  - f2fs_unlock_op

During the checkpoint by thread 2, f2fs stores a wrong orphan block according
to the wrong sbi->n_orphans.
To avoid this, simply we should make cover acquire_orphan_inode too with
f2fs_lock_op.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-08 10:19:28 +09:00
Jaegeuk Kim 5887d291d7 f2fs: avoid unnecessary checkpoints
During the f2fs_put_super procedure, we don't need to conduct checkpoint all
the time, since we don't need to do that if superblock is clean.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-08 09:32:43 +09:00
Sachin Prabhu dde2356c84 cifs: Allow LANMAN auth method for servers supporting unencapsulated authentication methods
This allows users to use LANMAN authentication on servers which support
unencapsulated authentication.

The patch fixes a regression where users using plaintext authentication
were no longer able to do so because of changed bought in by patch
3f618223dc

https://bugzilla.redhat.com/show_bug.cgi?id=1011621

Reported-by: Panos Kavalagios <Panagiotis.Kavalagios@eurodyn.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-07 09:57:11 -05:00
Jan Klos 2f6c947963 cifs: Fix inability to write files >2GB to SMB2/3 shares
When connecting to SMB2/3 shares, maximum file size is set to non-LFS maximum in superblock. This is due to cap_large_files bit being different for SMB1 and SMB2/3 (where it is just an internal flag that is not negotiated and the SMB1 one corresponds to multichannel capability, so maybe LFS works correctly if server sends 0x08 flag) while capabilities are checked always for the SMB1 bit in cifs_read_super().

The patch fixes this by checking for the correct bit according to the protocol version.

CC: Stable <stable@kernel.org>
Signed-off-by: Jan Klos <honza.klos@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-07 09:54:45 -05:00
Kelly Anderson 4058c5117d f2fs: handle remount options correctly
The current f2fs code errors if the xattr or acl options are passed when
remounting.  This is important in a typical scenario where f2fs is mounted
as a "ro" root file-system by the boot loader and then the init process wants
to remount it "rw" with the "remount,rw" option.

Signed-off-by: Kelly Anderson <kelly@xilka.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-07 11:38:13 +09:00
Gu Zheng e479556bfd f2fs: use rw_sem instead of fs_lock(locks mutex)
The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
there is a terrible problem here, if these are too many concurrency threads acquiring
fs_lock, so that they will block each other and may lead to some performance problem,
but this is not the phenomenon we want to see.
Though there are some optimization patches introduced to enhance the usage of fs_lock,
but the thorough solution is using a *rw_sem* to replace the fs_lock.
Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
this can avoid the problem described above completely.
Because of the weakness of rw_sem, the above change may introduce a potential problem
that the checkpoint thread might get starved if other threads are intensively locking
the read semaphore for I/O.(Pointed out by Xu Jin)
In order to avoid this, a wait_list is introduced, the appending read semaphore ops
will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
and will be waked up when checkpoint thread gives up write semaphore.
Thanks to Kim's previous review and test, and will be very glad to see other guys'
performance tests about this patch.

V2:
  -fix the potential starvation problem.
  -use more suitable func name suggested by Xu Jin.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-07 11:33:05 +09:00
Shirish Pargaonkar eb4c7df6c2 cifs: Avoid umount hangs with smb2 when server is unresponsive
Do not send SMB2 Logoff command when reconnecting, the way smb1
code base works.

Also, no need to wait for a credit for an echo command when one is already
in flight.

Without these changes, umount command hangs if the server is unresponsive
e.g. hibernating.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@us.ibm.com>
2013-10-06 20:18:42 -05:00
Steve French c31f330719 do not treat non-symlink reparse points as valid symlinks
Windows 8 and later can create NFS symlinks (within reparse points)
which we were assuming were normal NTFS symlinks and thus reporting
corrupt paths for.  Add check for reparse points to make sure that
they really are normal symlinks before we try to parse the pathname.

We also should not be parsing other types of reparse points (DFS
junctions etc) as if they were a  symlink so return EOPNOTSUPP
on those.  Also fix endian errors (we were not parsing symlink
lengths as little endian).

This fixes commit d244bf2dfb
which implemented follow link for non-Unix CIFS mounts

CC: Stable <stable@kernel.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-10-05 21:54:18 -05:00
Tejun Heo 3124eb1679 sysfs: merge regular and bin file handling
With the previous changes, sysfs regular file code is ready to handle
bin files too.  This patch makes bin files share the regular file
path.

* sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.

* sysfs_init_inode() is updated to use the new sysfs_bin_operations
  instead of bin_fops for bin files.

* fs/sysfs/bin.c and the related pieces are removed.

This patch shouldn't introduce any behavior difference to bin file
accesses.

Overall, this unification reduces the amount of duplicate logic, makes
behaviors more consistent and paves the road for building simpler and
more versatile interface which will allow other subsystems to make use
of sysfs for their pseudo filesystems.

v2: Stale fs/sysfs/bin.c reference dropped from
    Documentation/DocBook/filesystems.tmpl.  Reported by kbuild test
    robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 49fe604781 sysfs: prepare open path for unified regular / bin file handling
sysfs bin file handling will be merged into the regular file support.
This patch prepares the open path.

This patch updates sysfs_open_file() such that it can handle both
regular and bin files.

This is a preparation and the new bin file path isn't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 73d9714627 sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c
sysfs bin file handling will be merged into the regular file support.
This patch copies mmap support from bin so that fs/sysfs/file.c can
handle mmapping bin files.

The code is copied mostly verbatim with the following updates.

* ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
  references are replaced with sysfs_open_file ones.

* Symbols are prefixed with sysfs_.

* sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
  ->files.  Invocation of this function is added to
  sysfs_addrm_finish().

* sysfs_bin_mmap() is added to sysfs_bin_operations.

This is a preparation and the new mmap path isn't used yet.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 2f0c6b7593 sysfs: add sysfs_bin_read()
sysfs bin file handling will be merged into the regular file support.
This patch prepares the read path.

Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use
sysfs_open_file instead of bin_buffer.  The function is identical copy
except for the use of sysfs_open_file.

The new function is added to sysfs_bin_operations.  This isn't used
yet but will eventually replace fs/sysfs/bin.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo f9b9a6217c sysfs: prepare path write for unified regular / bin file handling
sysfs bin file handling will be merged into the regular file support.
This patch prepares the write path.

bin file write is almost identical to regular file write except that
the write length is capped by the inode size and @off is passed to the
write method.  This patch adds bin file handling to sysfs_write_file()
so that it can handle both regular and bin files.

A new file_operations struct sysfs_bin_operations is added, which
currently only hosts sysfs_write_file() and generic_file_llseek().
This isn't used yet but will eventually replace fs/sysfs/bin.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 3ff65d3cb0 sysfs: collapse fs/sysfs/bin.c::fill_read() into read()
read() is simple enough and fill_read() being in a separate function
doesn't add anything.  Let's collapse it into read().  This will make
merging bin file handling with regular file.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 91270162bf sysfs: skip bin_buffer->buffer while reading
After b31ca3f5df ("sysfs: fix deadlock"), bin read() first writes
data to bb->buffer and bounces it to a transient kernel buffer which
is then copied out to userland.  The double bouncing doesn't add
anything.  Let's just use the transient buffer directly.

While at it, rename @temp to @buf for clarity.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:27:40 -07:00
Tejun Heo 13c589d5b0 sysfs: use seq_file when reading regular files
sysfs read path implements its own buffering scheme between userland
and kernel callbacks, which essentially is a degenerate duplicate of
seq_file.  This patch replaces the custom read buffering
implementation in sysfs with seq_file.

While the amount of code reduction is small, this reduces low level
hairiness and enables future development of a new versatile API based
on seq_file so that sysfs features can be shared with other
subsystems.

As write path was already converted to not use sysfs_open_file->page,
this patch makes ->page and ->count unused and removes them.

Userland behavior remains the same except for some extreme corner
cases - e.g. sysfs will now regenerate the content each time a file is
read after a non-contiguous seek whereas the original code would keep
using the same content.  While this is a userland visible behavior
change, it is extremely unlikely to be noticeable and brings sysfs
behavior closer to that of procfs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo 8ef445f080 sysfs: use transient write buffer
There isn't much to be gained by keeping around kernel buffer while a
file is open especially as the read path planned to be converted to
use seq_file and won't use the buffer.  This patch makes
sysfs_write_file() use per-write transient buffer instead of
sysfs_open_file->page.

This simplifies the write path, enables removing sysfs_open_file->page
once read path is updated and will help merging bin file write path
which already requires the use of a transient buffer due to a locking
order issue.

As the function comments of flush_write_buffer() and
sysfs_write_buffer() are being updated anyway, reformat them so that
they're more conventional.

v2: Use min_t() instead of min() in sysfs_write_file() to avoid build
    warning on arm.  Reported by build test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo bcafe4eea3 sysfs: add sysfs_open_file->sd and ->file
sysfs will be converted to use seq_file for read path, which will make
it difficult to pass around multiple pointers directly.  This patch
adds sysfs_open_file->sd and ->file so that we can reach all the
necessary data structures from sysfs_open_file.

flush_write_buffer() is updated to drop @dentry which was used to
discover the sysfs_dirent as it's now available through
sysfs_open_file->sd.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:21:03 -07:00
Tejun Heo 58282d8dc2 sysfs: rename sysfs_buffer to sysfs_open_file
sysfs read path will be converted to use seq_file which will handle
buffering making sysfs_buffer a misnomer.  Rename sysfs_buffer to
sysfs_open_file, and sysfs_open_dirent->buffers to ->files.

This path is pure rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:16:28 -07:00
Tejun Heo c75ec764cf sysfs: add sysfs_open_file_mutex
Add a separate mutex to protect sysfs_open_dirent->buffers list.  This
will allow performing sleepable operations while traversing
sysfs_buffers, which will be renamed to sysfs_open_file.

Note that currently sysfs_open_dirent->buffers list isn't being used
for anything and this patch doesn't make any functional difference.
It will be used to merge regular and bin file supports.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:15:48 -07:00
Tejun Heo 375b611e60 sysfs: remove sysfs_buffer->ops
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
sysfs_buffer->ops to be used while the file is open.  This patch
removes the caching and makes each operation directly fetch sysfs_ops.

This patch doesn't introduce any behavior difference and is to prepare
for merging regular and bin file supports.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 17:04:34 -07:00
Linus Torvalds e62063d699 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "This is a small collection of fixes, including a regression fix from
  Liu Bo that solves rare crashes with compression on.

  I've merged my for-linus up to 3.12-rc3 because the top commit is only
  meant for 3.12.  The rest of the fixes are also available in my master
  branch on top of my last 3.11 based pull"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: Fix crash due to not allocating integrity data for a bioset
  Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
  Btrfs: eliminate races in worker stopping code
  Btrfs: fix crash of compressed writes
  Btrfs: fix transid verify errors when recovering log tree
2013-10-05 12:17:24 -07:00
Tejun Heo aea585ef8f sysfs: remove sysfs_buffer->needs_read_fill
->needs_read_fill is used to implement the following behaviors.

1. Ensure buffer filling on the first read.
2. Force buffer filling after a write.
3. Force buffer filling after a successful poll.

However, #2 and #3 don't really work as sysfs doesn't reset file
position.  While the read buffer would be refilled, the next read
would continue from the position after the last read or write,
requiring an explicit seek to the start for it to be useful, which
makes ->needs_read_fill superflous as read buffer is always refilled
if f_pos == 0.

Update sysfs_read_file() to test buffer->page for #1 instead and
remove ->needs_read_fill.  While this changes behavior in extreme
corner cases - e.g. re-reading a sysfs file after seeking to non-zero
position after a write or poll, it's highly unlikely to lead to actual
breakage.  This change is to prepare for using seq_file in the read
path.

While at it, reformat a comment in fill_write_buffer().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 11:02:04 -07:00
Tejun Heo 89e51dab7c sysfs: remove unused sysfs_buffer->pos
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05 10:54:47 -07:00
Darrick J. Wong b208c2f7ce btrfs: Fix crash due to not allocating integrity data for a bioset
When btrfs creates a bioset, we must also allocate the integrity data pool.
Otherwise btrfs will crash when it tries to submit a bio to a checksumming
disk:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
 PGD 2305e4067 PUD 23063d067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP
 Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
 CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
 RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
 RSP: 0018:ffff880230699688  EFLAGS: 00010286
 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
 FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
 Stack:
  ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
  ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
  0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
 Call Trace:
  [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
  [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
  [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
  [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
  [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
  [<ffffffff8123e58a>] generic_make_request+0xca/0x100
  [<ffffffff8123e639>] submit_bio+0x79/0x160
  [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
  [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
  [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
  [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
  [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
  [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
  [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
[btrfs]
  [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
  [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
  [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
  [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
  [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
  [<ffffffff81176ba0>] mount_fs+0x20/0xd0
  [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
  [<ffffffff81193320>] do_mount+0x200/0xa40
  [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
  [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
  [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
 Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
 RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
  RSP <ffff880230699688>
 CR2: 0000000000000018
 ---[ end trace 7a96042017ed21e2 ]---

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-10-05 10:52:10 -04:00
Chris Mason 1329dfc8bb Merge branch 'for-linus' into for-linus-3.12 2013-10-05 10:51:32 -04:00
Linus Torvalds a5c984cc29 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Small set of cifs fixes.  Most important is Jeff's fix that works
  around disconnection problems which can be caused by simultaneous use
  of user space tools (starting a long running smbclient backup then
  doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
  Jim's fix to deal with reexport of cifs share.

  I expect to send two more cifs fixes next week (being tested now) -
  fixes to address an SMB2 unmount hang when server dies and a fix for
  cifs symlink handling of Windows "NFS" symlinks"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] update cifs.ko version
  [CIFS] Remove ext2 flags that have been moved to fs.h
  [CIFS] Provide sane values for nlink
  cifs: stop trying to use virtual circuits
  CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
2013-10-04 20:50:16 -07:00
Linus Torvalds 3dbecf0aa9 xfs: bugfixes for 3.12-rc4
- lockdep fix for project quotas
 - fix for dirent dtype support on v4 filesystems
 - fix for a memory leak in recovery
 - fix for build failure due to the recovery fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJSTw/vAAoJENaLyazVq6ZOo2EP/RjhwaDqDZHB5bm/axZrtxP6
 g31TGvJ+nCUT6JjYX2wnoFuJDT2SDcs5+2gtjk1DRLb3JRQI2uJ+MtHLjDIZJSvE
 sMAADOgWvTuzx3TsnR4U0MM1/XVnv99k1vinedD6mGq16QtT0OWYsA9AKkMKWd1o
 OiTGyX4AMCNtfAZkiH9+OR8+BqH1xEEzv28H/Bf7yLSsQHM+v9uKPC5+f7I8bWvB
 YK8fAxeGmiAfDGR4tQ+tQVoIj3qrJmPyj45ElwAvGCKbOh0LG4/N+dwaCQme0teW
 xFfXMF+C/94qDom3z0gYAWzSOixgTFmy6gxt+3Mqw7uZ/dNzO+KeKE5Fm8cG11yD
 y3vxqwav/fLHv1fRUvl5abrAzl5VU8nRAbeQqZBM0xjzgfilMp5Jk2Jvix8OHcO5
 edmb7+CkkGdiYD15cSUl2242qKaukB3K1vrHoOlFte42vxELmcHWBRBxuZe8rgV1
 czf2xCHkWWjdwUrFeZoxVSEFydfoGIW0clAz8tHPQpVyvnSjRTuugJ8wuN92NyNF
 xGS5er0lyCqlBCBVCOZX/xTcwSQZ4UNG8qgdzDT26VN1VpTFeaaJlMRwD2GhYMYk
 8eYX3Ie/XdECLn5ZaG4xWEJHLarXLcqUI6eMobjkVs+qt/FQl/PzH76qOcZWKKbf
 kEOhPA1Gh97SZ66+vqaw
 =eNZa
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "There are lockdep annotations for project quotas, a fix for dirent
  dtype support on v4 filesystems, a fix for a memory leak in recovery,
  and a fix for the build error that resulted from it.  D'oh"

* tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: Use kmem_free() instead of free()
  xfs: fix memory leak in xlog_recover_add_to_trans
  xfs: dirent dtype presence is dependent on directory magic numbers
  xfs: lockdep needs to know about 3 dquot-deep nesting
2013-10-04 14:47:22 -07:00
Ilya Dryomov 1357272fc7 Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents.  Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:14 -04:00
Ilya Dryomov 964fb15acf Btrfs: eliminate races in worker stopping code
The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop.  The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

       btrfs_stop_workers():            check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
  working list
- removes the first worker from the
  working list
- releases the lock to wait for
  its kthread's completion
                                  - grabs the lock
                                  - if aworker is on the working list,
                                    moves aworker from the working list
                                    to the idle list
                                  - releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
  working list
                              ......
        btrfs_stop_workers returns, aworker is on the idle list
                 FS is umounted, memory is freed
                              ......
              aworker is waken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:13 -04:00
Liu Bo 385fe0bede Btrfs: fix crash of compressed writes
The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.

The bug is quite interesting, it's actually introduced by a recent commit
(573aecafca,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
    ordered extent and so on.
(4) submit the page A.

It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end <= *start) and return with (found > 0).

This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.

This fixes it by just thinking that we find nothing and returning to caller
as the caller knows how to deal with it properly.

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731]  RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:11 -04:00
Josef Bacik 60e7cd3a4b Btrfs: fix transid verify errors when recovering log tree
If we crash with a log, remount and recover that log, and then crash before we
can commit another transaction we will get transid verify errors on the next
mount.  This is because we were not zero'ing out the log when we committed the
transaction after recovery.  This is ok as long as we commit another transaction
at some point in the future, but if you abort or something else goes wrong you
can end up in this weird state because the recovery stuff says that the tree log
should have a generation+1 of the super generation, which won't be the case of
the transaction that was started for recovery.  Fix this by removing the check
and _always_ zero out the log portion of the super when we commit a transaction.
This fixes the transid verify issues I was seeing with my force errors tests.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-10-04 16:02:09 -04:00
Thierry Reding b2a42f78ab xfs: Use kmem_free() instead of free()
This fixes a build failure caused by calling the free() function which
does not exist in the Linux kernel.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit aaaae98022)
2013-10-04 13:56:12 -05:00
tinguely@sgi.com 9b3b77fe66 xfs: fix memory leak in xlog_recover_add_to_trans
Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 519ccb81ac)
2013-10-04 13:56:03 -05:00
Dave Chinner 6d313498f0 xfs: dirent dtype presence is dependent on directory magic numbers
The determination of whether a directory entry contains a dtype
field originally was dependent on the filesystem having CRCs
enabled. This meant that the format for dtype beign enabled could be
determined by checking the directory block magic number rather than
doing a feature bit check. This was useful in that it meant that we
didn't need to pass a struct xfs_mount around to functions that
were already supplied with a directory block header.

Unfortunately, the introduction of dtype fields into the v4
structure via a feature bit meant this "use the directory block
magic number" method of discriminating the dirent entry sizes is
broken. Hence we need to convert the places that use magic number
checks to use feature bit checks so that they work correctly and not
by chance.

The current code works on v4 filesystems only because the dirent
size roundup covers the extra byte needed by the dtype field in the
places where this problem occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 367993e7c6)
2013-10-04 13:55:48 -05:00
Dave Chinner 89c6c89af2 xfs: lockdep needs to know about 3 dquot-deep nesting
Michael Semon reported that xfs/299 generated this lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
3.12.0-rc2+ #2 Not tainted
---------------------------------------------
touch/21072 is trying to acquire lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

but task is already holding lock:
 (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_dquot_other_class);
  lock(&xfs_dquot_other_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by touch/21072:
 #0:  (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40
 #2:  (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35
 #3:  (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1
 #4:  (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f
 #5:  (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64
 #6:  (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64

The lockdep annotation for dquot lock nesting only understands
locking for user and "other" dquots, not user, group and quota
dquots. Fix the annotations to match the locking heirarchy we now
have.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f112a04971)
2013-10-04 13:55:33 -05:00
Linus Torvalds 15c83d26e1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse bugfixes from Miklos Szeredi:
 "This contains two more fixes by Maxim for writeback/truncate races and
  fixes for RCU walk in fuse_dentry_revalidate()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: no RCU mode in fuse_access()
  fuse: readdirplus: fix RCU walk
  fuse: don't check_submounts_and_drop() in RCU walk
  fuse: fix fallocate vs. ftruncate race
  fuse: wait for writeback in fuse_file_fallocate()
2013-10-04 09:06:13 -07:00
Steven Whitehouse e46c772dba GFS2: Protect quota sync generation
Now that gfs2_quota_sync can be potentially called from multiple
threads, we should protect this bit of code, and the sync generation
number in particular in order to ensure that there are no races
when syncing quotas.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2013-10-04 12:29:34 +01:00
Steven Whitehouse aabd7c72f5 GFS2: Inline qd_trylock into gfs2_quota_unlock
The function qd_trylock was not a trylock despite its name and
can be inlined into gfs2_quota_unlock in order to make the
code a bit clearer. There should be no functional change as a
result of this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2013-10-04 11:39:21 +01:00
Steven Whitehouse 1bf59bf6de GFS2: Make two similar quota code fragments into a function
There should be no functional change bar the removal of a
test of the MS_READONLY flag which would never be reachable.
This merges the common code from qd_fish and qd_trylock into
a single function and calls it from both those places.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2013-10-04 11:14:46 +01:00
Steven Whitehouse bef292a72d GFS2: Remove obsolete quota tunable
There is no need for a paramater which relates to the internals
of quota to be exposed to users. The only possible use would be
to turn it up so large that the memory allocation fails. So lets
remove it and set it to a sensible value which ensures that we
don't ask for multipage allocations.

Currently the size of struct gfs2_holder means that the caluclated
value is identical to the previous default value, so there should
be no functional change.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>
2013-10-04 09:49:29 +01:00
Tejun Heo 250f7c3fee sysfs: introduce [__]sysfs_remove()
Given a sysfs_dirent, there is no reason to have multiple versions of
removal functions.  A function which removes the specified
sysfs_dirent and its descendants is enough.

This patch intorduces [__}sysfs_remove() which replaces all internal
variations of removal functions.  This will be the only removal
function in the planned new sysfs_dirent based interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03 16:38:52 -07:00