linux_old1

Commit Graph

Author	SHA1	Message	Date
Ian Kent	a32744d4ab	autofs: work around unhappy compat problem on x86-64 When the autofs protocol version 5 packet type was added in commit `5c0a32fc2c` ("autofs4: add new packet type for v5 communications"), it obvously tried quite hard to be word-size agnostic, and uses explicitly sized fields that are all correctly aligned. However, with the final "char name[NAME_MAX+1]" array at the end, the actual size of the structure ends up being not very well defined: because the struct isn't marked 'packed', doing a "sizeof()" on it will align the size of the struct up to the biggest alignment of the members it has. And despite all the members being the same, the alignment of them is different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round number (256), the name[] array starts out a 4-byte aligned. End result: the "packed" size of the structure is 300 bytes: 4-byte, but not 8-byte aligned. As a result, despite all the fields being in the same place on all architectures, sizeof() will round up that size to 304 bytes on architectures that have 8-byte alignment for u64. Note that this is not a problem for 32-bit compat mode on POWER, since there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit and 64-bit alignment is different for 64-bit entities, and as a result the structure that has exactly the same layout has different sizes. So on x86-64, but no other architecture, we will just subtract 4 from the size of the structure when running in a compat task. That way we will write the properly sized packet that user mode expects. Not pretty. Sadly, this very subtle, and unnecessary, size difference has been encoded in user space that wants to read packets of exactly the right size, and will refuse to touch anything else. Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-25 12:10:27 -08:00
Oleg Nesterov	971316f050	epoll: ep_unregister_pollwait() can use the freed pwq->whead signalfd_cleanup() ensures that ->signalfd_wqh is not used, but this is not enough. eppoll_entry->whead still points to the memory we are going to free, ep_unregister_pollwait()->remove_wait_queue() is obviously unsafe. Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL, change ep_unregister_pollwait() to check pwq->whead != NULL under rcu_read_lock() before remove_wait_queue(). We add the new helper, ep_remove_wait_queue(), for this. This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because ->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand. ep_unregister_pollwait()->remove_wait_queue() can play with already freed and potentially reused ->sighand, but this is fine. This memory must have the valid ->signalfd_wqh until rcu_read_unlock(). Reported-by: Maxime Bizon <mbizon@freebox.fr> Cc: <stable@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-24 11:42:50 -08:00
Oleg Nesterov	d80e731eca	epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon <mbizon@freebox.fr> Cc: <stable@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-24 11:42:50 -08:00
Linus Torvalds	855a85f704	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...	2012-02-24 09:02:53 -08:00
Chris Mason	e77266e4c4	Btrfs: fix compiler warnings on 32 bit systems The enospc tracing code added some interesting uses of u64 pointer casts. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-24 10:39:05 -05:00
Anton Altaparmakov	0afa1b62e3	Merge branch 'master' of /Volumes/CaseSensitiveDisk/linux	2012-02-24 09:39:16 +00:00
Anton Altaparmakov	9b556248ec	NTFS: Correct two spelling errors "dealocate" to "deallocate" in mft.c. From: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Anton Altaparmakov <anton@tuxera.com>	2012-02-24 09:17:09 +00:00
Anton Altaparmakov	37fbf4bfb8	Restore direct_io / truncate locking API With kernel 3.1, Christoph removed i_alloc_sem and replaced it with calls (namely inode_dio_wait() and inode_dio_done()) which are EXPORT_SYMBOL_GPL() thus they cannot be used by non-GPL file systems and further inode_dio_wait() was pushed from notify_change() into the file system ->setattr() method but no non-GPL file system can make this call. That means non-GPL file systems cannot exist any more unless they do not use any VFS functionality related to reading/writing as far as I can tell or at least as long as they want to implement direct i/o. Both Linus and Al (and others) have said on LKML that this breakage of the VFS API should not have happened and that the change was simply missed as it was not documented in the change logs of the patches that did those changes. This patch changes the two function exports in question to be EXPORT_SYMBOL() thus restoring the VFS API as it used to be - accessible for all modules. Christoph, who introduced the two functions and exported them GPL-only is CC-ed on this patch to give him the opportunity to object to the symbols being changed in this manner if he did indeed intend them to be GPL-only and does not want them to become available to all modules. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-23 15:56:21 -08:00
Linus Torvalds	bb4c7e9a99	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs A fix from Jesper Juhl removes an assignment in an ASSERT when a compare is intended. Two fixes from Mitsuo Hayasaka address off-by-ones in XFS quota enforcement. * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: make inode quota check more general xfs: change available ranges of softlimit and hardlimit in quota check XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended	2012-02-23 15:38:57 -08:00
Liu Bo	5500cdbe14	Btrfs: increase the global block reserve estimates When doing IO with large amounts of data fragmentation, the global block reserve calulations are too low. This increases them to avoid ENOSPC crashes. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:49:04 -05:00
Chris Mason	5065319052	Btrfs: clear the extent uptodate bits during parent transid failures If btrfs reads a block and finds a parent transid mismatch, it clears the uptodate flags on the extent buffer, and the pages inside it. But we only clear the uptodate bits in the state tree if the block straddles more than one page. This is from an old optimization from to reduce contention on the extent state tree. But it is buggy because the code that retries a read from a different copy of the block is going to find the uptodate state bits set and skip the IO. The end result of the bug is that we'll never actually read the good copy (if there is one). The fix here is to always clear the uptodate state bits, which is safe because this code is only called when the parent transid fails. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:43:45 -05:00
Chris Mason	16780cabb8	Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:43:45 -05:00
Chris Mason	a6b0d5c8db	Btrfs: make sure we update latest_bdev When we are setting up the mount, we close all the devices that were not actually part of the metadata we found. But, we don't make sure that one of those devices wasn't fs_devices->latest_bdev, which means we can do a use after free on the one we closed. This updates latest_bdev as it goes. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:43:45 -05:00
Chris Mason	fe66a05a06	Btrfs: improve error handling for btrfs_insert_dir_item callers This allows us to gracefully continue if we aren't able to insert directory items, both for normal files/dirs and snapshots. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-02-23 10:43:45 -05:00
Linus Torvalds	437cf4c7b7	Bugfixes for the NFS client. Fix a nasty Oops in the NFSv4 getacl code, another source of infinite loops in the NFSv4 state recovery code, and a regression in NFSv4.1 session initialisation. Also deal with an NFSv4.1 memory leak. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJPRLALAAoJEGcL54qWCgDy5yoP/0eKjYERpqf00ETRnmQw6ngt PCaC33mUwfsJlLdSfW6hhG+2IhKkiWR4mraCo1Es9CYS7eSsPU9/djK62neHZG3s lJF2xcPlfvbiYtbyG2MsmtRffUl/XRbLLMfZMADgG4iQy1y4uTDxsaxcKrBIu5ig mQWlEpsYeE9TLea3Qvw6oHiXMKkrwdeR9Et5aGo3Y5hxbpDhD86yR1cyw2ds2aUP FmXk/ERqJDJpEt+uEQKFsMDzjcZ27r/nca/AhwE+cVQku6Fi9QdnFcdNtQmwFYq6 iwYS/grUoW0tLV7Fv/iYNvZmVtXtS2Ng1VTR77gcLRXps0naEciuM5PO7MmuUe/j 0aS/fV1s4/ZloRCu6l/gj2JmOAkh0joy6msQTcdEt4LQcvQSxtUmLQtWyLoGm6aa 5LwTOFg25qRH6c7mJglSsiPdjgAcOIdwzH1gikSzSJeV2NjroZZ4nPW/noz0Idfq l36B9T3iHc0jZa6wr/Q/S1qF6RdfLYfuXcrDKcxJFDpuYNRNDN1jjOq+OKs5yGw9 Fgui6r9/g0uDsEKWtqR30NzUnUx4PxY0hxduT04T2sxMMapU3oXcXWW4vRgly9be ASx/hAEnBovNwzHeKo9wVt3WdHqP2M4wMx8dD8hFLL1qPx9uj90IeclOVneNEHKa UMtC3BH1DBF44FYhZgT4 =yq1J -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Bugfixes for the NFS client. Fix a nasty Oops in the NFSv4 getacl code, another source of infinite loops in the NFSv4 state recovery code, and a regression in NFSv4.1 session initialisation. Also deal with an NFSv4.1 memory leak. * tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: fix server_scope memory leak NFSv4.1: Fix a NFSv4.1 session initialisation regression NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID NFSv4: Fix an Oops in the NFSv4 getacl code	2012-02-22 08:43:35 -08:00
Anton Altaparmakov	45d95bcd7a	NTFS: Do not dereference pointer before checking for NULL. Found by Coverity software (http://scan.coverity.com). Signed-off-by: Anton Altaparmakov <anton@tuxera.com>	2012-02-22 11:15:43 +00:00
Anton Altaparmakov	0e37579506	NTFS: Remove unused variable. Found by Coverity software (http://scan.coverity.com). Signed-off-by: Anton Altaparmakov <anton@tuxera.com>	2012-02-22 10:49:58 +00:00
Linus Torvalds	faf309009e	sys_poll: fix incorrect type for 'timeout' parameter The 'poll()' system call timeout parameter is supposed to be 'int', not 'long'. Now, the reason this matters is that right now 32-bit compat mode is broken on at least x86-64, because the 32-bit code just calls 'sys_poll()' directly on x86-64, and the 32-bit argument will have been zero-extended, turning a signed 'int' into a large unsigned 'long' value. We could just introduce a 'compat_sys_poll()' function for this, and that may eventually be what we have to do, but since the actual standard poll() semantics is supposed to be 'int', and since at least on x86-64 glibc sign-extends the argument before invocing the system call (so nobody can actually use a 64-bit timeout value in user space _anyway_, even in 64-bit binaries), the simpler solution would seem to be to just fix the definition of the system call to match what it should have been from the very start. If it turns out that somebody somehow circumvents the user-level libc 64-bit sign extension and actually uses a large unsigned 64-bit timeout despite that not being how poll() is supposed to work, we will need to do the compat_sys_poll() approach. Reported-by: Thomas Meyer <thomas@m3y3r.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-21 17:24:20 -08:00
Mitsuo Hayasaka	c922bbc819	xfs: make inode quota check more general The xfs checks quota when reserving disk blocks and inodes. In the block reservation, it checks if the total number of blocks including current usage and new reservation exceed quota. In the inode reservation, it checks using the total number of inodes including only current usage without new reservation. However, this inode quota check works well since the caller of xfs_trans_dquot() always sets the argument of the number of new inode reservation to 1 or 0 and inode is reserved one by one in current xfs. To make it more general, this patch changes it to the same way as the block quota check. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-02-21 10:12:43 -06:00
Mitsuo Hayasaka	20f12d8ac0	xfs: change available ranges of softlimit and hardlimit in quota check In general, quota allows us to use disk blocks and inodes up to each limit, that is, they are available if they don't exceed their limitations. Current xfs sets their available ranges to lower than them except disk inode quota check. So, this patch changes the ranges to not beyond them. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-02-21 10:12:43 -06:00
Liu Bo	692e5759a4	Btrfs: be less strict on finding next node in clear_extent_bit In clear_extent_bit, it is enough that next node is adjacent in tree level. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-21 16:02:10 +01:00
Linus Torvalds	8ebbfb4957	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Assorted fixes, sat in -next for a week or so... * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() vfs: fix compat_sys_stat() handling of overflows in st_nlink quota: Fix deadlock with suspend and quotas vfs: Provide function to get superblock and wait for it to thaw vfs: fix panic in __d_lookup() with high dentry hashtable counts autofs4 - fix lockdep splat in autofs vfs: fix d_inode_lookup() dentry ref leak	2012-02-20 16:13:58 -08:00
Weston Andros Adamson	abe9a6d57b	NFSv4: fix server_scope memory leak server_scope would never be freed if nfs4_check_cl_exchange_flags() returned non-zero Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-02-17 17:34:03 -05:00
Trond Myklebust	f86f36a6ae	NFSv4.1: Fix a NFSv4.1 session initialisation regression Commit `aacd553` (NFSv4.1: cleanup init and reset of session slot tables) introduces a regression in the session initialisation code. New tables now find their sequence ids initialised to 0, rather than the mandated value of 1 (see RFC5661). Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table(). Since the tbl->max_slots is initialised to 0, the test in nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically pass for an empty table. Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-02-17 17:33:39 -05:00
Cong Wang	465c9343c5	ecryptfs: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>	2012-02-16 16:06:27 -06:00
Tyler Hicks	545d680938	eCryptfs: Copy up lower inode attrs after setting lower xattr After passing through a ->setxattr() call, eCryptfs needs to copy the inode attributes from the lower inode to the eCryptfs inode, as they may have changed in the lower filesystem's ->setxattr() path. One example is if an extended attribute containing a POSIX Access Control List is being set. The new ACL may cause the lower filesystem to modify the mode of the lower inode and the eCryptfs inode would need to be updated to reflect the new mode. https://launchpad.net/bugs/926292 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Sebastien Bacher <seb128@ubuntu.com> Cc: John Johansen <john.johansen@canonical.com> Cc: <stable@vger.kernel.org>	2012-02-16 16:06:27 -06:00
Tyler Hicks	4a26620df4	eCryptfs: Improve statfs reporting statfs() calls on eCryptfs files returned the wrong filesystem type and, when using filename encryption, the wrong maximum filename length. If mount-wide filename encryption is enabled, the cipher block size and the lower filesystem's max filename length will determine the max eCryptfs filename length. Pre-tested, known good lengths are used when the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte block sizes is used. In other, less common cases, we fall back to a safe rounded-down estimate when determining the eCryptfs namelen. https://launchpad.net/bugs/885744 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com>	2012-02-16 16:06:21 -06:00
Liu Bo	d9b0218f6c	Btrfs: fix a bug on overcommit stuff When overcommitting, we should check the sum of pinned space and bytes for delayed item. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-16 17:23:18 +01:00
Liu Bo	9d47c7671d	Btrfs: kick out redundant stuff in convert_extent_bit clear_state_bit will do merge_state for us, so kick out the redundant one. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-16 17:23:17 +01:00
Liu Bo	0449314a9c	Btrfs: skip states when they does not contain bits to clear Clearing a range's bits is different with setting them, since we don't need to touch them when states do not contain bits we want. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-16 17:23:17 +01:00
Tsutomu Itoh	285190d99f	Btrfs: check return value of lookup_extent_mapping() correctly This patch corrects error checking of lookup_extent_mapping(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>	2012-02-16 17:23:17 +01:00
Miao Xie	600a45e1d5	Btrfs: fix deadlock on page lock when doing auto-defragment When I ran xfstests circularly on a auto-defragment btrfs, the deadlock happened. Steps to reproduce: [tty0] # export MOUNT_OPTIONS="-o autodefrag" # export TEST_DEV=<partition1> # export TEST_DIR=<mountpoint1> # export SCRATCH_DEV=<partition2> # export SCRATCH_MNT=<mountpoint2> # while [ 1 ] > do > ./check 091 127 263 > sleep 1 > done [tty1] # while [ 1 ] > do > echo 3 > /proc/sys/vm/drop_caches > done Several hours later, the test processes will hang on, and the deadlock will happen on page lock. The reason is that: Auto defrag task Flush thread Test task btrfs_writepages() add ordered extent (including page 1, 2) set page 1 writeback set page 2 writeback endio_fn() end page 2 writeback release page 2 lock page 1 alloc and lock page 2 page 2 is not uptodate btrfs_readpage() start ordered extent() btrfs_writepages() try to lock page 1 so deadlock happens. Fix this bug by unlocking the page which is in writeback, and re-locking it after the writeback end. Signed-off-by: Miao Xie <miax@cn.fujitsu.com>	2012-02-16 17:23:16 +01:00
Tsutomu Itoh	013bd4c336	Btrfs: fix return value check of extent_io_ops This patch adds the check on the return value of extent_io_ops. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>	2012-02-16 17:23:16 +01:00
Florian Albrechtskirchinger	12fc9d0923	btrfs: honor umask when creating subvol root Set the subvol root inode permissions based on the current umask.	2012-02-16 16:35:41 +01:00
David Sterba	8a33442694	btrfs: silence warning in raid array setup Raid array setup code creates an extent buffer in an usual way. When the PAGE_CACHE_SIZE is > super block size, the extent pages are not marked up-to-date, which triggers a WARN_ON in the following write_extent_buffer call. Add an explicit up-to-date call to silence the warning. Signed-off-by: David Sterba <dsterba@suse.cz>	2012-02-15 16:40:25 +01:00
David Sterba	c08782dacd	btrfs: fix structs where bitfields and spinlock/atomic share 8B word On ia64, powerpc64 and sparc64 the bitfield is modified through a RMW cycle and current gcc rewrites the adjacent 4B word, which in case of a spinlock or atomic has disaterous effect. https://lkml.org/lkml/2012/2/1/220 Signed-off-by: David Sterba <dsterba@suse.cz>	2012-02-15 16:40:25 +01:00
Jeff Mahoney	87826df0ec	btrfs: delalloc for page dirtied out-of-band in fixup worker We encountered an issue that was easily observable on s/390 systems but could really happen anywhere. The timing just seemed to hit reliably on s/390 with limited memory. The gist is that when an unexpected set_page_dirty() happened, we'd run into the BUG() in btrfs_writepage_fixup_worker since it wasn't properly set up for delalloc. This patch does the following: - Performs the missing delalloc in the fixup worker - Allow the start hook to return -EBUSY which informs __extent_writepage that it should mark the page skipped and not to redirty it. This is required since the fixup worker can fail with -ENOSPC and the page will have already been redirtied. That causes an Oops in drop_outstanding_extents later. Retrying the fixup worker could lead to an infinite loop. Deferring the page redirty also saves us some cycles since the page would be stuck in a resubmit-redirty loop until the fixup worker completes. It's not harmful, just wasteful. - If the fixup worker fails, we mark the page and mapping as errored, and end the writeback, similar to what we would do had the page actually been submitted to writeback. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-02-15 16:40:25 +01:00
Tsutomu Itoh	a7e221e900	Btrfs: fix memory leak in load_free_space_cache() load_free_space_cache() has forgotten to free path. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>	2012-02-15 16:40:24 +01:00
Arne Jansen	859acaf1a2	btrfs: don't check DUP chunks twice Because scrub enumerates the dev extent tree to find the chunks to scrub, it currently finds each DUP chunk twice and also scrubs it twice. This patch makes sure that scrub_chunk only checks that part of the chunk the dev extent has been found for. This only changes the behaviour for DUP chunks. Reported-and-tested-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Arne Jansen <sensille@gmx.net>	2012-02-15 16:40:24 +01:00
Liu Bo	2cac13e41b	Btrfs: fix trim 0 bytes after a device delete A user reported a bug of btrfs's trim, that is we will trim 0 bytes after a device delete. The reproducer: $ mkfs.btrfs disk1 $ mkfs.btrfs disk2 $ mount disk1 /mnt $ fstrim -v /mnt $ btrfs device add disk2 /mnt $ btrfs device del disk1 /mnt $ fstrim -v /mnt This is because after we delete the device, the block group may start from a non-zero place, which will confuse trim to discard nothing. Reported-by: Lutz Euler <lutz.euler@freenet.de> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>	2012-02-15 16:40:23 +01:00
Jeff Liu	6af021d8fc	Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry Given that ENXIO only means "offset beyond EOF" for either SEEK_DATA or SEEK_HOLE inquiry in a desired file range, so we should return the internal error unchanged if btrfs_get_extent_fiemap() call failed, rather than ENXIO. Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com>	2012-02-15 16:40:23 +01:00
Jan Schmidt	8f24b49688	Btrfs: avoid positive number with ERR_PTR inode_ref_info() returns 1 when the element wasn't found and < 0 on error, just like btrfs_search_slot(). In iref_to_path() it's an error when the inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order to avoid ERR_PTR(1), we now set ret to -ENOENT in that case. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>	2012-02-15 16:40:23 +01:00
Keith Mannthey	941b2ddf71	btrfs: Sector Size check during Mount Gracefully fail when trying to mount a BTRFS file system that has a sectorsize smaller than PAGE_SIZE. On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel then boot into a 64K PAGE_SIZE kernel. Presently open_ctree fails in an endless loop and hangs the machine in this situation. My debugging has show this Sector size < Page size to be a non trivial situation and a graceful exit from the situation would be nice for the time being. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>	2012-02-15 16:40:22 +01:00
Linus Torvalds	ce5afed937	Merge git://git.samba.org/sfrench/cifs-2.6 * git://git.samba.org/sfrench/cifs-2.6: cifs: don't return error from standard_receive3 after marking response malformed cifs: request oplock when doing open on lookup cifs: fix error handling when cifscreds key payload is an error	2012-02-13 20:34:44 -08:00
Al Viro	847c9db5cb	ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() unfortunately, nlink_t may be smaller than 32 bits and ->i_nlink on ocfs2 can grow up to 0xffffffff; storing it in nlink_t variable will lose upper bits on such architectures. Needs to be made u32, until we get kernel-side nlink_t uniformly 32bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:39 -05:00
Al Viro	fcf83067bf	vfs: fix compat_sys_stat() handling of overflows in st_nlink Massaged cp_compat_stat() into form closer to cp_new_stat(); the only real issue had been in handling of st_nlink overflows - native 32bit stat(2) returns -EOVERFLOW in such situations, compat one silently loses upper bits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:39 -05:00
Jan Kara	dcdbed853d	quota: Fix deadlock with suspend and quotas This script causes a kernel deadlock: set -e DEVICE=/dev/vg1/linear lvchange -ay $DEVICE mkfs.ext3 $DEVICE mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test quotacheck -gu /mnt/test umount /mnt/test mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test quotaon /mnt/test dmsetup suspend $DEVICE setquota -u root 1 2 3 4 /mnt/test & sleep 1 dmsetup resume $DEVICE setquota acquired semaphore s_umount for read and then tried to perform a transaction (and waits because the device is suspended). dmsetup resume tries to acquire s_umount for write before resuming the device (and waits for setquota). Fix the deadlock by grabbing a thawed superblock for quota commands which need it. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:39 -05:00
Jan Kara	6b6dc836a1	vfs: Provide function to get superblock and wait for it to thaw In quota code we need to find a superblock corresponding to a device and wait for superblock to be unfrozen. However this waiting has to happen without s_umount semaphore because that is required for superblock to thaw. So provide a function in VFS for this to keep dances with s_umount where they belong. [AV: implementation switched to saner variant] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:38 -05:00
Dimitri Sivanich	074b85175a	vfs: fix panic in __d_lookup() with high dentry hashtable counts When the number of dentry cache hash table entries gets too high (2147483648 entries), as happens by default on a 16TB system, use of a signed integer in the dcache_init() initialization loop prevents the dentry_hashtable from getting initialized, causing a panic in __d_lookup(). Fix this in dcache_init() and similar areas. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:38 -05:00
Steven Rostedt	1d6f209786	autofs4 - fix lockdep splat in autofs When recursing down the locks when traversing a tree/list in get_next_positive_dentry() or get_next_positive_subdir() a lock can change from being nested to being a parent which breaks lockdep. This patch tells lockdep about what we did. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:37 -05:00
Miklos Szeredi	e188dc02d3	vfs: fix d_inode_lookup() dentry ref leak d_inode_lookup() leaks a dentry reference on IS_DEADDIR(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-02-13 20:45:37 -05:00
Jesper Juhl	05293485a0	XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended It looks to me like the two ASSERT()s in xfs_trans_add_item() really want to do a compare (==) rather than assignment (=). This patch changes it from the latter to the former. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-02-13 17:06:39 -06:00
Linus Torvalds	19be13cfe3	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs Two bugfixes in XFS for 3.3: one fix passes KMEM_SLEEP to kmem_realloc instead of 0, and the other resolves a possible deadlock in xfs quotas. * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: use a normal shrinker for the dquot freelist xfs: pass KM_SLEEP flag to kmem_realloc() in xlog_recover_add_to_cnt_trans()	2012-02-13 14:19:45 -08:00
Linus Torvalds	3ec1e88b33	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Says Jens: "Time to push off some of the pending items. I really wanted to wait until we had the regression nailed, but alas it's not quite there yet. But I'm very confident that it's "just" a missing expire on exit, so fix from Tejun should be fairly trivial. I'm headed out for a week on the slopes. - Killing the barrier part of mtip32xx. It doesn't really support barriers, and it doesn't need them (writes are fully ordered). - A few fixes from Dan Carpenter, preventing overflows of integer multiplication. - A fixup for loop, fixing a previous commit that didn't quite solve the partial read problem from Dave Young. - A bio integer overflow fix from Kent Overstreet. - Improvement/fix of the door "keep locked" part of the cdrom shared code from Paolo Benzini. - A few cfq fixes from Shaohua Li. - A fix for bsg sysfs warning when removing a file it did not create from Stanislaw Gruszka. - Two fixes for floppy from Vivek, preventing a crash. - A few block core fixes from Tejun. One killing the over-optimized ioc exit path, cleaning that up nicely. Two others fixing an oops on elevator switch, due to calling into the scheduler merge check code without holding the queue lock." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix lockdep warning on io_context release put_io_context() relay: prevent integer overflow in relay_open() loop: zero fill bio instead of return -EIO for partial read bio: don't overflow in bio_get_nr_vecs() floppy: Fix a crash during rmmod floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called cdrom: move shared static to cdrom_device_info bsg: fix sysfs link remove warning block: don't call elevator callbacks for plug merges block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions mtip32xx: removed the irrelevant argument of mtip_hw_submit_io() and the unused member of struct driver_data block: strip out locking optimization in put_io_context() cdrom: use copy_to_user() without the underscores block: fix ioc locking warning block: fix NULL icq_cache reference block,cfq: change code order	2012-02-11 10:07:11 -08:00
Christoph Hellwig	04da0c8196	xfs: use a normal shrinker for the dquot freelist Stop reusing dquots from the freelist when allocating new ones directly, and implement a shrinker that actually follows the specifications for the interface. The shrinker implementation is still highly suboptimal at this point, but we can gradually work on it. This also fixes an bug in the previous lock ordering, where we would take the hash and dqlist locks inside of the freelist lock against the normal lock ordering. This is only solvable by introducing the dispose list, and thus not when using direct reclaim of unused dquots for new allocations. As a side-effect the quota upper bound and used to free ratio values in /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the new world order. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-02-10 12:02:05 -06:00
Linus Torvalds	af5feae3d7	fix 1 mysterious divide error fix 3 NULL dereference bugs in writeback tracing, on SD card removal w/o umount -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPNI/ZAAoJECvKgwp+S8JaXNsP/3UwYM4R/bIqjsGSEr8mpxzs L/9hq85Vql+HDIZ0QT2Zj8aYcF2iYhjxrrVGVjNmINY3bSvniqtrZ6oCejdj7wqR vb2ECC3csUnvUbbewCOM4EaowU2CoANhO5xZeDzOu9SnYfMPuxRzjFlxU5WehJm1 5dKcCtbaO9Bleo5aZyr2AAaZPgE2lG7Hrvk8HghPhEw7ZBtO1Pc3iVegEhIvRiZR tUNTCwxE7QV1GehTUTgGpJWNL4qzrbyiqm/Vg+yI27l13IPn6mb/qfe7eHDFUTCb Ey6oeojhmmv0Kgc7b38/0U6q1QNL8x+zJP3J21wMmYqn2DtkLgZkI4TAcmBZwwHi rGvrwQESzTpiuhdXxQEOQpmrd8IvTmiFQK+IZzJ3uUA197ROdxyWLmdbbMZvsLym 8rtC+WNR0IJmPmnWNl1pj2df8YmtWkAGLaw2RMj4RFz3AcXBRurAOrCVG8Lk8ptH pFS0n4W3ScuTrZFy1jXYjpVumeIAuWJ/ScPJZhVsDJmssZWv4ZNr/X+OExq0z3dJ g9IBJ64q1zJiD5gSs2+iXmBTEHP6lpap9hY9WjApep7RuDsM9+o78oVEJcGdXbRM StFJoFdyOrsIR0cuo4yd+Lp/1ZpqP2ES++itW2PA96RXAuP/4R040xXqK/qMEczW XfCHqpOIqpCF7lxt9bcc =shjO -----END PGP SIGNATURE----- Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux fix 1 mysterious divide error fix 3 NULL dereference bugs in writeback tracing, on SD card removal w/o umount * tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: fix dereferencing NULL bdi->dev on trace_writeback_queue lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel writeback: fix NULL bdi->dev in trace writeback_single_inode backing-dev: fix wakeup timer races with bdi_unregister()	2012-02-10 09:05:52 -08:00
Trond Myklebust	b9f9a03150	NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID To ensure that we don't just reuse the bad delegation when we attempt to recover the nfs4_state that received the bad stateid error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-02-09 15:59:21 -05:00
Xi Wang	1ecd3c7ea7	nilfs2: avoid overflowing segment numbers in nilfs_ioctl_clean_segments() nsegs is read from userspace. Limit its value and avoid overflowing nsegs * sizeof(__u64) in the subsequent call to memdup_user(). This patch complements `481fe17e97` ("nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()"). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Haogang Chen <haogangchen@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-08 19:03:51 -08:00
Kent Overstreet	5abebfdd02	bio: don't overflow in bio_get_nr_vecs() There were two places bio_get_nr_vecs() could overflow: First, it did a left shift to convert from sectors to bytes immediately before dividing by PAGE_SIZE. If PAGE_SIZE ever was less than 512 a great many things would break, so dividing by PAGE_SIZE >> 9 is safe and will generate smaller code too. The nastier overflow was in the DIV_ROUND_UP() (that's what the code was effectively doing, anyways). If n + d overflowed, the whole thing would return 0 which breaks things rather effectively. bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the DIV_ROUND_UP() is silly; we could do a straight divide except if a device's queue_max_sectors was less than PAGE_SIZE we'd return 0. So we just add 1; this should always be safe - things will break badly if bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start failing) but it's queue_max_segments that must guard against this, if queue_max_sectors is preventing this from happen things are going to explode on architectures with different PAGE_SIZE. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-08 22:07:18 +01:00
Jeff Layton	ff4fa4a25a	cifs: don't return error from standard_receive3 after marking response malformed standard_receive3 will check the validity of the response from the server (via checkSMB). It'll pass the result of that check to handle_mid which will dequeue it and mark it with a status of MID_RESPONSE_MALFORMED if checkSMB returned an error. At that point, standard_receive3 will also return an error, which will make the demultiplex thread skip doing the callback for the mid. This is wrong -- if we were able to identify the request and the response is marked malformed, then we want the demultiplex thread to do the callback. Fix this by making standard_receive3 return 0 in this situation. Cc: stable@vger.kernel.org Reported-and-Tested-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-02-07 22:25:31 -06:00
Jeff Layton	8b0192a5f4	cifs: request oplock when doing open on lookup Currently, it's always set to 0 (no oplock requested). Cc: <stable@vger.kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-02-07 22:25:29 -06:00
Jeff Layton	4edc53c1f8	cifs: fix error handling when cifscreds key payload is an error Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-02-07 22:25:26 -06:00
Linus Torvalds	84f8bf38b9	Merge git://git.samba.org/sfrench/cifs-2.6 * git://git.samba.org/sfrench/cifs-2.6: cifs: Fix oops in session setup code for null user mounts [CIFS] Update cifs Kconfig title to match removal of experimental dependency cifs: fix printk format warnings cifs: check offset in decode_ntlmssp_challenge() cifs: NULL dereference on allocation failure	2012-02-07 14:07:20 -08:00
Tejun Heo	11a3122f6c	block: strip out locking optimization in put_io_context() put_io_context() performed a complex trylock dancing to avoid deferring ioc release to workqueue. It was also broken on UP because trylock was always assumed to succeed which resulted in unbalanced preemption count. While there are ways to fix the UP breakage, even the most pathological microbench (forced ioc allocation and tight fork/exit loop) fails to show any appreciable performance benefit of the optimization. Strip it out. If there turns out to be workloads which are affected by this change, simpler optimization from the discussion thread can be applied later. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <1328514611.21268.66.camel@sli10-conroe> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-02-07 07:51:30 +01:00
Heiko Carstens	96e02d1586	exec: fix use-after-free bug in setup_new_exec() Setting the task name is done within setup_new_exec() by accessing bprm->filename. However this happens after flush_old_exec(). This may result in a use after free bug, flush_old_exec() may "complete" vfork_done, which will wake up the parent which in turn may free the passed in filename. To fix this add a new tcomm field in struct linux_binprm which contains the now early generated task name until it is used. Fixes this bug on s390: Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000 Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818) Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374) Call Trace: ([<0000000000282e2c>] setup_new_exec+0x38/0x374) [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4 [<0000000000280a42>] search_binary_handler+0x38e/0x5bc [<0000000000282b6c>] do_execve_common+0x410/0x514 [<0000000000282cb6>] do_execve+0x46/0x58 [<00000000005bce58>] kernel_execve+0x28/0x70 [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140 [<00000000005bc8da>] kernel_thread_starter+0x6/0xc [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc Last Breaking-Event-Address: [<00000000002830f0>] setup_new_exec+0x2fc/0x374 Kernel panic - not syncing: Fatal exception: panic_on_oops Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-06 15:15:20 -08:00
Linus Torvalds	71b1b20b8a	- Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1 - Fix breakage with MTD suspend caused by the API rework - Fix a problem with resetting the MX28 BCH module - A couple of other trivial fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iEYEABECAAYFAk8s6HsACgkQdwG7hYl686MIiACgxpNoUWFvq8z+2UGXxsLnNrio hhcAn31H7TY3KUuIQBo4CqG2dEjNwpCw =DRWp -----END PGP SIGNATURE----- Merge tag 'for-linus-3.3' of git://git.infradead.org/~dwmw2/mtd-3.3 - Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1 - Fix breakage with MTD suspend caused by the API rework - Fix a problem with resetting the MX28 BCH module - A couple of other trivial fixes * tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3: Revert "mtd: atmel_nand: optimize read/write buffer functions" mtd: fix MTD suspend jffs2: do not initialize variable unnecessarily mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23 mtd: nand: fix typo in comment	2012-02-04 07:17:47 -08:00
Trond Myklebust	331818f1c4	NFSv4: Fix an Oops in the NFSv4 getacl code Commit `bf118a342f` (NFSv4: include bitmap in nfsv4 get acl data) introduces the 'acl_scratch' page for the case where we may need to decode multi-page data. However it fails to take into account the fact that the variable may be NULL (for the case where we're not doing multi-page decode), and it also attaches it to the encoding xdr_stream rather than the decoding one. The immediate result is an Oops in nfs4_xdr_enc_getacl due to the call to page_address() with a NULL page pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com> Cc: stable@vger.kernel.org	2012-02-03 18:50:34 -05:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Shirish Pargaonkar	de47a4176c	cifs: Fix oops in session setup code for null user mounts For null user mounts, do not invoke string length function during session setup. Cc: <stable@kernel.org Reported-and-Tested-by: Chris Clayton <chris2553@googlemail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>	2012-02-02 16:59:09 -06:00
Christopher Yeoh	8cdb878dcb	Fix race in process_vm_rw_core This fixes the race in process_vm_core found by Oleg (see http://article.gmane.org/gmane.linux.kernel/1235667/ for details). This has been updated since I last sent it as the creation of the new mm_access() function did almost exactly the same thing as parts of the previous version of this patch did. In order to use mm_access() even when /proc isn't enabled, we move it to kernel/fork.c where other related process mm access functions already are. Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-02 12:55:17 -08:00
Alex Elder	d8fb02abdc	ceph: create a new session lock to avoid lock inversion Lockdep was reporting a possible circular lock dependency in dentry_lease_is_valid(). That function needs to sample the session's s_cap_gen and and s_cap_ttl fields coherently, but needs to do so while holding a dentry lock. The s_cap_lock field was being used to protect the two fields, but that can't be taken while holding a lock on a dentry within the session. In most cases, the s_cap_gen and s_cap_ttl fields only get operated on separately. But in three cases they need to be updated together. Implement a new lock to protect the spots updating both fields atomically is required. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:19 -08:00
Xi Wang	32852a81bc	ceph: fix length validation in parse_reply_info() "len" is read from network and thus needs validation. Otherwise, given a bogus "len" value, p+len could be an out-of-bounds pointer, which is used in further parsing. Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:11 -08:00
Alex Elder	114fc47492	ceph: change "ceph.layout" xattr to be "ceph.file.layout" The virtual extended attribute named "ceph.layout" is meaningful only for regular files. Change its name to be "ceph.file.layout" to more directly reflect that in the ceph xattr namespace. Preserve the old "ceph.layout" name for the time being (until we decide it's safe to get rid of it entirely). Add a missing initializer for "readonly" in the terminating entry. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-02-02 12:48:52 -08:00
Oleg Nesterov	6d08f2c713	proc: make sure mem_open() doesn't pin the target's memory Once /proc/pid/mem is opened, the memory can't be released until mem_release() even if its owner exits. Change mem_open() to do atomic_inc(mm_count) + mmput(), this only pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count) before access_remote_vm(), this verifies that this mm is still alive. I am not sure what should mem_rw() return if atomic_inc_not_zero() fails. With this patch it returns zero to match the "mm == NULL" case, may be it should return -EINVAL like it did before `e268337d`. Perhaps it makes sense to add the additional fatal_signal_pending() check into the main loop, to ensure we do not hold this memory if the target task was oom-killed. Cc: stable@kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-01 14:39:01 -08:00
Oleg Nesterov	572d34b946	proc: unify mem_read() and mem_write() No functional changes, cleanup and preparation. mem_read() and mem_write() are very similar. Move this code into the new common helper, mem_rw(), which takes the additional "int write" argument. Cc: stable@kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-01 14:39:01 -08:00
Oleg Nesterov	71879d3cb3	proc: mem_release() should check mm != NULL mem_release() can hit mm == NULL, add the necessary check. Cc: stable@kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-01 14:39:01 -08:00
Artem Bityutskiy	7d73101921	mtd: fix merge conflict resolution breakage This patch fixes merge conflict resolution breakage introduced by merge `d3712b9dfc` ("Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream"). The commit changed 'mtd_can_have_bb()' function and made it always return zero, which is incorrect. Instead, we need it to return whether the underlying flash device can have bad eraseblocks or not. UBI needs this information because it affects how it handles the underlying flash. E.g., if the underlying flash is NOR, it cannot have bad blocks and any write or erase error is fatal, and all we can do is to switch to R/O mode. We do not need to reserve a pool of good eraseblocks for bad eraseblocks handling, and so on. This patch also removes 'mtd_can_have_bb()' invocations from Logfs to ensure correct Logfs behavior. I've tested that with this patch UBI works on top of NOR and NAND flashes emulated by mtdram and nandsim correspondingly. This patch is based on patch from Linus Torvalds. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jörn Engel <joern@logfs.org> Acked-by: Prasad Joshi <prasadjoshi.linux@gmail.com> Acked-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-02-01 11:10:24 -08:00
Wu Fengguang	15eb77a07c	writeback: fix NULL bdi->dev in trace writeback_single_inode bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the tearing down the original bdi. Fix trace_writeback_single_inode to use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a teared down bdi. Cc: <stable@kernel.org> Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2012-02-01 16:53:40 +08:00
Chris Mason	d98456fcaf	Btrfs: don't reserve data with extents locked in btrfs_fallocate btrfs_fallocate tries to allocate space only if ranges in the file don't already exist. But the enospc checks it does are not allowed with extents locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-31 20:27:41 -05:00
Steve French	2a73ca8208	[CIFS] Update cifs Kconfig title to match removal of experimental dependency Removed the dependency on CONFIG_EXPERIMENTAL but forgot to update the text description to be consistent. Signed-off-by: Steve French <smfrench@gmail.com>	2012-01-31 12:51:24 -06:00
Mitsuo Hayasaka	4505360376	xfs: pass KM_SLEEP flag to kmem_realloc() in xlog_recover_add_to_cnt_trans() The kmem_realloc() in xfs is given KM_* memory allocation flags. And it allocates memory using kmalloc() after they are converted to gfp_mask flags. In xlog_recover_add_to_cont_trans(), 0u is passed to kmem_realloc(), instead of them. I guess it is preferred to use them, and here memory must be allocated but don't have to be done with GFP_ATOMIC. So, this patch changes it to KM_SLEEP. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-31 12:11:18 -06:00
Linus Torvalds	d3712b9dfc	Pull request from git://github.com/prasad-joshi/logfs_upstream.git There are few important bug fixes for LogFS Shortlog: Joern Engel (5): logfs: Prevent memory corruption logfs: remove useless BUG_ON logfs: Free areas before calling generic_shutdown_super() logfs: Grow inode in delete path Logfs: Allow NULL block_isbad() methods Prasad Joshi (5): logfs: update page reference count for pined pages logfs: take write mutex lock during fsync and sync logfs: set superblock shutdown flag after generic sb shutdown logfs: Propagate page parameter to __logfs_write_inode MAINTAINERS: Add Prasad Joshi in LogFS maintiners Diffstat: MAINTAINERS \| 1 + fs/logfs/dev_mtd.c \| 26 +++++++++++------------- fs/logfs/dir.c \| 2 +- fs/logfs/file.c \| 2 + fs/logfs/gc.c \| 2 +- fs/logfs/inode.c \| 4 ++- fs/logfs/journal.c \| 1 - fs/logfs/logfs.h \| 5 +++- fs/logfs/readwrite.c \| 51 +++++++++++++++++++++++++++++++++---------------- fs/logfs/segment.c \| 51 ++++++++++++++++++++++++++++++++++++++----------- fs/logfs/super.c \| 3 +- 11 files changed, 99 insertions(+), 49 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPKByhAAoJEDFA/f+3K+ZNQSUP/3gACcIwcsl+FnXPWBtz9XIG g0DjXoRDd/sR0u25nLgjCVdBJgx5FVEyA+PvLgvvUU2KCAsqI5F/EQ+fLJs21YEN TzepBO5aHtFZbNEjo6WiXOlDbBePTtk44WrN6jqoCHM/aDeT4Wof3NZBmHWNN1PX B2RtEZ0ypJ7/b1OY2LUNcQfTaJXNgVoP8Hkx4KGY5LUVxVrBXxvDTU7YbkS8a+ys 1Yje/EQ4XD4RyZB42TmFEuTenvGPRgMGVFdnkJKuON8EmJQ8Hc61jEf5d7Q8sWef dH5F/ptoAaR9a9LbbO8LoYuBZ8MR8848NPsrNPpr/gWntj46Z79yII8Jr7YoSDyw zq5G2dZbwlbVrtVWKGae47THkNB8bljR/g4cijvPAkvuIAku6mg+dgjVHAhZ/t+J xu8+Gy2sWHUH2gmoSXuoNyppOvYpPIRd5RB16PizMH3bw+sMad2K8/rfOKnmF1/r HTM2jZ5bDcHVDjSuVI6u2m/mQX+PmPXUTffreaFXuSI75YpT0dqN3nponTX4EgFI Ad9ZBQvdg8w1LGDsNxIAaqrGx4Q87RxqfUV4W/wo6N8gKsp+I2y4GtYMeD/CEKyi wncKg10YwoMXZj7cBAkWgPlgrOBYCPwYZc/1DVRHvqrHo/m13SJrWDKkNKVvoXzH 2y4Tfi5w1WDRUT7yeoyK =TA1A -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream There are few important bug fixes for LogFS * tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream: Logfs: Allow NULL block_isbad() methods logfs: Grow inode in delete path logfs: Free areas before calling generic_shutdown_super() logfs: remove useless BUG_ON MAINTAINERS: Add Prasad Joshi in LogFS maintiners logfs: Propagate page parameter to __logfs_write_inode logfs: set superblock shutdown flag after generic sb shutdown logfs: take write mutex lock during fsync and sync logfs: Prevent memory corruption logfs: update page reference count for pined pages Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what "mtd->block_isbad" means in commit f2933e86ad93: "Logfs: Allow NULL block_isbad() methods" clashing with the abstraction changes in the commits 7086c19d0742: "mtd: introduce mtd_block_isbad interface" and d58b27ed58a3: "logfs: do not use 'mtd->block_isbad' directly". This resolution takes the semantics from commit `f2933e86ad`, and just makes mtd_block_isbad() return zero (false) if the 'block_isbad' function is NULL. But that also means that now "mtd_can_have_bb()" always returns 0. Now, "mtd_block_markbad()" will obviously return an error if the low-level driver doesn't support bad blocks, so this is somewhat non-symmetric, but it actually makes sense if a NULL "block_isbad" function is considered to mean "I assume that all my blocks are always good".	2012-01-31 09:23:59 -08:00
Randy Dunlap	000f9bb839	cifs: fix printk format warnings Fix printk format warnings for ssize_t variables: fs/cifs/connect.c:2145:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' fs/cifs/connect.c:2152:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' fs/cifs/connect.c:2160:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' fs/cifs/connect.c:2170:3: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: linux-cifs@vger.kernel.org	2012-01-31 07:42:08 -06:00
Dan Carpenter	4991a5faab	cifs: check offset in decode_ntlmssp_challenge() We should check that we're not copying memory from beyond the end of the blob. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>	2012-01-31 07:42:06 -06:00
Linus Torvalds	0a96265754	Here are some patches for the 3.3-rc1 tree. It contains the removal of the sysdev code, now that all users of it are gone, as well as some sysfs bugfixes that have been reported by users. There are also some documentation updates here as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAk8jKW4ACgkQMUfUDdst+ynAUwCfVWwHJxpb4DSSMVZhGOnHMQrL ZjIAn00gPeSs5u8y1nPvFrFikbon4FDs =bzVy -----END PGP SIGNATURE----- Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Here are some patches for the 3.3-rc1 tree. It contains the removal of the sysdev code, now that all users of it are gone, as well as some sysfs bugfixes that have been reported by users. There are also some documentation updates here as well. * tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: Complain bitterly about attempts to remove files from nonexistent directories. stable: update documentation to ask for kernel version base/core.c:fix typo in comment in function device_add Documentation: devres: add allocation functions to list of supported calls Documentation update for the driver model core kernel-doc: fix new warnings in driver-core kernel-doc: fix new warnings in debugfs kernel-doc: fix new warnings in device.h driver core: remove drivers/base/sys.c and include/linux/sysdev.h	2012-01-28 18:20:48 -08:00
Linus Torvalds	67d2433ee7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix reservations in btrfs_page_mkwrite Btrfs: advance window_start if we're using a bitmap btrfs: mask out gfp flags in releasepage Btrfs: fix enospc error caused by wrong checks of the chunk Btrfs: do not defrag a file partially Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c Btrfs: use cluster->window_start when allocating from a cluster bitmap Btrfs: Check for NULL page in extent_range_uptodate btrfs: Fix busyloops in transaction waiting code Btrfs: make sure a bitmap has enough bytes Btrfs: fix uninit warning in backref.c	2012-01-28 17:00:19 -08:00
Joern Engel	f2933e86ad	Logfs: Allow NULL block_isbad() methods Not all mtd drivers define block_isbad(). Let's assume no bad blocks instead of refusing to mount. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:43:40 +05:30
Joern Engel	bbe0138712	logfs: Grow inode in delete path Can be necessary if an inode gets deleted (through -ENOSPC) before being written. Might be better to move this into logfs_write_rec(), but for now go with the stupid&safe patch. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:43:07 +05:30
Joern Engel	1bcceaff8c	logfs: Free areas before calling generic_shutdown_super() Or hit an assertion in map_invalidatepage() instead. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:42:39 +05:30
Joern Engel	6c69494f6b	logfs: remove useless BUG_ON It prevents write sizes >4k. Signed-off-by: Joern Engel <joern@logfs.org>	2012-01-28 11:41:56 +05:30
Prasad Joshi	0bd90387ed	logfs: Propagate page parameter to __logfs_write_inode During GC LogFS has to rewrite each valid block to a separate segment. Rewrite operation reads data from an old segment and writes it to a newly allocated segment. Since every write operation changes data block pointers maintained in inode, inode should also be rewritten. In GC path to avoid AB-BA deadlock LogFS marks a page with PG_pre_locked in addition to locking the page (PG_locked). The page lock is ignored iff the page is pre-locked. LogFS uses a special file called segment file. The segment file maintains an 8 bytes entry for every segment. It keeps track of erase count, level etc. for every segment. Bad things happen with a segment belonging to the segment file is GCed ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/readwrite.c:297! invalid opcode: 0000 [#1] SMP Modules linked in: logfs joydev usbhid hid psmouse e1000 i2c_piix4 serio_raw [last unloaded: logfs] Pid: 20161, comm: mount Not tainted 3.1.0-rc3+ #3 innotek GmbH VirtualBox EIP: 0060:[<f809132a>] EFLAGS: 00010292 CPU: 0 EIP is at logfs_lock_write_page+0x6a/0x70 [logfs] EAX: 00000027 EBX: f73f5b20 ECX: c16007c8 EDX: 00000094 ESI: 00000000 EDI: e59be6e4 EBP: c7337b28 ESP: c7337b18 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process mount (pid: 20161, ti=c7336000 task=eb323f70 task.ti=c7336000) Stack: f8099a3d c7337b24 f73f5b20 00001002 c7337b50 f8091f6d f8099a4d f80994e4 00000003 00000000 c7337b68 00000000 c67e4400 00001000 c7337b80 f80935e5 00000000 00000000 00000000 00000000 e1fcf000 0000000f e59be618 c70bf900 Call Trace: [<f8091f6d>] logfs_get_write_page.clone.16+0xdd/0x100 [logfs] [<f80935e5>] logfs_mod_segment_entry+0x55/0x110 [logfs] [<f809460d>] logfs_get_segment_entry+0x1d/0x20 [logfs] [<f8091060>] ? logfs_cleanup_journal+0x50/0x50 [logfs] [<f809521b>] ostore_get_erase_count+0x1b/0x40 [logfs] [<f80965b8>] logfs_open_area+0xc8/0x150 [logfs] [<c141a7ec>] ? kmemleak_alloc+0x2c/0x60 [<f809668e>] __logfs_segment_write.clone.16+0x4e/0x1b0 [logfs] [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<c10dd563>] ? mempool_kmalloc+0x13/0x20 [<f809696f>] logfs_segment_write+0x17f/0x1d0 [logfs] [<f8092e8c>] logfs_write_i0+0x11c/0x180 [logfs] [<f8092f35>] logfs_write_direct+0x45/0x90 [logfs] [<f80934cd>] __logfs_write_buf+0xbd/0xf0 [logfs] [<c102900e>] ? kmap_atomic_prot+0x4e/0xe0 [<f809424b>] logfs_write_buf+0x3b/0x60 [logfs] [<f80947a9>] __logfs_write_inode+0xa9/0x110 [logfs] [<f8094cb0>] logfs_rewrite_block+0xc0/0x110 [logfs] [<f8095300>] ? get_mapping_page+0x10/0x60 [logfs] [<f8095aa0>] ? logfs_load_object_aliases+0x2e0/0x2f0 [logfs] [<f808e57d>] logfs_gc_segment+0x2ad/0x310 [logfs] [<f808e62a>] __logfs_gc_once+0x4a/0x80 [logfs] [<f808ed43>] logfs_gc_pass+0x683/0x6a0 [logfs] [<f8097a89>] logfs_mount+0x5a9/0x680 [logfs] [<c1126b21>] mount_fs+0x21/0xd0 [<c10f6f6f>] ? __alloc_percpu+0xf/0x20 [<c113da41>] ? alloc_vfsmnt+0xb1/0x130 [<c113db4b>] vfs_kern_mount+0x4b/0xa0 [<c113e06e>] do_kern_mount+0x3e/0xe0 [<c113f60d>] do_mount+0x34d/0x670 [<c10f2749>] ? strndup_user+0x49/0x70 [<c113fcab>] sys_mount+0x6b/0xa0 [<c142d87c>] syscall_call+0x7/0xb Code: f8 e8 8b 93 39 c9 8b 45 f8 3e 0f ba 28 00 19 d2 85 d2 74 ca eb d0 0f 0b 8d 45 fc 89 44 24 04 c7 04 24 3d 9a 09 f8 e8 09 92 39 c9 <0f> 0b 8d 74 26 00 55 89 e5 3e 8d 74 26 00 8b 10 80 e6 01 74 09 EIP: [<f809132a>] logfs_lock_write_page+0x6a/0x70 [logfs] SS:ESP 0068:c7337b18 ---[ end trace 96e67d5b3aa3d6ca ]--- The patch passes locked page to __logfs_write_inode. It calls function logfs_get_wblocks() to pre-lock the page. This ensures any further attempts to lock the page are ignored (esp from get_erase_count). Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:38:25 +05:30
Prasad Joshi	ecfd890991	logfs: set superblock shutdown flag after generic sb shutdown While unmounting the file system LogFS calls generic_shutdown_super. The function does file system independent superblock shutdown. However, it might result in call file system specific inode eviction. LogFS marks FS shutting down by setting bit LOGFS_SB_FLAG_SHUTDOWN in super->s_flags. Since, inode eviction might call truncate on inode, following BUG is observed when file system is unmounted: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/segment.c:362! invalid opcode: 0000 [#1] PREEMPT SMP CPU 3 Modules linked in: logfs binfmt_misc ppdev virtio_blk parport_pc lp parport psmouse floppy virtio_pci serio_raw virtio_ring virtio Pid: 1933, comm: umount Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa008c841>] [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP: 0018:ffff880062d7b9e8 EFLAGS: 00010202 RAX: 000000000000000e RBX: ffff88006eca9000 RCX: 0000000000000000 RDX: ffff88006fd87c40 RSI: ffffea00014ff468 RDI: ffff88007b68e000 RBP: ffff880062d7ba48 R08: 8000000020451430 R09: 0000000000000000 R10: dead000000100100 R11: 0000000000000000 R12: ffff88006fd87c40 R13: ffffea00014ff468 R14: ffff88005ad0a460 R15: 0000000000000000 FS: 00007f25d50ea760(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000d05e48 CR3: 0000000062c72000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount (pid: 1933, threadinfo ffff880062d7a000, task ffff880070b44500) Stack: ffff880062d7ba38 ffff88005ad0a508 0000000000001000 0000000000000000 8000000020451430 ffffea00014ff468 ffff880062d7ba48 ffff88005ad0a460 ffff880062d7bad8 ffffea00014ff468 ffff88006fd87c40 0000000000000000 Call Trace: [<ffffffffa0088fee>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0089360>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0089312>] __logfs_write_rec+0xf2/0x220 [logfs] [<ffffffffa00894a4>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa0089616>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa008a19e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa008a6b8>] __logfs_write_inode+0x98/0x110 [logfs] [<ffffffffa008a7c4>] logfs_truncate+0x54/0x290 [logfs] [<ffffffffa008abfc>] logfs_evict_inode+0xdc/0x190 [logfs] [<ffffffff8115eef5>] evict+0x85/0x170 [<ffffffff8115f126>] iput+0xe6/0x1b0 [<ffffffff8115b4a8>] shrink_dcache_for_umount_subtree+0x218/0x280 [<ffffffff8115ce91>] shrink_dcache_for_umount+0x51/0x90 [<ffffffff8114796c>] generic_shutdown_super+0x2c/0x100 [<ffffffffa008cc47>] logfs_kill_sb+0x57/0xf0 [logfs] [<ffffffff81147de5>] deactivate_locked_super+0x45/0x70 [<ffffffff811487ea>] deactivate_super+0x4a/0x70 [<ffffffff81163934>] mntput_no_expire+0xa4/0xf0 [<ffffffff8116469f>] sys_umount+0x6f/0x380 [<ffffffff814dd46b>] system_call_fastpath+0x16/0x1b Code: 55 c8 49 8d b6 a8 00 00 00 45 89 f9 45 89 e8 4c 89 e1 4c 89 55 b8 c7 04 24 00 00 00 00 e8 68 fc ff ff 4c 8b 55 b8 e9 3c ff ff ff <0f> 0b 0f 0b c7 45 c0 00 00 00 00 e9 44 fe ff ff 66 66 66 66 66 RIP [<ffffffffa008c841>] logfs_segment_write+0x211/0x230 [logfs] RSP <ffff880062d7b9e8> ---[ end trace fe6b040cea952290 ]--- Therefore, move super->s_flags setting after the fs-indenpendent work has been finished. Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:37:47 +05:30
Prasad Joshi	13ced29cb2	logfs: take write mutex lock during fsync and sync LogFS uses super->s_write_mutex while writing data to disk. Taking the same mutex lock in sync and fsync code path solves the following BUG: ------------[ cut here ]------------ kernel BUG at /home/prasad/logfs/dev_bdev.c:134! Pid: 2387, comm: flush-253:16 Not tainted 3.0.0+ #4 Bochs Bochs RIP: 0010:[<ffffffffa007deed>] [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] Call Trace: [<ffffffffa007c381>] logfs_open_area+0x91/0x150 [logfs] [<ffffffff8128dcb2>] ? find_level.clone.9+0x62/0x100 [<ffffffffa007c49c>] __logfs_segment_write.clone.20+0x5c/0x190 [logfs] [<ffffffff810ef005>] ? mempool_kmalloc+0x15/0x20 [<ffffffff810ef383>] ? mempool_alloc+0x53/0x130 [<ffffffffa007c7a4>] logfs_segment_write+0x1d4/0x230 [logfs] [<ffffffffa0078f8e>] logfs_write_i0+0x12e/0x190 [logfs] [<ffffffffa0079300>] __logfs_write_rec+0x140/0x220 [logfs] [<ffffffffa0079444>] logfs_write_rec+0x64/0xd0 [logfs] [<ffffffffa00795b6>] __logfs_write_buf+0x106/0x110 [logfs] [<ffffffffa007a13e>] logfs_write_buf+0x4e/0x80 [logfs] [<ffffffffa0073e33>] __logfs_writepage+0x23/0x80 [logfs] [<ffffffffa007410c>] logfs_writepage+0xdc/0x110 [logfs] [<ffffffff810f5ba7>] __writepage+0x17/0x40 [<ffffffff810f6208>] write_cache_pages+0x208/0x4f0 [<ffffffff810f5b90>] ? set_page_dirty+0x70/0x70 [<ffffffff810f653a>] generic_writepages+0x4a/0x70 [<ffffffff810f75d1>] do_writepages+0x21/0x40 [<ffffffff8116b9d1>] writeback_single_inode+0x101/0x250 [<ffffffff8116bdbd>] writeback_sb_inodes+0xed/0x1c0 [<ffffffff8116c5fb>] writeback_inodes_wb+0x7b/0x1e0 [<ffffffff8116cc23>] wb_writeback+0x4c3/0x530 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff8116cd6b>] wb_do_writeback+0xdb/0x290 [<ffffffff814d984d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff814d6208>] ? _raw_spin_unlock_irqrestore+0x18/0x40 [<ffffffff8105aa5a>] ? del_timer+0x8a/0x120 [<ffffffff8116cfac>] bdi_writeback_thread+0x8c/0x2e0 [<ffffffff8116cf20>] ? wb_do_writeback+0x290/0x290 [<ffffffff8106d2e6>] kthread+0x96/0xa0 [<ffffffff814de514>] kernel_thread_helper+0x4/0x10 [<ffffffff8106d250>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814de510>] ? gs_change+0xb/0xb RIP [<ffffffffa007deed>] bdev_writeseg+0x25d/0x270 [logfs] ---[ end trace 0211ad60a57657c4 ]--- Reviewed-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:36:06 +05:30
Joern Engel	934eed395d	logfs: Prevent memory corruption This is a bad one. I wonder whether we were so far protected by no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS. Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:24:21 +05:30
Prasad Joshi	96150606e2	logfs: update page reference count for pined pages LogFS sets PG_private flag to indicate a pined page. We assumed that marking a page as private is enough to ensure its existence. But instead it is necessary to hold a reference count to the page. The change resolves the following BUG BUG: Bad page state in process flush-253:16 pfn:6a6d0 page flags: 0x100000000000808(uptodate\|private) Suggested-and-Acked-by: Joern Engel <joern@logfs.org> Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>	2012-01-28 11:23:10 +05:30
Chris Mason	9998eb7034	Btrfs: fix reservations in btrfs_page_mkwrite Josef fixed btrfs_page_mkwrite to properly release reserved extents if there was an error. But if we fail to get a reservation and we fail to dirty the inode (for ENOSPC reasons), we'll end up trying to release a reservation we never had. This makes sure we only release if we were able to reserve. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-27 10:44:44 -05:00
Josef Bacik	9b23062840	Btrfs: advance window_start if we're using a bitmap If we span a long area in a bitmap we could end up taking a lot of time searching to the next free area if we're searching from the original window_start, so advance window_start in order to make sure we don't do any superficial searching. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
David Sterba	0c4e538bcc	btrfs: mask out gfp flags in releasepage btree_releasepage is a callback and can be passed unknown gfp flags and then they may end up in kmem_cache_alloc called from alloc_extent_state, slab allocator will BUG_ON when there is HIGHMEM or DMA32 flag set. This may happen when btrfs is mounted from a loop device, which masks out __GFP_IO flag. The check in try_release_extent_state 3399 if ((mask & GFP_NOFS) == GFP_NOFS) 3400 mask = GFP_NOFS; will not work and passes unfiltered flags further resulting in crash at mm/slab.c:2963 [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8 [<000000000024c810>] kmem_cache_alloc+0x204/0x294 [<00000000001fd3c2>] mempool_alloc+0x52/0x170 [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs] [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs] [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs] [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs] [<0000000000210d84>] shrink_page_list+0x6a0/0x724 [<0000000000211394>] shrink_inactive_list+0x230/0x578 [<0000000000211bb8>] shrink_list+0x6c/0x120 [<0000000000211e4e>] shrink_zone+0x1e2/0x228 [<0000000000211f24>] shrink_zones+0x90/0x254 [<0000000000213410>] do_try_to_free_pages+0xac/0x420 [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0 [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8 [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8 Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Miao Xie	9e622d6bea	Btrfs: fix enospc error caused by wrong checks of the chunk When we did sysbench test for inline files, enospc error happened easily though there was lots of free disk space which could be allocated for new chunks. Reproduce steps: # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition> # mount <test partition> /mnt # ulimit -n 102400 # cd /mnt # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr prepare # sysbench --num-threads=1 --test=fileio --file-num=81920 \ > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \ > --file-test-mode=seqwr run <soon later, BUG_ON() was triggered by enospc error> The reason of this bug is: Now, we can reserve space which is larger than the free space in the chunks if we have enough free disk space which can be used for new chunks. By this way, the space allocator should allocate a new chunk by force if there is no free space in the free space cache. But there are two wrong checks which break this operation. One is if (ret == -ENOSPC && num_bytes > min_alloc_size) in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk even we fail to allocate free space by minimum allocable size. The other is if (space_info->force_alloc) force = space_info->force_alloc; in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen. Fix these two wrong checks. Especially the second one, we fix it by changing the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has higher priority. And if the value which is passed in by the caller is greater than ->force_alloc, use the passed value. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Liu Bo	7ec31b548a	Btrfs: do not defrag a file partially xfstests 218 complains that btrfs defrags a file partially: After: 1 Write backwards sync, but contiguous - should defrag to 1 extent Before: 10 -After: 1 +After: 2 To fix this, we need to set max_to_defrag count properly. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00

1 2 3 4 5 ...

25748 Commits