During migration, the recovery master node may be asked to master a lockres
it may not know about. In that case, it would not only have to create a
lockres and add it to the hash, but also remember to to do the _put_
corresponding to the kref_init in dlm_init_lockres(), as soon as the migration
is completed. Yes, we don't wait for the dlm_purge_lockres() to do that
matching put. Note the ref added for it being in the hash protects the lockres
from being freed prematurely.
This patch adds that missing put, as described above, to plug a memleak.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Normally locks for remote nodes are freed when that node sends an UNLOCK
message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action
to do an extra put on the lock at the end.
However, there are times when the master node has to free the locks for the
remote nodes forcibly.
Two cases when this happens are:
1. When the master has migrated the lockres plus all locks to another node.
2. When the master is clearing all the locks of a dead node.
It was in the above two conditions that the dlm was missing the extra put.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we
were putting cpu byteorder values into it. Swap things to the right endian
before storing.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
struct dlm_query_join_packet is made up of four one-byte fields. They
are effectively in big-endian order already. However, little-endian
machines swap them before putting the packet on the wire (because
query_join's response is a status, and that status is treated as a u32
on the wire). Thus, a big-endian and little-endian machines will
treat this structure differently.
The solution is to have little-endian machines swap the structure when
converting from the structure to the u32 representation.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
__dlm_print_one_lock_resource must be called with spin_lock
the res->spinlock. While in some cases, we use it without this
precondition and lead to the failure of assert_spin_locked.
So call dlm_print_one_lock_resource instead.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels':
fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
NFS: Fix the fsid revalidation in nfs_update_inode()
SUNRPC: Fix a nfs4 over rdma transport oops
NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the radix_tree_preload() fails, we need to destroy the inode we just
read in before trying again. This could leak xfs_vnode structures when
there is memory pressure. Noticed by Christoph Hellwig.
SGI-PV: 977823
SGI-Modid: xfs-linux-melb:xfs-kern:30606a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
wakeups
Idle state is not being detected properly by the xfsaild push code. The
current idle state is detected by an empty list which may never happen
with mostly idle filesystem or one using lazy superblock counters. A
single dirty item in the list that exists beyond the push target can
result repeated looping attempting to push up to the target because it
fails to check if the push target has been acheived or not.
Fix by considering a dirty list with everything past the target as an idle
state and set the timeout appropriately.
SGI-PV: 977545
SGI-Modid: xfs-linux-melb:xfs-kern:30532a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
NFS and SELinux worked together previously because SELinux had NFS
specific knowledge built in. This design was approved by both groups
back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and
the usage of nfs_clone_mount_data showed this to be a poor fragile
solution. This patch fixes the NFS functionality regression by making
use of the new LSM interfaces to allow an FS to explicitly set its own
mount options.
The explicit setting of mount options is done in the nfs get_sb
functions which are called before the generic vfs hooks try to set mount
options for filesystems which use text mount data.
This does not currently support NFSv4 as that functionality did not
exist in previous kernels and thus there is no regression. I will be
adding the needed code, which I believe to be the exact same as the v3
code, in nfs4_get_sb for 2.6.26.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Introduce new LSM interfaces to allow an FS to deal with their own mount
options. This includes a new string parsing function exported from the
LSM that an FS can use to get a security data blob and a new security
data blob. This is particularly useful for an FS which uses binary
mount data, like NFS, which does not pass strings into the vfs to be
handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK
when dealing with binary mount data. If the binary mount data is less
than one page the copy_page() in security_sb_copy_data() can cause an
illegal page fault and boom. Remove all NFSisms from the SELinux code
since they were broken by past NFS changes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
debugfs: fix sparse warnings
Driver core: Fix cleanup when failing device_add().
driver core: Remove dpm_sysfs_remove() from error path of device_add()
PM: fix new mutex-locking bug in the PM core
PM: Do not acquire device semaphores upfront during suspend
kobject: properly initialize ksets
sysfs: CONFIG_SYSFS_DEPRECATED fix
driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
The "resize" option won't be noticed as it comes after the NULL option, so if
you try to mount (or in this case remount) with that option it won't be
recognized.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the page is not up to date, ecryptfs_prepare_write() should be
acting much like ecryptfs_readpage(). This includes the painfully
obvious step of actually decrypting the page contents read from the
lower encrypted file.
Note that this patch resolves a bug in eCryptfs in 2.6.24 that one can
produce with these steps:
# mount -t ecryptfs /secret /secret
# echo "abc" > /secret/file.txt
# umount /secret
# mount -t ecryptfs /secret /secret
# echo "def" >> /secret/file.txt
# cat /secret/file.txt
Without this patch, the resulting data returned from cat is likely to
be something other than "abc\ndef\n".
(Thanks to Benedikt Driessen for reporting this.)
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Benedikt Driessen <bdriessen@escrypt.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit e6bafba5b4 ("wmi: (!x & y)
strikes again"), a bug was fixed that involved converting !x & y to !(x
& y). The code below shows the same pattern, and thus should perhaps be
fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
!E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix NULL pointer dereference in fsync_buffers_list() introduced by recent fix
of races in private_list handling. Since bh->b_assoc_map has been cleared in
__remove_assoc_queue() we should really use original value stored in the
'mapping' variable.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the user_regset-based core dump code call user_regset writeback
hooks when available. This is necessary groundwork to allow IA64 to set
CORE_DUMP_USE_REGSET.
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
extern does not belong in C files, move declaration to linux/debugfs.h
fs/debugfs/file.c:42:30: warning: symbol 'debugfs_file_operations' was not declared. Should it be static?
fs/debugfs/file.c:54:31: warning: symbol 'debugfs_link_operations' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
[PATCH] fs/ocfs2/aops.c: Correct use of ! and &
[2.6 patch] ocfs2: make dlm_do_assert_master() static
[2.6 patch] make ocfs2_downconvert_thread() static
[2.6 patch] fs/ocfs2/: possible cleanups
[PATCH] ocfs2: le*_add_cpu conversion
ocfs2: Fix writeout in ocfs2_data_convert_worker()
ocfs2: Enable localalloc for local mounts
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix blkdev_issue_flush() not detecting and passing EOPNOTSUPP back
block: fix shadowed variable warning in blk-map.c
block: remove extern on function definition
cciss: remove READ_AHEAD define and use block layer defaults
make cdrom.c:check_for_audio_disc() static
block/genhd.c: proper externs
unexport blk_rq_map_user_iov
unexport blk_{get,put}_queue
block/genhd.c: cleanups
proper prototype for blk_dev_init()
block/blk-tag.c should #include "blk.h"
Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory
block: separate out padding from alignment
block: restore the meaning of rq->data_len to the true data length
resubmit: cciss: procfs updates to display info about many
splice: only return -EAGAIN if there's hope of more data
block: fix kernel-docbook parameters and files
This patch adds proper externs for two structs in include/linux/genhd.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
sys_tee() currently is a bit eager in returning -EAGAIN, it may do so
even if we don't have a chance of anymore data becoming available. So
improve the logic and only return -EAGAIN if we have an attached writer
to the input pipe.
Reported by Johann Felix Soden <johfel@gmx.de> and
Patrick McManus <mcmanus@ducksong.com>.
Tested-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In commit e6bafba5b4, a bug was fixed that
involved converting !x & y to !(x & y). The code below shows the same
pattern, and thus should perhaps be fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch makes the needlessly global dlm_do_assert_master() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch makes the needlessly global ocfs2_downconvert_thread()
static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
This patch contains the following cleanups that are now possible:
- make the following needlessly global functions static:
- dlmglue.c:ocfs2_process_blocked_lock()
- heartbeat.c:ocfs2_node_map_init()
- #if 0 the following unused global function plus support functions:
- heartbeat.c:ocfs2_node_map_is_only()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Commit f1f540688e "optimized"
ocfs2_data_convert_worker() to "only do work for regular files".
Unfortunately, I left out a '!', which casued it to *skip* regular files.
This was hidden from testing until recently because the default data
journaling mode (data=ordered) doesn't exercise this code.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Commit 2fbe8d1ebe disabled localalloc
for local mounts. This caused issues as ocfs2 uses localalloc to
provide write locality. This patch enables localalloc for local mounts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Fix docbook problems in filesystems.tmpl.
These cause the generated docbook to be incorrect.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new code that removed the limitation on the execve string size
(which was historically 32 pages) replaced it with a much softer limit
based on RLIMIT_STACK which is usually much larger than the traditional
limit. See commit b6a2fea393 ("mm:
variable length argument support") for details.
However, if you have a small stack limit (perhaps because you need lots
of stacks in a threaded environment), the new heuristic of allowing up
to 1/4th of RLIMIT_STACK to be used for argument and environment strings
could actually be smaller than the old limit.
So just say that it's ok to have up to ARG_MAX strings regardless of the
value of RLIMIT_STACK, and check the rlimit only when going over that
traditional limit.
(Of course, if you actually have a *really* small stack limit, the whole
stack itself will be limited before you hit ARG_MAX, but that has always
been true and is clearly the right behaviour anyway).
Acked-by: Carlos O'Donell <carlos@codesourcery.com>
Cc: Michael Kerrisk <michael.kerrisk@googlemail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ollie Wild <aaw@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
the "ikeep" option is set rather than "noikeep".
This regression was introduced in 970451.
With no mount options specified, xfs_parseargs() does the following:
int ikeep = 0;
args->flags |= XFSMNT_BARRIER;
args->flags2 |= XFSMNT2_COMPAT_IOSIZE;
if (!options)
goto done;
It only sets the above two options by default and before, it also used to
set XFSMNT_IDELETE by default.
If options are specified, then
if (!(args->flags & XFSMNT_DMAPI) && !ikeep)
args->flags |= XFSMNT_IDELETE;
is executed later on which is skipped by the "goto done;" above.
The solution is to invert the logic.
SGI-PV: 977771
SGI-Modid: xfs-linux-melb:xfs-kern:30590a
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
[XFS] Undo bit ops cleanup mod due to regression on 32-bit powermac
Remove empty file fs/xfs/Makefile-linux-2.6.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add missing ext4_journal_stop()
ext4: ext4_find_next_zero_bit needs an aligned address on some arch
ext4: set EXT4_EXTENTS_FL only for directory and regular files
ext4: Don't mark filesystem error if fallocate fails
ext4: Fix BUG when writing to an unitialized extent
ext4: Don't use ext4_dec_count() if not needed
ext4: modify block allocation algorithm for the last group
ext4: Don't claim block from group which has corrupt bitmap
ext4: Get journal write access before modifying the extent tree
ext4: Fix memory and buffer head leak in callers to ext4_ext_find_extent()
ext4: Don't leave behind a half-created inode if ext4_mkdir() fails
ext4: Fix kernel BUG at fs/ext4/mballoc.c:910!
ext4: Fix locking hierarchy violation in ext4_fallocate()
Remove incorrect BKL comments in ext4
Add missing ext4_journal_stop() in error handling.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
Change getting task_struct by get_proc_task() at read or write time,
and returns -ESRCH if get_proc_task() returns NULL.
This is same behavior as other /proc files.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At lstats_open(), calling get_proc_task() gets task struct, but it never put.
put_task_struct() should be called when releasing.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reading /proc/<pid>/latency or /proc/<pid>/task/<tid>/latency could cause
NULL pointer dereference.
In lstats_open(), get_proc_task() can return NULL, in which case the kernel
will oops at lstats_show_proc() because m->private is NULL.
When get_proc_task() returns NULL, the kernel should return -ENOENT.
This can be reproduced by the following script.
while :
do
date
bash -c 'ls > ls.$$' &
pid=$!
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency
done
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
RLIMIT_RTTIME was introduced to allow the user to set a runtime timeout on
real-time tasks: http://lkml.org/lkml/2007/12/18/218. This patch updates
/proc/<pid>/limits with the new rlimit.
Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>