Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original waiting code to
catch them.
This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch adds a wait on umount between the point at which we
dispose of all glocks and the point at which we unmount the
lock protocol. This ensures that we've received all the replies
to our unlock requests before we stop the locking.
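A minimal sketch of the waiting pattern this implies, assuming an atomic
counter sdp->sd_glock_disposal and a wait queue sdp->sd_glock_wait (the
names are illustrative, not necessarily the merged code):

/* dropped once the lock module has replied to a given unlock request */
static void example_unlock_reply(struct gfs2_sbd *sdp)
{
	if (atomic_dec_and_test(&sdp->sd_glock_disposal))
		wake_up(&sdp->sd_glock_wait);
}

/* umount path: block until every reply has arrived, and only then
 * unmount the lock protocol */
static void example_wait_for_unlocks(struct gfs2_sbd *sdp)
{
	wait_event(sdp->sd_glock_wait,
		   atomic_read(&sdp->sd_glock_disposal) == 0);
}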
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
During blocked lock processing, we should consider the possibility that the
lock is no longer blocking.
Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in progress and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say the upconvert was from PR to EX and the BAST
was for EX; after the cancel convert, it will need to downconvert to NL.
However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).
This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.
Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.
Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178
Reported-by: Coly Li <coly.li@suse.de>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
There is a possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.
This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.
Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker
<joel.becker@oracle.com> contributed heavily to this patch.
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to be
downconverted.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Fix typo in ioat2_quiesce: check that 'tmo' is zero, not 'end'. Also applies
to 2.6.32.3.
Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Although we use u64 to pass userspace pointers to the kernel
to avoid compat_ioctl, it doesn't work on some ppc platforms.
So wrap them with compat_ptr and add compat_ioctl.
The detailed discussion about compat_ptr can be found in the thread
http://lkml.org/lkml/2009/10/27/423.
We did indeed hit a bug when testing on ppc (-EFAULT is returned
when using old_path). This patch tries to fix this.
I have tested on ppc64 (with 32 bit reflink) and x86_64 (with i686
reflink); both work.
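A hedged sketch of the wrapping described above (the helper name and the
is_compat_task() check are illustrative, not necessarily the merged ocfs2 code):

static void __user *example_u64_to_user_ptr(u64 val)
{
#ifdef CONFIG_COMPAT
	/* a 32-bit caller filled in only the low bits; compat_ptr()
	 * does the architecture-correct conversion */
	if (is_compat_task())
		return compat_ptr((compat_uptr_t)val);
#endif
	return (void __user *)(unsigned long)val;
}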
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Mainline commit aad1b15310 made the
dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN.
As this error is transmitted over the wire, we want the receiver,
dlm_send_begin_reco_message(), to understand both the older EAGAIN and
the newer -EAGAIN, to allow rolling upgrade of the cluster nodes.
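A hedged sketch of the check on the receiving side (the helper is
illustrative, not the literal o2dlm hunk):

static int example_reco_reply_means_retry(int status)
{
	/* older nodes still send EAGAIN over the wire, newer ones send
	 * -EAGAIN; accept both so a mixed cluster keeps working */
	return status == -EAGAIN || status == EAGAIN;
}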
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In CoW, we have to make sure that the page is already written
out to disk, so we have a BUG_ON(PageDirty(page)).
On ppc we have pagesize=64K, so if the cluster size is 4K and the
file has fragmented clusters, we will map the page many times.
See this file as an example.
Tree Depth: 0 Count: 19 Next Free Rec: 14
## Offset Clusters Block# Flags
0 0 4 2164864 0x2 Refcounted
1 4 2 9302792 0x2 Refcounted
...
We have to replace the extent recs one by one, so the page with index 0
will be mapped and dirtied twice.
I'd like to leave the BUG_ON there while adding a check so that in
case we meet with an error on other platforms, we can find it easily.
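One plausible shape for such a check (clustersize is an assumed local; the
merged hunk may differ):

/*
 * Illustrative only: when the page size is no larger than the cluster
 * size, this page cannot have been dirtied by an earlier extent rec,
 * so the assertion still holds there.
 */
if (PAGE_CACHE_SIZE <= clustersize)
	BUG_ON(PageDirty(page));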
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In ocfs2_duplicate_clusters_by_page, we calculate map_end
by shifting page_index. But in case we meet with a large
offset (say on an i686 box, where pgoff_t is only 32 bits
and page_index=2056240), we will overflow. So change the
type of page_index to loff_t.
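To see the overflow concretely (an illustrative fragment assuming 4K pages
and a 32-bit pgoff_t, as on i686):

pgoff_t page_index = 2056240;
/* (2056240 + 1) * 4096 is roughly 7.8GB, which no longer fits in 32
 * bits, so the shift wraps before the result is widened */
loff_t bad_end  = (page_index + 1) << PAGE_CACHE_SHIFT;
/* widening first (or making page_index an loff_t) keeps the value */
loff_t good_end = (loff_t)(page_index + 1) << PAGE_CACHE_SHIFT;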
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (23 commits)
connector: Delete buggy notification code.
be2net: use eq-id to calculate cev-isr reg offset
Bluetooth: Use the control channel for raw HID reports
Bluetooth: Add DFU driver for Atheros Bluetooth chipset AR3011
Bluetooth: Redo checks in IRQ handler for shared IRQ support
Bluetooth: Fix memory leak in L2CAP
Bluetooth: Remove double free of SKB pointer in L2CAP
cdc_ether: Partially revert "usbnet: Set link down initially ..."
be2net: Fix memset() arg ordering.
bonding: bond_open error return value
ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
ixgbe: set the correct DCB bit for pg tx settings
igbvf: fix issue w/ mapped_as_page being left set after unmap
drivers/net: ks8851_mll ethernet network driver
be2net: Bug fix to support newer generation of BE ASIC
starfire: clean up properly if firmware loading fails
mac80211: fix NULL pointer dereference when ftrace is enabled
netfilter: ctnetlink: fix expectation mask dump
ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
ath9k: fix eeprom INI values override for 2GHz-only cards
...
This is the counterpart to cba767175b
("pktcdvd: remove broken dev_t export of class devices"). The device is not
registered using dev_t, so it should not be destroyed using device_destroy,
which looks up the device by dev_t. That lookup will fail, and adding the
device again will then fail with the "duplicate name" error. This is fixed
by using device_unregister instead of device_destroy.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly added memory can not be accessed via /dev/mem, because we do not
update the variables high_memory, max_pfn and max_low_pfn.
Add a function update_end_of_memory_vars() to update these variables for
64-bit kernels.
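A hedged sketch of what such a helper can look like on x86-64 (the merged
version may differ in detail):

static void update_end_of_memory_vars(u64 start, u64 size)
{
	unsigned long end_pfn = PFN_UP(start + size);

	if (end_pfn > max_pfn) {
		max_pfn = end_pfn;
		max_low_pfn = end_pfn;
		high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
	}
}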
[akpm@linux-foundation.org: simplify comment]
Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Li Haicheng <haicheng.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
init_fault_attr_entries() should be init_fault_attr_dentries().
cleanup_fault_attr_entries() should be cleanup_fault_attr_dentries().
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hugetlb_sysfs_add_hstate is called by hugetlb_register_node directly
during init and also indirectly via sysfs after init.
This patch removes the __init tag from hugetlb_sysfs_add_hstate.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The probe function passes a pointer to a struct fb_info to
platform_set_drvdata(), so don't interpret the return value of
platform_get_drvdata() as a pointer to struct imxfb_info.
The original imxfb_info *fbi had backlight_power == NULL, but in imxfb_suspend
it was 4, resulting in an oops, as imxfb_suspend calls
imxfb_disable_controller(fbi), which in turn has
	if (fbi->backlight_power)
		fbi->backlight_power(0);
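A hedged sketch of the corrected lookup in the suspend path (going through
fb_info->par is the usual convention; the exact hunk may differ):

struct fb_info *info = platform_get_drvdata(pdev);
struct imxfb_info *fbi = info->par;	/* not a cast of the drvdata itself */

imxfb_disable_controller(fbi);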
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <kernel@pengutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In cgroup_create(), if alloc_css_id() returns failure, the errno is not
propagated to userspace, so mkdir will fail silently.
To trigger this bug, we mount blkio (or the memory subsystem), and create more
than 65534 cgroups. (The number of cgroups is limited to 65535 if a
subsystem has use_id == 1.)
# mount -t cgroup -o blkio xxx /mnt
# for ((i = 0; i < 65534; i++)); do mkdir /mnt/$i; done
# mkdir /mnt/65534
(should return ENOSPC)
#
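A hedged sketch of the propagation in cgroup_create() (surrounding code
elided; the merged hunk may differ slightly):

if (ss->use_id) {
	err = alloc_css_id(ss, parent, cgrp);
	if (err)
		goto err_destroy;	/* mkdir now returns the errno, e.g. -ENOSPC */
}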
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I used markup_oops.pl to parse an x86_64 oops, I got:
objdump: --start-address: bad number: NaN
No matching code found
This is because:
main::(./m.pl:228): open(FILE, "objdump -dS --adjust-vma=$vmaoffset --start-address=$decodestart --stop-address=$decodestop $filename |") || die "Cannot start objdump";
DB<3> p $decodestart
NaN
This NaN is from:
main::(./m.pl:176): my $decodestart = Math::BigInt->from_hex("0x$target") - Math::BigInt->from_hex("0x$func_offset");
DB<2> p $func_offset
0x175
There is already a "0x" in $func_offset, another 0x makes it a NaN.
The $func_offset is from line:
if ($line =~ /RIP: 0010:\[\<[0-9a-f]+\>\] \[\<[0-9a-f]+\>\] ([a-zA-Z0-9\_]+)\+(0x[0-9a-f]+)\/0x[a-f0-9]/) {
$function = $1;
$func_offset = $2;
}
I made a patch to change "(0x[0-9a-f]+)\/0x[a-f0-9]/)" to "0x([0-9a-f]+)\/0x[a-f0-9]/)".
Signed-off-by: Hui Zhu <teawater@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When git has been set to always use color in .gitconfig, I get the
warning message
Bad divisor in main::vcs_assign: 0
This is caused by vcs_file_signoffs not matching any commits because the
pattern does not understand the colour codes. Fix this by telling git log
to never use colour.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
write_kmem() used to assume vwrite() always returns the full buffer length.
However, vwrite() can now return 0 to indicate a memory hole. This
creates a bug in which "buf" is not advanced accordingly.
Fix it by simply ignoring the return value, and hence the memory hole.
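A hedged sketch of the loop body after the fix (variable names follow the
description above, not necessarily the literal drivers/char/mem.c hunk):

/* vwrite() may return 0 over a hole; advance by the full chunk anyway */
vwrite(kbuf, (char *)p, sz);
count -= sz;
buf += sz;
p += sz;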
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Otherwise vmalloc_to_page() will BUG().
This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here we
return -ENXIO (inspired by Hugh) if no bytes have been transferred to/from
user space, otherwise return partial read/write results.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cache alias problem will happen if the changes to a user shared mapping
are not flushed before copying: the user and kernel mappings may end up in
two different cache lines, and it is then impossible to guarantee coherence
after iov_iter_copy_from_user_atomic. So the right steps should be:
flush_dcache_page(page);
kmap_atomic(page);
write to page;
kunmap_atomic(page);
flush_dcache_page(page);
More precisely, we might create two new APIs, flush_dcache_user_page and
flush_dcache_kern_page, to replace the two flush_dcache_page calls accordingly.
Here is a snippet tested on omap2430 with VIPT cache, and I think it is
not ARM-specific:
int val = 0x11111111;
int fd, tmp;
int *addr;

fd = open("abc", O_RDWR);
addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
*(addr+0) = 0x44444444;
tmp = *(addr+0);
*(addr+1) = 0x77777777;
write(fd, &val, sizeof(int));
close(fd);
The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.
Signed-off-by: Anfei <anfei.zhou@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <linux-arch@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kfifo kernel-doc warnings:
Warning(kernel/kfifo.c:361): No description found for parameter 'total'
Warning(kernel/kfifo.c:402): bad line: @ @lenout: pointer to output variable with copied data
Warning(kernel/kfifo.c:412): No description found for parameter 'lenout'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the kernel oops when dev_dbg is called with mx3_fbi->txd == NULL.
Fix the late initialisation of mx3fb->backlight_level. If it is not
initialised, the chain of functions started by init_fb_chan() will, in
__blank(), call sdc_set_brightness(mx3fb, mx3fb->backlight_level), which
will shut down the CONTRAST PWM output.
Signed-off-by: Alberto Panizzo <maramaopercheseimorto@gmail.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Paris located a bug in idr. With IDR_BITS of 6, it grows to three
layers when id 4096 is first allocated. When that happens, idr wraps
incorrectly and searches the idr array ignoring the high bits. The
following test code from Eric demonstrates the bug nicely.
#include <linux/idr.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_IDR(test_idr);

int init_module(void)
{
	int ret, forty95, forty96;
	void *addr;

	/* add 2 entries both with 4095 as the start address */
again1:
	if (!idr_pre_get(&test_idr, GFP_KERNEL))
		return -ENOMEM;
	ret = idr_get_new_above(&test_idr, (void *)4095, 4095, &forty95);
	if (ret) {
		if (ret == -EAGAIN)
			goto again1;
		return ret;
	}
	if (forty95 != 4095)
		printk(KERN_ERR "hmmm, forty95=%d\n", forty95);

again2:
	if (!idr_pre_get(&test_idr, GFP_KERNEL))
		return -ENOMEM;
	ret = idr_get_new_above(&test_idr, (void *)4096, 4095, &forty96);
	if (ret) {
		if (ret == -EAGAIN)
			goto again2;
		return ret;
	}
	if (forty96 != 4096)
		printk(KERN_ERR "hmmm, forty96=%d\n", forty96);

	/* try to find the 2 entries, noticing that 4096 broke */
	addr = idr_find(&test_idr, forty95);
	if ((int)addr != forty95)
		printk(KERN_ERR "hmmm, after find forty95=%d addr=%d\n", forty95, (int)addr);
	addr = idr_find(&test_idr, forty96);
	if ((int)addr != forty96)
		printk(KERN_ERR "hmmm, after find forty96=%d addr=%d\n", forty96, (int)addr);
	/* really weird, the entry which should be at 4096 is actually at 0!! */
	addr = idr_find(&test_idr, 0);
	if ((int)addr)
		printk(KERN_ERR "found an entry at id=0 for addr=%d\n", (int)addr);

	idr_remove(&test_idr, forty95);
	idr_remove(&test_idr, forty96);

	return 0;
}

void cleanup_module(void)
{
}

MODULE_AUTHOR("Eric Paris <eparis@redhat.com>");
MODULE_DESCRIPTION("Simple idr test");
MODULE_LICENSE("GPL");
This happens because when sub_alloc() backtracks it doesn't always do it
step-by-step, while the over-the-limit detection assumes step-by-step
backtracking. The logic in sub_alloc() looks like the following.
restart:
	clear pa[top level + 1] for end cond detection
	l = top level
	while (true) {
		search for empty slot at this level
		if (not found) {
			push id to the next possible value
			l++
A:			if (pa[l] is clear)
				failed, return asking caller to grow the tree
			if (going up 1 level gives more slots to search)
				continue the while loop above with the incremented l
			else
C:				goto restart
		}
		adjust id accordingly to the found slot
		if (l == 0)
			return found id;
		create lower level if not there yet
		record pa[l] and l--
	}
Test A is the fail exit condition, but this assumes that failure is
propagated upward one level at a time; the C optimization path breaks
that assumption and restarts the whole thing with a start value which is
above the possible limit with the current layers. sub_alloc() assumes the
start id value is inside the limit when called, and test A is the only exit
condition check, so it ends up searching for an empty slot while ignoring
the high set bit.
So, for the 4095->4096 test, the level-0 search fails but pa[1] contains a
valid pointer. However, going up one level wouldn't give any more empty
slots, so it takes C, and when the whole thing restarts nobody notices the
high bit set beyond the top level.
This patch fixes the bug by changing the fail exit condition check to a
full id limit check.
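A hedged sketch of the changed exit test in sub_alloc() (the literal
lib/idr.c hunk may differ):

/* fail when the candidate id no longer fits within the current number
 * of layers, not merely when pa[l] one level up is clear */
if (id >= 1 << (idp->layers * IDR_BITS)) {
	*starting_id = id;
	return IDR_NEED_TO_GROW;
}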
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix minor spelling error in comment. No code change.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
LKML-Reference: <201002022238.o12McDiF018720@imap1.linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote:
> > There are at least two ways to fix it: using a big cannon and a small
> > one. The former way is to disable notification registration, since it is
> > not used by anyone at all. Second way is to check whether calling
> > process is root and its destination group is -1 (kind of priveledged
> > one) before command is dispatched to workqueue.
>
> Well if no one is using it, removing it makes the most sense, right?
>
> No objection from me, care to make up a patch either way for this?
Given it is not used, let's drop support for notifications about
(un)registered events from connector.
Another option was to check credentials on receiving, but we can always
restore this without bugs if needed; genetlink has a wider code base and
no one has complained that userspace cannot get a notification when some
other clients were (un)registered.
Kudos to Sebastian Krahmer <krahmer@suse.de>, who found a bug in the
code.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free memory allocated with kmem_cache_zalloc using kmem_cache_free rather
than kfree.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,E,c;
@@
x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
... when != x = E
when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>
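In plain C, the class of bug the semantic patch flags looks like this
(names are hypothetical):

struct foo *p = kmem_cache_zalloc(foo_cachep, GFP_KERNEL);

if (!p)
	return -ENOMEM;
if (setup_foo(p) < 0) {
	/* was kfree(p); objects from a slab cache should go back to
	 * that same cache */
	kmem_cache_free(foo_cachep, p);
	return -EIO;
}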
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Steve Dickson <steved@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
While debugging a dma driver I noticed a memleak after
unloading the driver module.
Caught by kmemleak.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: Do not idle on async queues
blk-cgroup: Fix potential deadlock in blk-cgroup
block: fix bugs in bio-integrity mempool usage
block: fix bio_add_page for non trivial merge_bvec_fn case
drbd: null dereference bug
drbd: fix max_segment_size initialization
Improve handling of fragmented per-CPU vmaps. Previously, we didn't free
up per-CPU maps until all their addresses had been used and freed, so
fragmented blocks could fill up vmalloc space even if they actually had
no active vmap regions within them.
Add some logic to allow all CPUs to have these blocks purged in the case
of failure to allocate a new vm area, and also add some logic to trim
such blocks on the current CPU if we hit them in the allocation path (so
as to avoid a large build-up of them).
Christoph reported some vmap allocation failures when using the per CPU
vmap APIs in XFS, which cannot be reproduced after this patch and the
previous bug fix.
Cc: linux-mm@kvack.org
Cc: stable@kernel.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RCU list walking of the per-cpu vmap cache was broken. It did not use
RCU primitives, and also the union of free_list and rcu_head is
obviously wrong (because free_list is indeed the list we are RCU
walking).
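A hedged sketch of the corrected walk (struct and helper names are
illustrative, not the literal mm/vmalloc.c hunk):

rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
	if (example_try_alloc_from_block(vb))	/* hypothetical helper */
		break;
}
rcu_read_unlock();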
While we are there, remove a couple of unused fields from an earlier
iteration.
These APIs aren't actually used anywhere, because of problems with the
XFS conversion. Christoph has now verified that the problems are solved
with these patches. Also it is an exported interface, so I think it
will be good to be merged now (and Christoph wants to get the XFS
changes into their local tree).
Cc: stable@kernel.org
Cc: linux-mm@kvack.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
--
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix access to released memory in clk_debugfs_register_one()
sh: Fix access to released memory in dwarf_unwinder_cleanup()
usb: r8a66597-hdc disable interrupts fix
spi: spi_sh_msiof: Fixed data sampling on the correct edge
Commit 221af7f87b ("Split 'flush_old_exec' into two functions") split
the function at the point of no return - ie right where there were no
more error cases to check. That made sense from a technical standpoint,
but when we then also combined it with the actual personality setting
going in between flush_old_exec() and setup_new_exec(), it needs to be a
bit more careful.
In particular, we need to make sure that we really flush the old
personality bits in the 'flush' stage, rather than later in the 'setup'
stage, since otherwise we might be flushing the _new_ personality state
that we're just setting up.
So this moves the flags and personality flushing (and 'flush_thread()',
which is the arch-specific function that generally resets lazy FP state
etc) of the old process into flush_old_exec(), so that it doesn't affect
any state that execve() is setting up for the new process environment.
This was reported by Michal Simek as breaking his Microblaze qemu
environment.
Reported-and-tested-by: Michal Simek <michal.simek@petalogix.com>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few weeks back, Shaohua Li posted a similar patch. I am reposting it
with more test results.
This patch does two things.
- Do not idle on async queues.
- It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
Currently, we seem to be driving a queue depth of 1 always for WRITES. This is
true even if there is only one write queue in the system, and all the logic
of infinite queue depth in case of a single busy queue, as well as slowly
increasing queue depth based on the last delayed sync request, does not seem
to be kicking in at all.
This patch will allow deeper WRITE queue depths (subject to the other
WRITE queue depth constraints like cfq_quantum and the last delayed sync
request).
Shaohua Li had reported getting more out of his SSD. For me, I have got
one Lun exported from an HP EVA and when pure buffered writes are on, I
can get more out of the system. Following are test results of pure
buffered writes (with end_fsync=1) with vanilla and patched kernel. These
results are average of 3 sets of run with increasing number of threads.
AVERAGE[bufwfs][vanilla]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
bufwfs 3 1 0 0 95349 474141
bufwfs 3 2 0 0 100282 806926
bufwfs 3 4 0 0 109989 2.7301e+06
bufwfs 3 8 0 0 116642 3762231
bufwfs 3 16 0 0 118230 6902970
AVERAGE[bufwfs] [patched kernel]
-------
bufwfs 3 1 0 0 270722 404352
bufwfs 3 2 0 0 206770 1.06552e+06
bufwfs 3 4 0 0 195277 1.62283e+06
bufwfs 3 8 0 0 260960 2.62979e+06
bufwfs 3 16 0 0 299260 1.70731e+06
I also ran buffered writes along with some sequential reads and some
buffered reads on a SATA disk, because the potential risk is that, to keep
the max clat low, we should not be driving a higher queue depth in the
presence of sync IO.
With some random and sequential reads going on in the system on one SATA
disk I did not see any significant increase in max clat. So it looks like
other WRITE queue depth control logic is doing its job. Here are the
results.
AVERAGE[brr, bsr, bufw together] [vanilla]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
brr 3 1 850 546345 0 0
bsr 3 1 14650 729543 0 0
bufw 3 1 0 0 23908 8274517
brr 3 2 981.333 579395 0 0
bsr 3 2 14149.7 1175689 0 0
bufw 3 2 0 0 21921 1.28108e+07
brr 3 4 898.333 1.75527e+06 0 0
bsr 3 4 12230.7 1.40072e+06 0 0
bufw 3 4 0 0 19722.3 2.4901e+07
brr 3 8 900 3160594 0 0
bsr 3 8 9282.33 1.91314e+06 0 0
bufw 3 8 0 0 18789.3 23890622
AVERAGE[brr, bsr, bufw mixed] [patched kernel]
-------
job Set NR ReadBW(KB/s) MaxClat(us) WriteBW(KB/s) MaxClat(us)
--- --- -- ------------ ----------- ------------- -----------
brr 3 1 837 417973 0 0
bsr 3 1 14357.7 591275 0 0
bufw 3 1 0 0 24869.7 8910662
brr 3 2 1038.33 543434 0 0
bsr 3 2 13351.3 1205858 0 0
bufw 3 2 0 0 18626.3 13280370
brr 3 4 913 1.86861e+06 0 0
bsr 3 4 12652.3 1430974 0 0
bufw 3 4 0 0 15343.3 2.81305e+07
brr 3 8 890 2.92695e+06 0 0
bsr 3 8 9635.33 1.90244e+06 0 0
bufw 3 8 0 0 17200.3 24424392
So looks like it might make sense to include this patch.
Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linux kernel 2.6.32 and later allocate address space from the top of the
kernel virtual memory address space.
This patch implements virtual memory size detection for 64 bit MIPS CPUs
to avoid resulting crashes.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/935/
Reviewed-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The cev-isr reg offset for each function is better calculated using (any) eq-id
allotted to that function instead of using the pci-func number (which
does not work in some configurations...).
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch improves disable_controller() in the r8a66597-hdc
driver to disable all interrupts and clear status flags. It
also makes sure that disable_controller() is called during
probe(). This fixes the relatively rare case of unexpected
pending interrupts after kexec reboot.
Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>