try_to_wake_up() has a problem which may change status from TASK_DEAD to
TASK_RUNNING in race condition with SMI or guest environment of virtual
machine. As a result, exited task is scheduled() again and panic occurs.
Here is the sequence how it occurs:
----------------------------------+-----------------------------
|
CPU A | CPU B
----------------------------------+-----------------------------
TASK A calls exit()....
do_exit()
exit_mm()
down_read(mm->mmap_sem);
rwsem_down_failed_common()
set TASK_UNINTERRUPTIBLE
set waiter.task <= task A
list_add to sem->wait_list
:
raw_spin_unlock_irq()
(I/O interruption occured)
__rwsem_do_wake(mmap_sem)
list_del(&waiter->list);
waiter->task = NULL
wake_up_process(task A)
try_to_wake_up()
(task is still
TASK_UNINTERRUPTIBLE)
p->on_rq is still 1.)
ttwu_do_wakeup()
(*A)
:
(I/O interruption handler finished)
if (!waiter.task)
schedule() is not called
due to waiter.task is NULL.
tsk->state = TASK_RUNNING
:
check_preempt_curr();
:
task->state = TASK_DEAD
(*B)
<--- set TASK_RUNNING (*C)
schedule()
(exit task is running again)
BUG_ON() is called!
--------------------------------------------------------
The execution time between (*A) and (*B) is usually very short,
because the interruption is disabled, and setting TASK_RUNNING at (*C)
must be executed before setting TASK_DEAD.
HOWEVER, if SMI is interrupted between (*A) and (*B),
(*C) is able to execute AFTER setting TASK_DEAD!
Then, exited task is scheduled again, and BUG_ON() is called....
If the system works on guest system of virtual machine, the time
between (*A) and (*B) may be also long due to scheduling of hypervisor,
and same phenomenon can occur.
By this patch, do_exit() waits for releasing task->pi_lock which is used
in try_to_wake_up(). It guarantees the task becomes TASK_DEAD after
waking up.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the TCO Watchdog DeviceIDs for the Intel Lynx Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Let the watchdog core to check the valid value range of min_timeout/max_timeout.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Correct typo "unexpectdly" to "unexpectedly" in pnx4008_wdt.c
and stmp3xxx_wdt.c
Signed-off-by: Masanari Iida<standby24x7@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
While receiving WDIOS_DISABLECARD option for WDIOC_SETOPTIONS command,
call wafwdt_stop() to disable watchdog.
Call wafwdt_start() while receiving WDIOS_ENABLECARD option.
Current code has reverse behavior.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
While receiving WDIOS_DISABLECARD option for WDIOC_SETOPTIONS command,
call wm8350_wdt_stop() to disable watchdog.
Call wm8350_wdt_start() while receiving WDIOS_ENABLECARD option.
Current code has reverse behavior.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
It is only used in this driver, so no need to make the symbol global.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
It is only used in this driver, so no need to make the symbol global.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Marc Vertes <marc.vertes@sigfox.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Currently the watchdog driver calls the pm_runtime_enable and never
the disable. This may cause a warning when pm_runtime_enable
checks for the count match.
Also fixes the error
/build/watchdog # insmod omap_wdt.ko
[ 44.999389] omap_wdt omap_wdt: Unbalanced pm_runtime_enable!
[ 45.011047] OMAP Watchdog Timer Rev 0x00: initial timeout 60 sec
/build/watchdog #
Attempting to fix the same by calling pm_runtime_disable.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Reimplement a call to devm_request_mem_region followed by a call to ioremap
or ioremap_nocache by a call to devm_request_and_ioremap.
The semantic patch that makes this transformation is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@nm@
expression myname;
identifier i;
@@
struct platform_driver i = { .driver = { .name = myname } };
@@
expression dev,res,size;
expression nm.myname;
@@
-if (!devm_request_mem_region(dev, res->start, size,
- \(res->name\|dev_name(dev)\|myname\))) {
- ...
- return ...;
-}
... when != res->start
(
-devm_ioremap(dev,res->start,size)
+devm_request_and_ioremap(dev,res)
|
-devm_ioremap_nocache(dev,res->start,size)
+devm_request_and_ioremap(dev,res)
)
... when any
when != res->start
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Reimplement a call to devm_request_mem_region followed by a call to ioremap
or ioremap_nocache by a call to devm_request_and_ioremap.
The variable res_size is then no longer needed.
The semantic patch that makes this transformation is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@nm@
expression myname;
identifier i;
@@
struct platform_driver i = { .driver = { .name = myname } };
@@
expression dev,res,size;
expression nm.myname;
@@
-if (!devm_request_mem_region(dev, res->start, size,
- \(res->name\|dev_name(dev)\|myname\))) {
- ...
- return ...;
-}
... when != res->start
(
-devm_ioremap(dev,res->start,size)
+devm_request_and_ioremap(dev,res)
|
-devm_ioremap_nocache(dev,res->start,size)
+devm_request_and_ioremap(dev,res)
)
... when any
when != res->start
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* 'fixes-for-arm-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
mach-ux500: no MMC_CAP_SD_HIGHSPEED on Snowball
mach-ux500: enable ARM errata 764369
mach-ux500: do not override outer.inv_all
mach-ux500: musb: now musb is always in OTG mode
this patch separates fimd specific power on/off function from pm function
and the pm interfaces will call that function for power on or off.
and also removes unnecessary codes of resume function.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
David no longer has access to MAX1688 hardware, so drop him from the maintainers
list.
Cc: David George <dgeorgester@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: David George <dgeorgester@gmail.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
The actual driver code seems to have been lost in the shuffle.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
I'd like to add my colleagues who dedicated to developing and
improving our driver to maintainer entry.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
FB based FIMD and DRM based FIMD drivers use same hardware
so with this patch, only one of them would be selected.
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
DRM_EXYNOS_HDMI driver and VIDEO_SAMSUNG_S5P_TV driver should be
not enabled at once because they use same HW blocks. So dependency
for DRM_EXYNOS_HDMI is fixed to check VIDEO_SAMSUNG_S5P_TV=n.
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
To make a api pair of request_mem_region and release_mem_region,
release_mem_region is used instead of release_resource.
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
when != iounmap(e)
*if (...)
{ ... when != iounmap(e)
return ...; }
... when any
iounmap(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Tony Lindgren <tony@atomide.com>
pdata needs to be freed before leaving the function in an error case.
A simplified version of the semantic match that finds the problem is as
follows: (http://coccinelle.lip6.fr)
// <smpl>
@r exists@
local idexpression x;
statement S;
identifier f1;
position p1,p2;
expression *ptr != NULL;
@@
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
when != if (...) { <+...x...+> }
x->f1
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Currently MMC2 setup code can only enable loopback clock and
relies on reset value for boards that need to have it disabled.
This causes a problem with certain bootloaders that always enable
that clock, resulting with unwanted bootloader dependencies.
Fix this by making it disable the clock if board data says so.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Tony Lindgren <tony@atomide.com>
hsmmc23_before_set_reg() can set MMCSDIO2ADPCLKISEL bit, which
enables internal clock for MMC2. Currently this function is also called
by code handling MMC3, and if .internal_clock is set in platform data
(by default it currently is), it will set MMCSDIO2ADPCLKISEL for MMC2
instead of MMC3 (MMC3 doesn't have such bit so nothing actually needs to
be done). This breaks 2nd SD slot on pandora.
Fix this by changing hsmmc23_before_set_reg() to only handle MMC2.
Note that this removes .remux() call for MMC3, but no board currently
needs it and it's also not called for MMC4 and MMC5.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
following statement can only change device size from 8-bit(0) to 16-bit(1),
but not vice versa:
regval |= GPMC_CONFIG1_DEVICESIZE(wval);
so as this field has 1 reserved bit, that could be used in future,
just clear both bits and then OR with the desired value
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 2f0778af (ARM: 7205/2: sched_clock: allow sched_clock to be
selected at runtime) had a typo for the case when CONFIG_OMAP_32K_TIMER
is not set.
In dmtimer_read_sched_clock(), wrong argument was getting passed to
__omap_dm_timer_read_counter() function call; instead of "&clksrc",
we were passing "clksrc.io_base", which results into kernel crash.
To reproduce kernel crash, just disable the CONFIG_OMAP_32K_TIMER config
option (and DEBUG_LL) and build/boot the kernel.
This will use dmtimer as a kernel clocksource and lead to kernel
crash during boot -
[ 0.000000] OMAP clocksource: GPTIMER2 at 26000000 Hz
[ 0.000000] sched_clock: 32 bits at 26MHz, resolution 38ns, wraps every
165191ms
[ 0.000000] Unable to handle kernel paging request at virtual address
00030ef1
[ 0.000000] pgd = c0004000
[ 0.000000] [00030ef1] *pgd=00000000
[ 0.000000] Internal error: Oops: 5 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 Not tainted (3.3.0-rc1-11574-g0c76665-dirty #3)
[ 0.000000] PC is at dmtimer_read_sched_clock+0x18/0x4c
[ 0.000000] LR is at update_sched_clock+0x10/0x84
[ 0.000000] pc : [<c00243b8>] lr : [<c0018684>] psr: 200001d3
[ 0.000000] sp : c0641f38 ip : c0641e18 fp : 0000000a
[ 0.000000] r10: 151c3303 r9 : 00000026 r8 : 76276259
[ 0.000000] r7 : 00028547 r6 : c065ac80 r5 : 431bde82 r4 : c0655968
[ 0.000000] r3 : 00030ef1 r2 : fb032000 r1 : 00000028 r0 : 00000001
Signed-off-by: Vaibhav Hiremath <hvaibhav@ti.com>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Commit 36a1211970 removed linux/module.h
include statement from one of the headers that end up in net/sock.h.
It was providing us with static_branch() definition implicitly, so
after its removal the build got broken.
To fix this, and avoid having this happening in the future,
let me do the right thing and include linux/jump_label.h
explicitly in sock.h.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Audigy's SB1394 controller is actually from Texas Instruments
and has the same bus reset packet generation bug, so it needs the
same quirk entry.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: 2.6.36+ <stable@vger.kernel.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Setting link parameters on a netdevice changes the value
of if_nlmsg_size(), therefore it is necessary to recalculate
min_ifinfo_dump_size.
Signed-off-by: Stefan Gula <steweg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and
ipip set this unconditionally in ops->setup, but gre enables it
conditionally after parameter passing in ops->newlink. This is
not called during tunnel setup as below, however, so GRE tunnels are
still taking the lock.
modprobe ip_gre
ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
ip link set test0 up
ip addr add 10.6.0.1 dev test0
# cat /sys/class/net/test0/features
# $DIR/test_tunnel_xmit 10 10.5.2.1
ip route add 10.5.2.0/24 dev test0
ip tunnel del test0
The newlink callback is only called in rtnl_netlink, and only if
the device is new, as it calls register_netdevice internally. Gre
tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
which calls ipgre_tunnel_locate, which calls register_netdev.
rtnl_newlink is called at 'ip link set', but skips ops->newlink
and the device is up with locking still enabled. The equivalent
ipip tunnel works fine, btw (just substitute 'method gre' for
'method ipip').
On kernels before /sys/class/net/*/features was removed [1],
the first commented out line returns 0x6000 with method gre,
which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
it reports 0x7000. This test cannot be used on recent kernels where
the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
work for tunnel devices, because they lack dev->ethtool_ops).
The second commented out line calls a simple transmission test [2]
that sends on 24 cores at maximum rate. Results of a single run:
ipip: 19,372,306
gre before patch: 4,839,753
gre after patch: 19,133,873
This patch replicates the condition check in ipgre_newlink to
ipgre_tunnel_locate. It works for me, both with oseq on and off.
This is the first time I looked at rtnetlink and iproute2 code,
though, so someone more knowledgeable should probably check the
patch. Thanks.
The tail of both functions is now identical, by the way. To avoid
code duplication, I'll be happy to rework this and merge the two.
[1] http://patchwork.ozlabs.org/patch/104610/
[2] http://kernel.googlecode.com/files/xmit_udp_parallel.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode_amd: Add support for CPU family specific container files
x86/amd: Add missing feature flag for fam15h models 10h-1fh processors
x86/boot-image: Don't leak phdrs in arch/x86/boot/compressed/misc.c::Parse_elf()
x86/numachip: Drop unnecessary conflict with EDAC
x86/uv: Fix uninitialized spinlocks
x86/uv: Fix uv_gpa_to_soc_phys_ram() shift
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Fix assembler constraint to prevent overeager gcc optimisation
mac_esp: rename irq
mac_scsi: dont enable mac_scsi irq before requesting it
macfb: fix black and white modes
m68k/irq: Remove obsolete IRQ_FLG_* definitions
Fix up trivial conflict in arch/m68k/kernel/process_mm.c as per Geert.
rsyslog will display KERN_EMERG messages on a connected
terminal. However, these messages are useless/undecipherable
for a general user.
For example, after a softlockup we get:
Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
kernel:Stack:
Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
kernel:Call Trace:
Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89
This happens because the printk levels for these messages are
incorrect. Only an informational message should be displayed on
a terminal.
I modified the printk levels for various messages in the kernel
and tested the output by using the drivers/misc/lkdtm.c kernel
modules (ie, softlockups, panics, hard lockups, etc.) and
confirmed that the console output was still the same and that
the output to the terminals was correct.
For example, in the case of a softlockup we now see the much
more informative:
Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
BUG: soft lockup - CPU4 stuck for 60s!
instead of the above confusing messages.
AFAICT, the messages no longer have to be KERN_EMERG. In the
most important case of a panic we set console_verbose(). As for
the other less severe cases the correct data is output to the
console and /var/log/messages.
Successfully tested by me using the drivers/misc/lkdtm.c module.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: dzickus@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Quite oddly, all of the arguments passed through from the top
level macros to the second level which didn't need parentheses
had them, while the only expression (involving a parameter)
needing them didn't.
Very recently I got bitten by the lack thereof when using
something like "array + index" for the first operand, with
"array" being an array more narrow than int.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F2183A9020000780006F3E6@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superficial searching. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state
3399 if ((mask & GFP_NOFS) == GFP_NOFS)
3400 mask = GFP_NOFS;
will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963
[<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
[<000000000024c810>] kmem_cache_alloc+0x204/0x294
[<00000000001fd3c2>] mempool_alloc+0x52/0x170
[<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
[<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
[<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
[<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
[<0000000000210d84>] shrink_page_list+0x6a0/0x724
[<0000000000211394>] shrink_inactive_list+0x230/0x578
[<0000000000211bb8>] shrink_list+0x6c/0x120
[<0000000000211e4e>] shrink_zone+0x1e2/0x228
[<0000000000211f24>] shrink_zones+0x90/0x254
[<0000000000213410>] do_try_to_free_pages+0xac/0x420
[<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
[<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
[<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we did sysbench test for inline files, enospc error happened easily though
there was lots of free disk space which could be allocated for new chunks.
Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
# mount <test partition> /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run
<soon later, BUG_ON() was triggered by enospc error>
The reason of this bug is:
Now, we can reserve space which is larger than the free space in the chunks if
we have enough free disk space which can be used for new chunks. By this way,
the space allocator should allocate a new chunk by force if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.
One is
if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
even we fail to allocate free space by minimum allocable size.
The other is
if (space_info->force_alloc)
force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.
Fix these two wrong checks. Especially the second one, we fix it by changing
the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
higher priority. And if the value which is passed in by the caller is greater
than ->force_alloc, use the passed value.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2
To fix this, we need to set max_to_defrag count properly.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There have been 4 warnings on 32-bit build, they are herewith fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>