The default btrfs mount -o compress mode will quickly back off
compressing a file if it notices that compression does not reduce the
size of the data being written. This can save considerable CPU because
all future writes to the file go through uncompressed.
But some files are both very large and have mixed data stored in
them. In that case, we want to add the ability to always try
compressing data before writing it.
This commit adds mount -o compress-force. A later commit will add
a new inode flag that does the same thing.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We can race with the unmount of an fs and the stopping of a kthread where we
will free the block group before we're done using it. The reason for this is
because we do not hold a reference on the block group while its caching, since
the allocator drops its reference once it exits or moves on to the next block
group. This patch fixes the problem by taking a reference to the block group
before we start caching and dropping it when we're done to make sure all
accesses to the block group are safe. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It is legal for btrfs_set_acl to be sent a NULL acl. This
makes sure we don't dereference it. A similar patch was sent by
Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Currently orphan cleanup only ever gets triggered if we cross subvolumes during
a lookup, which means that if we just mount a plain jane fs that has orphans in
it, they will never get cleaned up. This results in panic's like these
http://www.kerneloops.org/oops.php?number=1109085
where adding an orphan entry results in -EEXIST being returned and we panic. In
order to fix this, we check to see on lookup if our root has had the orphan
cleanup done, and if not go ahead and do it. This is easily reproduceable by
running this testcase
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
int main(int argc, char **argv)
{
char data[4096];
char newdata[4096];
int fd1, fd2;
memset(data, 'a', 4096);
memset(newdata, 'b', 4096);
while (1) {
int i;
fd1 = creat("file1", 0666);
if (fd1 < 0)
break;
for (i = 0; i < 512; i++)
write(fd1, data, 4096);
fsync(fd1);
close(fd1);
fd2 = creat("file2", 0666);
if (fd2 < 0)
break;
ftruncate(fd2, 4096 * 512);
for (i = 0; i < 512; i++)
write(fd2, newdata, 4096);
close(fd2);
i = rename("file2", "file1");
unlink("file1");
}
return 0;
}
and then pulling the power on the box, and then trying to run that test again
when the box comes back up. I've tested this locally and it fixes the problem.
Thanks to Tomas Carnecky for helping me track this down initially.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Fix bug reported by Johannes Hirte. The reason of that bug
is btrfs_del_items is called after btrfs_duplicate_item and
btrfs_del_items triggers tree balance. The fix is check that
case and call btrfs_search_slot when needed.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Stanse found 2 memory leaks in relocate_block_group and
__btrfs_map_block. cluster and multi are not freed/assigned on all
paths. Fix that.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Some callers of btrfs_ordered_update_i_size can now pass in
a NULL for the ordered extent to update against. This makes
sure we properly align the offset they pass in when deciding
how much to bump the on disk i_size.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd)
commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d
Author: Jan Engelhardt <jengelh@medozas.de>
Date: Wed Dec 9 22:57:36 2009 +0100
Btrfs: fix missing last-entry in readdir(3)
When one does a 32-bit readdir(3), the last entry of a directory is
missing. This is however not due to passing a large value to filldir,
but it seems to have to do with glibc doing telldir or something
quirky. In any case, this patch fixes it in practice.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The recent patch to make fallocate enospc friendly would send
down a NULL trans handle to the allocator. This moves the
transaction start to properly fix things.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch makes us a bit less zealous about making sure we have enough free
metadata space by pearing down the size of new metadata chunks to 256mb instead
of 1gb. Also, we used to try an allocate metadata chunks when allocating data,
but that sort of thing is done elsewhere now so we can just remove it. With my
-ENOSPC test I used to have 3gb reserved for metadata out of 75gb, now I have
1.7gb. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Christoph's patch e244a0aeb6 doesn't display
the discard option in /proc/mounts, leading to some confusion for me.
Here's the missing bit.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I rebased Christian Parpart's patch to deny hard link across
subvolumes. Original patch modifies also btrfs_rename, but
I excluded it because we can move across subvolumes now and
it make no problem.
-----------------
Hard link across subvolumes should not allowed in Btrfs.
btrfs_link checks root of 'to' directory is same as root
of 'from' file. If not same, btrfs_link returns -EPERM.
Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If block group 0 is completely free, btrfs_read_block_groups will
add extent [0, BTRFS_SUPER_INFO_OFFSET) to the free space cache.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The bytes_used field in root item was originally planned to
trace the amount of used data and tree blocks. But it never
worked right since we can't trace freeing of data accurately.
This patch changes it to only trace the amount of tree blocks.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The check for skip pinned case is wrong, it may breaks the
while loop too soon.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
iput() can trigger new transactions if we are dropping the
final reference, so calling it in btrfs_commit_transaction
may end up deadlock. This patch adds delayed iput to avoid
the issue.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Pass transaction handle down to security and ACL initialization
functions, so we can avoid starting nested transactions
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
truncating and deleting regular files are unbound operations,
so it's not good to do them in a single transaction. This
patch makes btrfs_truncate and btrfs_delete_inode start a
new transaction after all items in a tree leaf are deleted.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
fallocate(2) may allocate large number of file extents, so it's not
good to do it in a single transaction. This patch make fallocate(2)
start a new transaction for each file extents it allocates.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_lookup_dentry may trigger orphan cleanup, so it's not good
to call it while committing a transaction.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We do log replay in a single transaction, so it's not good to do unbound
operations. This patch cleans up orphan inodes cleanup after replaying
the log. It also avoids doing other unbound operations such as truncating
a file during replaying log. These unbound operations are postponed to
the orphan inode cleanup stage.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There are some cases file extents are inserted without involving
ordered struct. In these cases, we update disk_i_size directly,
without checking pending ordered extent and DELALLOC bit. This
patch extends btrfs_ordered_update_i_size() to handle these cases.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can
avoid calling lock_extent within transaction.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_duplicate_item duplicates item with new key, guaranteeing
the source item and the new items are in the same tree leaf and
contiguous. It allows us to split file extent in place, without
using lock_extent to prevent bookend extent race.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We allow two log transactions at a time, but use same flag
to mark dirty tree-log btree blocks. So we may flush dirty
blocks belonging to newer log transaction when committing a
log transaction. This patch fixes the issue by using two
flags to mark dirty tree-log btree blocks.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
request_region should be used with release_region, not request_mem_region.
Geert Uytterhoeven pointed out that in the case of drivers/video/gbefb.c,
the problem is actually the other way around; request_mem_region should be
used instead of request_region.
The semantic patch that finds/fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r1@
expression start;
@@
request_region(start,...)
@b1@
expression r1.start;
@@
request_mem_region(start,...)
@depends on !b1@
expression r1.start;
expression E;
@@
- release_mem_region
+ release_region
(start,E)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
These laptops often leave i8042 in a wierd state resulting in non-
operational touchpad and keyboard.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jon confirms that recent modprobe will look in /proc/cmdline, so these
cmdline options can still be used.
See http://bugzilla.kernel.org/show_bug.cgi?id=14164
Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the parisc architecture we face for each and every loaded kernel module
this kernel "badness warning":
sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
Badness at fs/sysfs/dir.c:487
Reason for that is, that on parisc all kernel modules do have multiple
.text sections due to the usage of the -ffunction-sections compiler flag
which is needed to reach all jump targets on this platform.
An objdump on such a kernel module gives:
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.build-id 00000024 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 00000000 00000000 00000000 00000058 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .text.ac97_bus_match 0000001c 00000000 00000000 00000058 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
3 .text 00000000 00000000 00000000 000000d4 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
...
Since the .text sections are empty (size of 0 bytes) and won't be
loaded by the kernel module loader anyway, I don't see a reason
why such sections need to be listed under
/sys/module/<module_name>/sections/<section_name> either.
The attached patch does solve this issue by not exporting section
names which are empty.
This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703
Signed-off-by: Helge Deller <deller@gmx.de>
CC: rusty@rustcorp.com.au
CC: akpm@linux-foundation.org
CC: James.Bottomley@HansenPartnership.com
CC: roland@redhat.com
CC: dave@hiauly1.hia.nrc.ca
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The version that made it into mainline missed the initialisation of the
chip handle.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
We should now use dev_set_drvdata to set the driver driver_data field.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: http://patchwork.linux-mips.org/patch/747/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch fixes the compilation failure of
rc32434 due to a bad module parameter description.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The size value passed to ioremap_nocache() is not correct.
Use resource_size() to get the correct value.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Phil Sutter <n0-1@freewrt.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
The SYSFS_DEPRECATED_V2 says "remove" older, deprecated features, but it
actually enables them, so correct this confusing, backwards text.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When detecting power failure, the probe function would reset the clock
time to defined state.
However, the clock's _date_ might still be bogus and a subsequent probe
fails when sanity-checking these values.
Change the power-failure fixup code to do a full setting of rtc_time,
including a valid date.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Paul Gortmaker <p_gortmaker@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The possible CCR_Y2K register values are 19 or 20 and struct rtc_time's
tm_year is in years since 1900.
The function translating rtc_time to register values assumes tm_year to be
years since first christmas, though, and we end up storing 0 or 1 in the
CCR_Y2K register, which the hardware does not refuse to do.
A subsequent probing of the clock fails due to the invalid value range in
the register, though.
[ And if it didn't, reading the clock would yield a bogus year because
the function translating registers to tm_year is assuming a register
value of 19 or 20. ]
This fixes the conversion from years since 1900 in tm_year to the
corresponding CCR_Y2K value of 19 or 20.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Paul Gortmaker <p_gortmaker@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prevent the AoE block driver from creating cache aliases of page cache
pages on machines with virtually indexed caches.
Building kernels on an AT91SAM9G20 board without this patch fails with
segmentation faults after a couple of passes.
Signed-off-by: Peter Horton <zero@colonel-panic.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Remove wrong and unnecessary unmask operation
- Remove extra GEDR reading
This fixes the loss of interrupts which occurs when two or more pins are
triggered in close succession.
Signed-off-by: Alek Du <alek.du@intel.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It has been fun but the last year or more it has been a duty and a burden.
So I leave it open for others to take over.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Anibal Monsalve Salazar <anibal@debian.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>