Commit Graph

233279 Commits

Author SHA1 Message Date
Linus Torvalds f60c153d50 Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
  nfsd: break lease on unlink due to rename
  nfsd4: acquire only one lease per file
  nfsd4: modify fi_delegations under recall_lock
  nfsd4: remove unused deleg dprintk's.
  nfsd4: split lease setting into separate function
  nfsd4: fix leak on allocation error
  nfsd4: add helper function for lease setup
  nfsd4: split up nfsd_break_deleg_cb
  NFSD: memory corruption due to writing beyond the stat array
  NFSD: use nfserr for status after decode_cb_op_status
  nfsd: don't leak dentry count on mnt_want_write failure
2011-02-15 12:06:38 -08:00
Linus Torvalds a1213b091c Merge branches 'core-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "lockdep, timer: Fix del_timer_sync() annotation"

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timer debug: Hide kernel addresses via %pK in /proc/timer_list
2011-02-15 10:19:18 -08:00
Linus Torvalds 1cecd791f2 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix text_poke_smp_batch() deadlock
  perf tools: Fix thread_map event synthesizing in top and record
  watchdog, nmi: Lower the severity of error messages
  ARM: oprofile: Fix backtraces in timer mode
  oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends
2011-02-15 10:18:48 -08:00
Linus Torvalds fef86db8fe Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, dmi, debug: Log board name (when present) in dmesg/oops output
  x86, ioapic: Don't warn about non-existing IOAPICs if we have none
  x86: Fix mwait_usable section mismatch
  x86: Readd missing irq_to_desc() in fixup_irq()
  x86: Fix section mismatch in LAPIC initialization
2011-02-15 10:18:29 -08:00
Linus Torvalds 87450bd55d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: matrix_keypad - increase the limit of rows and columns
  Input: wacom - fix error path in wacom_probe()
  Input: ads7846 - check proper condition when freeing gpio
  Revert "Input: do not pass injected events back to the originating handler"
  Input: sysrq - rework re-inject logic
  Input: serio - clear pending rescans after sysfs driver rebind
  Input: rotary_encoder - use proper irqflags
  Input: wacom_w8001 - report resolution to userland
2011-02-15 09:40:27 -08:00
Linus Torvalds 055d219441 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()
  drop out of RCU in return_reval
  split do_revalidate() into RCU and non-RCU cases
  in do_lookup() split RCU and non-RCU cases of need_revalidate
  nothing in do_follow_link() is going to see RCU
2011-02-15 08:06:36 -08:00
Linus Torvalds 007a14af26 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check return value of alloc_extent_map()
  Btrfs - Fix memory leak in btrfs_init_new_device()
  btrfs: prevent heap corruption in btrfs_ioctl_space_info()
  Btrfs: Fix balance panic
  Btrfs: don't release pages when we can't clear the uptodate bits
  Btrfs: fix page->private races
2011-02-15 08:00:35 -08:00
Martin Schwidefsky 261cd298a8 s390: remove task_show_regs
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-15 07:34:16 -08:00
Al Viro 4e924a4f53 get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()
can't happen anymore and didn't work right anyway

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15 02:26:54 -05:00
Al Viro f60aef7ec6 drop out of RCU in return_reval
... thus killing the need to handle drop-from-RCU in d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15 02:26:54 -05:00
Al Viro f5e1c1c1af split do_revalidate() into RCU and non-RCU cases
fixing oopsen in lookup_one_len()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15 02:26:54 -05:00
Al Viro 24643087e7 in do_lookup() split RCU and non-RCU cases of need_revalidate
and use unlikely() instead of gotos, for fsck sake...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15 02:26:54 -05:00
Al Viro 844a391799 nothing in do_follow_link() is going to see RCU
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-02-15 02:26:53 -05:00
Naga Chumbalkar 84e383b322 x86, dmi, debug: Log board name (when present) in dmesg/oops output
The "Type 2" SMBIOS record that contains Board Name is not
strictly required and may be absent in the SMBIOS on some
platforms.

( Please note that Type 2 is not listed in Table 3 in Sec 6.2
  ("Required Structures and Data") of the SMBIOS v2.7
  Specification. )

Use the Manufacturer Name (aka System Vendor) name.
Print Board Name only when it is present.

Before the fix:
  (i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
 (ii) oops output:  Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6

After the fix:
  (i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
 (ii) oops output:  Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15 04:20:57 +01:00
Paul Bolle 678301ecad x86, ioapic: Don't warn about non-existing IOAPICs if we have none
mp_find_ioapic() prints errors like:

    ERROR: Unable to locate IOAPIC for GSI 13

if it can't find the IOAPIC that manages that specific GSI. I
see errors like that at every boot of a laptop that apparently
doesn't have any IOAPICs.

But if there are no IOAPICs it doesn't seem to be an error that
none can be found. A solution that gets rid of this message is
to directly return if nr_ioapics (still) is zero. (But keep
returning -1 in that case, so nothing breaks from this change.)

The call chain that generates this error is:

pnpacpi_allocated_resource()
    case ACPI_RESOURCE_TYPE_IRQ:
        pnpacpi_parse_allocated_irqresource()
            acpi_get_override_irq()
                 mp_find_ioapic()

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15 04:15:04 +01:00
Ingo Molnar a252852afa Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-02-15 04:10:35 +01:00
Linus Torvalds 1abe3af271 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6657/1: hw_breakpoint: fix ptrace breakpoint advertising on unsupported arch
  ARM: 6656/1: hw_breakpoint: avoid UNPREDICTABLE behaviour when reading DBGDSCR
  ARM: 6658/1: collie: do actually pass locomo_info to locomo driver
  ARM: 6659/1: Thumb-2: Make CONFIG_OABI_COMPAT depend on !CONFIG_THUMB2_KERNEL
  ARM: 6654/1: perf/oprofile: fix off-by-one in stack check
  ARM: fixup SMP alternatives in modules
  ARM: make SWP emulation explicit on !CPU_USE_DOMAINS
  ARM: Avoid building unsafe kernels on OMAP2 and MX3
  ARM: pxa: Properly configure PWM period for palm27x
  ARM: pxa: only save/restore registers when pm functions are defined
  ARM: pxa/colibri: use correct SD detect pin
  ARM: pxa: fix mfpr_sync to read from valid offset
2011-02-14 14:49:29 -08:00
Tsutomu Itoh c26a920373 Btrfs: check return value of alloc_extent_map()
I add the check on the return value of alloc_extent_map() to several places.
In addition, alloc_extent_map() returns only the address or NULL.
Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 16:21:37 -05:00
Ilya Dryomov 67100f255d Btrfs - Fix memory leak in btrfs_init_new_device()
Memory allocated by calling kstrdup() should be freed.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 16:21:31 -05:00
Dan Rosenberg 51788b1bdd btrfs: prevent heap corruption in btrfs_ioctl_space_info()
Commit bf5fc093c5 refactored
btrfs_ioctl_space_info() and introduced several security issues.

space_args.space_slots is an unsigned 64-bit type controlled by a
possibly unprivileged caller.  The comparison as a signed int type
allows providing values that are treated as negative and cause the
subsequent allocation size calculation to wrap, or be truncated to 0.
By providing a size that's truncated to 0, kmalloc() will return
ZERO_SIZE_PTR.  It's also possible to provide a value smaller than the
slot count.  The subsequent loop ignores the allocation size when
copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.

The fix changes the slot count type and comparison typecast to u64,
which prevents truncation or signedness errors, and also ensures that we
don't copy more data than we've allocated in the subsequent loop.  Note
that zero-size allocations are no longer possible since there is already
an explicit check for space_args.space_slots being 0 and truncation of
this value is no longer an issue.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 16:04:23 -05:00
Yan, Zheng 6848ad6461 Btrfs: Fix balance panic
Mark the cloned backref_node as checked in clone_backref_node()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 16:00:03 -05:00
Linus Torvalds 15a831f253 Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  Revert "dt: add documentation of ARM dt boot interface"
2011-02-14 10:10:37 -08:00
Linus Torvalds f1b6a4ec27 Merge branch 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: Fix minor compile warning
  RTC: Convert rtc drivers to use the alarm_irq_enable method
  RTC: Fix rtc driver ioctl specific shortcutting
2011-02-14 10:10:07 -08:00
Chris Mason e3f24cc521 Btrfs: don't release pages when we can't clear the uptodate bits
Btrfs tracks uptodate state in an rbtree as well as in the
page bits.  This is supposed to enable us to use block sizes other than
the page size, but there are a few parts still missing before that
completely works.

But, our readpage routine trusts this additional range based tracking
of uptodateness, much in the same way the buffer head up to date bits
are trusted for the other filesystems.

The problem is that sometimes we need to allocate memory in order to
split records in the rbtree, even when we are just clearing bits.  This
can be difficult when our clearing function is called GFP_ATOMIC, which
can happen in the releasepage path.

So, what happens today looks like this:

releasepage called with GFP_ATOMIC
btrfs_releasepage calls clear_extent_bit
clear_extent_bit fails to allocate ram, leaving the up to date bit set
btrfs_releasepage returns success

The end result is the page being gone, but btrfs thinking the range is
up to date.   Later on if someone tries to read that same page, the
btrfs readpage code will return immediately thinking the page is already
up to date.

This commit fixes things to fail the releasepage when we can't clear the
extent state bits.  It covers both data pages and metadata tree blocks.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 13:04:01 -05:00
Chris Mason eb14ab8ed2 Btrfs: fix page->private races
There is a race where btrfs_releasepage can drop the
page->private contents just as alloc_extent_buffer is setting
up pages for metadata.  Because of how the Btrfs page flags work,
this results in us skipping the crc on the page during IO.

This patch sovles the race by waiting until after the extent buffer
is inserted into the radix tree before it sets page private.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-02-14 13:03:52 -05:00
J. Bruce Fields 83f6b0c182 nfsd: break lease on unlink due to rename
4795bb37ef "nfsd: break lease on unlink,
link, and rename", only broke the lease on the file that was being
renamed, and didn't handle the case where the target path refers to an
already-existing file that will be unlinked by a rename--in that case
the target file should have any leases broken as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:19 -05:00
J. Bruce Fields acfdf5c383 nfsd4: acquire only one lease per file
Instead of acquiring one lease each time another client opens a file,
nfsd can acquire just one lease to represent all of them, and reference
count it to determine when to release it.

This fixes a regression introduced by
c45821d263 "locks: eliminate fl_mylease
callback": after that patch, only the struct file * is used to determine
who owns a given lease.  But since we recently converted the server to
share a single struct file per open, if we acquire multiple leases on
the same file from nfsd, it then becomes impossible on unlocking a lease
to determine which of those leases (all of whom share the same struct
file *) we meant to remove.

Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous
version of this patch.

Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:19 -05:00
J. Bruce Fields 5d926e8c2f nfsd4: modify fi_delegations under recall_lock
Modify fi_delegations only under the recall_lock, allowing us to use
that list on lease breaks.

Also some trivial cleanup to simplify later changes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:19 -05:00
J. Bruce Fields 65bc58f518 nfsd4: remove unused deleg dprintk's.
These aren't all that useful, and get in the way of the next steps.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:19 -05:00
J. Bruce Fields edab9782b5 nfsd4: split lease setting into separate function
Splitting some code into a separate function which we'll be adding some
more to.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
J. Bruce Fields dd239cc05f nfsd4: fix leak on allocation error
Also share some common exit code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
J. Bruce Fields 22d38c4c10 nfsd4: add helper function for lease setup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
J. Bruce Fields 6b57d9c86d nfsd4: split up nfsd_break_deleg_cb
We'll be adding some more code here soon.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
Konstantin Khorenko 3aa6e0aa8a NFSD: memory corruption due to writing beyond the stat array
If nfsd fails to find an exported via NFS file in the readahead cache, it
should increment corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.

In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.

In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.

Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
Benny Halevy 0af3f814cc NFSD: use nfserr for status after decode_cb_op_status
Bugs introduced in 85a5648019
"NFSD: Update XDR decoders in NFSv4 callback client"

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:35:18 -05:00
J. Bruce Fields 541ce98c10 nfsd: don't leak dentry count on mnt_want_write failure
The exit cleanup isn't quite right here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-02-14 10:31:08 -05:00
Grant Likely 7211da1778 Revert "dt: add documentation of ARM dt boot interface"
This reverts commit 9830fcd6f6.

The ARM dt support has not been merged yet; this documentation update
was premature.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-02-14 08:13:20 -07:00
Borislav Petkov 1c9d16e359 x86: Fix mwait_usable section mismatch
We use it in non __cpuinit code now too so drop marker.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20110211171754.GA21047@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 12:08:28 +01:00
Thomas Gleixner 6ee5859df5 Merge branch 'fortglx/2.6.38/tip/timers/rtc' of git://git.linaro.org/people/jstultz/linux into timers/urgent 2011-02-14 09:00:30 +01:00
David Miller 795abaf1e4 klist: Fix object alignment on 64-bit.
Commit c0e69a5bbc ("klist.c: bit 0 in pointer can't be used as flag")
intended to make sure that all klist objects were at least pointer size
aligned, but used the constant "4" which only works on 32-bit.

Use "sizeof(void *)" which is correct in all cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: stable <stable@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-13 16:54:24 -08:00
Linus Torvalds 091994cfb8 Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  devicetree-discuss is moderated for non-subscribers
  MAINTAINERS: Add entry for GPIO subsystem
  dt: add documentation of ARM dt boot interface
  dt: Remove obsolete description of powerpc boot interface
  dt: Move device tree documentation out of powerpc directory
  spi/spi_sh_msiof: fix wrong address calculation, which leads to an Oops
2011-02-13 07:59:48 -08:00
Linus Torvalds d8ed516f82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - add quirk for Ordissimo EVE using a realtek ALC662
  ALSA: hrtimer: remove superfluous tasklet invocation
  ALSA: hrtimer: handle delayed timer interrupts
  ALSA: HDA: Add subwoofer quirk for Acer Aspire 8942G
  ALSA: hda - Don't handle empty patch files
  ALSA: hda - Fix missing CA initialization for HDMI/DP
  ALSA: usbaudio - Enable the E-MU 0204 USB
  ALSA: hda - switch lfe with side in mixer for 4930g
  ASoC: Improve WM8994 digital power sequencing
  ASoC: Create an AIF1ADCDAT signal widget to match AIF2
  asoc: davinci: da830/omap-l137: correct cpu_dai_name
  ASoC: fill in snd_soc_pcm_runtime.card before calling snd_soc_dai_link.init()
2011-02-13 07:58:50 -08:00
Linus Torvalds f00eaeea7a Revert "pci: use security_capable() when checking capablities during config space read"
This reverts commit 47970b1b2a.

It turns out it breaks several distributions.  Looks like the stricter
selinux checks fail due to selinux policies not being set to allow the
access - breaking X, but also lspci.

So while the change was clearly the RightThing(tm) to do in theory, in
practice we have backwards compatibility issues making it not work.

Reported-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: David Airlie <airlied@linux.ie>
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-13 07:50:50 -08:00
Takashi Iwai 6146124118 Merge branch 'fix/asoc' into for-linus 2011-02-13 10:05:30 +01:00
Grant Likely c170093d31 Merge branch 'devicetree/merge' into spi/merge 2011-02-12 23:53:34 -07:00
Paul Bolle 78bba987bc devicetree-discuss is moderated for non-subscribers
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-02-12 23:27:23 -07:00
Grant Likely a0dc00b430 MAINTAINERS: Add entry for GPIO subsystem
I'll probably regret this....

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-12 09:46:30 -08:00
Linus Torvalds c8e0b00ed1 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: call __jbd2_log_start_commit with j_state_lock write locked
  ext4: serialize unaligned asynchronous DIO
  ext4: make grpinfo slab cache names static
  ext4: Fix data corruption with multi-block writepages support
  ext4: fix up ext4 error handling
  ext4: unregister features interface on module unload
  ext4: fix panic on module unload when stopping lazyinit thread
2011-02-12 09:10:24 -08:00
Theodore Ts'o e447183180 jbd2: call __jbd2_log_start_commit with j_state_lock write locked
On an SMP ARM system running ext4, I've received a report that the
first J_ASSERT in jbd2_journal_commit_transaction has been triggering:

	J_ASSERT(journal->j_running_transaction != NULL);

While investigating possible causes for this problem, I noticed that
__jbd2_log_start_commit() is getting called with j_state_lock only
read-locked, in spite of the fact that it's possible for it might
j_commit_request.  Fix this by grabbing the necessary information so
we can test to see if we need to start a new transaction before
dropping the read lock, and then calling jbd2_log_start_commit() which
will grab the write lock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-12 08:18:24 -05:00
Eric Sandeen e9e3bcecf4 ext4: serialize unaligned asynchronous DIO
ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and 
dio_zero_block() will zero out the unwritten portions.  When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO.  I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write().  But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex.  So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment.  I've
tested a backport of this patch with qemu, and it does
avoid the corruption.  It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
 of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
 variables are protected with an "ext4_" prefix.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-12 08:17:34 -05:00