Commit Graph

287770 Commits

Author SHA1 Message Date
Linus Torvalds 14fdbf7eb4 Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Fixing a regression with the PMU MSRs when PMU virtualization is
disabled, a guest-internal DoS with the SYSCALL instruction, and a dirty
memory logging race that may cause live migration to fail.

* 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: do not #GP on perf MSR writes when vPMU is disabled
  KVM: x86: fix missing checks in syscall emulation
  KVM: x86: extend "struct x86_emulate_ops" with "get_cpuid"
  KVM: Fix __set_bit() race in mark_page_dirty() during dirty logging
2012-02-06 16:26:58 -08:00
Linus Torvalds 8597559a78 GPIO fixes for v3.3-rc2
Straightforward bug fixes in this branch.  A couple of x86 gpio drivers
were missing spinlock initialization, plus an API change fixup for the
samsung driver and a name typo fix.

Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  gpio: Add missing spin_lock_init in gpio-ml-ioh driver
  gpio: Add missing spin_lock_init in gpio-pch driver
  gpio: samsung: adapt to changes in gpio specifier translator function declaration
  Correct bad gpio naming
2012-02-06 15:29:56 -08:00
Linus Torvalds 105e518093 One patch to fix fan detection on NCT6776F.

Merge tag 'hwmon-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

* tag 'hwmon-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (w83627ehf) Fix number of fans for NCT6776F
2012-02-06 15:25:48 -08:00
Heiko Carstens 96e02d1586 exec: fix use-after-free bug in setup_new_exec()
Setting the task name is done within setup_new_exec() by accessing
bprm->filename. However this happens after flush_old_exec().
This may result in a use-after-free bug: flush_old_exec() may
"complete" vfork_done, which will wake up the parent which in turn
may free the passed in filename.
To fix this add a new tcomm field in struct linux_binprm which
contains the now early generated task name until it is used.

Fixes this bug on s390:

  Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
  Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
  Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
  Call Trace:
  ([<0000000000282e2c>] setup_new_exec+0x38/0x374)
   [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
   [<0000000000280a42>] search_binary_handler+0x38e/0x5bc
   [<0000000000282b6c>] do_execve_common+0x410/0x514
   [<0000000000282cb6>] do_execve+0x46/0x58
   [<00000000005bce58>] kernel_execve+0x28/0x70
   [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
   [<00000000005bc8da>] kernel_thread_starter+0x6/0xc
   [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
  Last Breaking-Event-Address:
   [<00000000002830f0>] setup_new_exec+0x2fc/0x374

  Kernel panic - not syncing: Fatal exception: panic_on_oops

Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-06 15:15:20 -08:00
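The idea behind the setup_new_exec() fix above, keeping a private copy of the task name instead of reading a caller-owned string later, can be sketched in userspace C. This is only an illustration under stated assumptions: `struct binprm_sketch` and `save_task_name()` are hypothetical names, not the kernel's `struct linux_binprm` or its helpers, and `snprintf` stands in for whatever copy routine the real patch uses.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16  /* the kernel's task name limit */

/* Hypothetical stand-in for struct linux_binprm; only the fields used here. */
struct binprm_sketch {
	const char *filename;       /* owned by the caller, may be freed early */
	char tcomm[TASK_COMM_LEN];  /* private copy of the task name */
};

/* Copy the basename of filename into tcomm while filename is still valid. */
static void save_task_name(struct binprm_sketch *bprm)
{
	const char *slash = strrchr(bprm->filename, '/');
	const char *base = slash ? slash + 1 : bprm->filename;

	/* snprintf truncates and always NUL-terminates. */
	snprintf(bprm->tcomm, sizeof(bprm->tcomm), "%s", base);
}
```

Once `save_task_name()` has run, later code can read `tcomm` even if the memory behind `filename` has been released in the meantime.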
Keith Packard c898261c0d drm/i915: Force explicit bpp selection for intel_dp_link_required
It is never correct to use intel_crtc->bpp in intel_dp_link_required,
so instead pass an explicit bpp in to this function. This patch
only supports 18bpp and 24bpp modes, which means that 10bpc modes will
be computed incorrectly. Fixing that will require more extensive
changes, and so must be addressed separately from this bugfix.

intel_dp_link_required is called from intel_dp_mode_valid and
intel_dp_mode_fixup.

* intel_dp_mode_valid is called to list supported modes; in this case,
  the current crtc values cannot be relevant as the modes in question
  may never be selected. Thus, using intel_crtc->bpp is never right.

* intel_dp_mode_fixup is called during mode setting, but it is run
  well before ironlake_crtc_mode_set is called to set intel_crtc->bpp,
  so using intel_crtc->bpp in this path can only ever get a stale
  value.

Cc: Lubos Kolouch <lubos.kolouch@gmail.com>
Cc: Adam Jackson <ajax@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42263
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44881
Tested-by: Dave Airlie <airlied@redhat.com>
Tested-by: camalot@picnicpark.org (Dell Latitude 6510)
Tested-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
2012-02-06 14:34:29 -08:00
Naveen N. Rao a4a03fc7ef perf evsel: Fix an issue where perf report fails to show the proper percentage
This patch fixes an issue where perf report shows nan% for certain
perf.data files. The below is from a report for a do_fork probe:

   -nan%           sshd  [kernel.kallsyms]  [k] do_fork
   -nan%    packagekitd  [kernel.kallsyms]  [k] do_fork
   -nan%    dbus-daemon  [kernel.kallsyms]  [k] do_fork
   -nan%           bash  [kernel.kallsyms]  [k] do_fork

A git bisect shows commit f3bda2c as the cause. However, looking back
through the git history, I saw commit 640c03c which seems to have
removed the required initialization for perf_sample->period. The problem
only started showing after commit f3bda2c. The below patch re-introduces
the initialization and it fixes the problem for me.

With the below patch, for the same perf.data:

  73.08%             bash  [kernel.kallsyms]  [k] do_fork
   8.97%      11-dhclient  [kernel.kallsyms]  [k] do_fork
   6.41%             sshd  [kernel.kallsyms]  [k] do_fork
   3.85%        20-chrony  [kernel.kallsyms]  [k] do_fork
   2.56%         sendmail  [kernel.kallsyms]  [k] do_fork

This patch applies over current linux-tip commit 9949284.

Problem introduced in:

$ git describe 640c03c
v2.6.37-rc3-83-g640c03c

Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/20120203170113.5190.25558.stgit@localhost6.localdomain6
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06 18:59:38 -02:00
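One plausible way a missing period initialization turns into the nan% output quoted above is a percentage computed against a total of zero. The sketch below is not perf's code; `percent_of()` is a hypothetical helper, and the assumption that every sample contributes a period of zero is made only for illustration.

```c
#include <math.h>
#include <stdint.h>

/* Percentage of total_period contributed by one entry (hypothetical helper). */
static double percent_of(uint64_t period, uint64_t total_period)
{
	/* With period == 0 and total_period == 0 this evaluates 0.0 / 0.0,
	 * which is NaN in IEEE 754 arithmetic. */
	return 100.0 * (double)period / (double)total_period;
}
```

With a nonzero default period per sample, the total is nonzero and the division yields an ordinary percentage.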
Jiri Olsa bf32c9ebc9 perf tools: Fix prefix matching for kernel maps
In some ancient perf versions we used '[kernel.kallsyms._text]' as the
name for the kernel map.

This got changed with commit:
  perf: 'perf kvm' tool for monitoring guest performance from host
  commit a1645ce12a
  Author: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

and we started to use the following name: '[kernel.kallsyms]_text'.

This name change is important for the report code dealing with ancient
perf data. When processing the kernel map event, we need to recognize
the old naming (don't match the last ']') and initialize the kernel map
correctly.

The subsequent call to maps__set_kallsyms_ref_reloc_sym deals with the
superfluous ']' to get correct symbol name.

Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1328461865-6127-1-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06 18:57:39 -02:00
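The "don't match the last ']'" rule from the kernel map commit above can be illustrated with a prefix comparison that stops one character short. This is a minimal sketch; `is_kernel_map_name()` is a hypothetical helper and perf's real code is organized differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does name look like a kernel map name, old or new? */
static bool is_kernel_map_name(const char *name)
{
	static const char prefix[] = "[kernel.kallsyms]";

	/* Compare everything except the trailing ']' so that the old
	 * "[kernel.kallsyms._text]" form matches as well. */
	return strncmp(name, prefix, strlen(prefix) - 1) == 0;
}
```

Both `[kernel.kallsyms]_text` and `[kernel.kallsyms._text]` pass this check, while an unrelated map name such as `[vdso]` does not.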
Jiri Olsa 7a0153ee15 perf tools: Fix perf stack to non executable on x86_64
By adding following objects:
  bench/mem-memcpy-x86-64-asm.o
the x86_64 perf binary ended up with executable stack.

The reason was that the above object is assembler sourced and is missing the
GNU-stack note section. In such a case the linker assumes that the final binary
should not be restricted at all and marks the stack as RWX.

Add a section ".note.GNU-stack" definition to the mentioned object, with all
flags disabled, thus omitting this object from the linker's stack flags decision.

Problem introduced in:

  $ git describe ea7872b
  v2.6.37-rc2-19-gea7872b

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570
Reported-by: Clark Williams <williams@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
[ committer note: Backported fix to perf/urgent (3.3-rc2+) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-02-06 18:54:06 -02:00
Thadeu Lima de Souza Cascardo 7e2eb99cc6 mlx4: fix DMA mapping leak when allocation fails
mlx4_en_prepare_rx_desc does not correctly clean up after it finds an
allocation failure. It should unmap a page before calling put_page, but
it only calls the latter.

This bug would prevent a device removal using hotplug after setting the
device MTU to 9000 and opening the network interface. After the fix, we
still see the allocation failure with MTU 9000, but we are able to
remove the device.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Thadeu Lima de Souza Cascardo 68355f7113 mlx4: allow device removal by fixing dma unmap size
After opening the network interface, Mellanox ConnectX device cannot be
removed by hotplug because it has not properly unmapped all DMA memory.

It happens that mlx4_en_activate_rx_rings overrides the variable that
keeps the size of the memory mapped.

This is fixed by passing to mlx4_en_destroy_rx_ring the same size that is
given to mlx4_en_create_rx_ring.

After applying this patch, hot unplugging the device works after opening
the interface.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Henrik Rydberg 2e6b411971 bcma: don't fail for bad SPROM CRC
The brcmsmac driver is now using the bcma SPROM CRC check, which does
not recognize all chipsets that were functional prior to the switch. In
particular, the current code bails out on odd CRC errors in recent
Macbooks. This patch ignores those errors, with the argument that an
unrecognized SPROM should be treated similarly to a non-existing one.

Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-06 14:37:52 -05:00
Eugenia Emantayev 4c41b36737 mlx4_core: use correct port for steering
Use port number for correct steering (list per port).
Before the fix all steering entries (for both physical ports)
were managed in first port structures, so we had leakage of resources
for port 2.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev 4df9950406 mlx4_core: use correct flag for unicast_promisc
Use MLX4_DEV_CAP_FLAG_VEP_UC_STEER for unicast_promisc_add/remove
Unicast entries were managed in wrong data structures.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev f08ad06c05 mlx4_core: fix memory leak at multi_func_cleanup
Perform cleanup also in non-master flow.
The VFs use communication channel as well.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Jesper Juhl 715252d419 IB/srpt: Don't return freed pointer from srpt_alloc_ioctx_ring()
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-06 08:57:11 -08:00
Felix Fietkau 55a2bb4a6d ath9k_hw: fix a RTS/CTS timeout regression
commit adb5066 "ath9k_hw: do not apply the 2.4 ghz ack timeout
workaround to cts" reduced the hardware CTS timeout to the normal
values specified by the standard. It turns out that while the hardware
doesn't need the same extra time that it needs for the ACK timeout, it
does need more than the value specified in the standard, but only
for 2.4 GHz.

This patch brings the CTS timeout value in sync with the initialization
values, while still allowing adjustment for bigger distances.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-06 11:34:02 -05:00
Felix Fietkau f88373fa47 ath9k: fix a WEP crypto related regression
commit b4a82a0 "ath9k_hw: fix interpretation of the rx KeyMiss flag"
fixed the interpretation of the KeyMiss flag for keycache based lookups,
however WEP encryption uses a static index, so KeyMiss is always asserted
for it, even though frames are decrypted properly.
Fix this by clearing the ATH9K_RXERR_KEYMISS flag if no keycache based
lookup was performed.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Reported-by: Laurent Bonnans <bonnans.l@gmail.com>
Reported-by: Jurica Vukadin <u.ra604@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-06 11:34:02 -05:00
Amitkumar Karwar 2da8cbf8a6 mwifiex: add NULL checks in driver unload path
If driver load fails, a few pointers may remain
uninitialized, e.g. priv->wdev, priv->netdev, adapter->sleep_cfm.
This will cause a NULL pointer dereference while unloading the
driver.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-06 11:34:02 -05:00
Olof Johansson ab74a91429 Merge branch 'v3.3-samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes
* 'v3.3-samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: EXYNOS: Correct M-5MOLS sensor clock frequency on Universal C210 board
  ARM: EXYNOS: Correct framebuffer window size on Nuri board
  ARM: SAMSUNG: Fix missing api-change from subsys_interface change
  ARM: EXYNOS: Fix "warning: initialization from incompatible pointer type"
  ARM: S5PV210: Fix the name of exynos4_clk_hdmiphy_ctrl() for S5PV210
  ARM: EXYNOS: Remove build warning without enabling PM
  ARM: SAMSUNG: Fix platform data setup for I2C adapter 0
  ARM: EXYNOS: fix non-SMP builds for EXYNOS4
  ARM: S3C6410: Use device names for both I2C clocks
  ARM: S3C64XX: Make s3c64xx_init_uarts() static
2012-02-06 08:31:33 -08:00
Przemo Firszt d7cb3dbd10 HID: wacom: Fix invalid power_supply_powers calls
power_supply_powers calls added in 35b4c01e2 ("power_supply: add "powers" links
to self-powered HID devices") have to be called after power device is created.
This patch also fixes the second call - it has to be "ac" instead of "battery"

Signed-off-by: Przemo Firszt <przemo@firszt.eu>
Signed-off-by: Chris Bagwell <chris@cnpbagwell.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-06 16:14:20 +01:00
Jiri Kosina d4730ace0c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into upstream-fixes
Sync with Linus' tree. This is necessary to have a base for
patch that fixes commit 35b4c01e29 ("power_supply: add "powers"
links to self-powered HID devices") which went in through Anton's
tree.
2012-02-06 16:12:16 +01:00
Mark Brown db966f8abb ASoC: wm8994: Enabling VMID should take a runtime PM reference
We can enable VMID independently of the bias in some use cases so we need
to ensure that the core device is powered up.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@vger.kernel.org
2012-02-06 12:08:33 +00:00
Takashi Iwai eedec3d385 ALSA: hda/realtek - Fix a wrong condition
sparse complains that "spec->multiout.dac_nids" is a pointer.

sound/pci/hda/patch_realtek.c:2321:37: error: incompatible types for operation (>)
sound/pci/hda/patch_realtek.c:2321:37:    left side has type unsigned short const [usertype] *dac_nids
sound/pci/hda/patch_realtek.c:2321:37:    right side has type int

It was meant to be num_dacs instead of dac_nids.
Although the current code still works as expected (when num_dacs is zero,
dac_nids should be NULL, too), better to fix now, of course.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-02-06 10:27:06 +01:00
Jesper Juhl 226e01ef0d ALSA: emu8000: Remove duplicate linux/moduleparam.h include from emu8000_patch.c
The header 'linux/moduleparam.h' is included twice in
'sound/isa/sb/emu8000_patch.c'. Once is enough.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-02-06 10:22:54 +01:00
Dan Carpenter 822bfa51ce cdrom: use copy_to_user() without the underscores
"nframes" comes from the user and "nframes * CD_FRAMESIZE_RAW" can wrap
on 32 bit systems.  That would have been ok if we used the same wrapped
value for the copy, but we use a shifted value.  We should just use the
checked version of copy_to_user() because it's not going to make a
difference to the speed.

Cc: stable@vger.kernel.com
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-06 10:20:45 +01:00
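The wraparound mentioned in the cdrom commit above is easy to reproduce with fixed-width integers. The sketch below only demonstrates the arithmetic; it is not the driver code, and `raw_size_32()` and `raw_size_64()` are hypothetical names. `CD_FRAMESIZE_RAW` is 2352 bytes.

```c
#include <stdint.h>

#define CD_FRAMESIZE_RAW 2352  /* bytes per raw CD frame */

/* Size computed in 32-bit arithmetic, as on a 32-bit system. */
static uint32_t raw_size_32(uint32_t nframes)
{
	return nframes * CD_FRAMESIZE_RAW;  /* wraps modulo 2^32 */
}

/* Same computation widened to 64 bits, for comparison. */
static uint64_t raw_size_64(uint32_t nframes)
{
	return (uint64_t)nframes * CD_FRAMESIZE_RAW;
}
```

For `nframes = 1826092` the true size is 4294968384 bytes, but the 32-bit product wraps to 1088, so a size check performed on the wrapped value says nothing about the real amount of data.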
Shaohua Li 9fa73472dd block: fix ioc locking warning
Meelis reported a warning:

WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
Hardware name: 939Dual-SATA2
timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
Call Trace:
 <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
 [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
 [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
 [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
 [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
 [<ffffffff810145fd>] ? apic_write+0x11/0x13
 [<ffffffff81027475>] __do_softirq+0x74/0xfa
 [<ffffffff812f337a>] call_softirq+0x1a/0x30
 [<ffffffff81002ff9>] do_softirq+0x31/0x68
 [<ffffffff810276cf>] irq_exit+0x3d/0xa3
 [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
 [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
 <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
 [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
 [<ffffffff8100801f>] ? default_idle+0x1e/0x32
 [<ffffffff81008019>] ? default_idle+0x18/0x32
 [<ffffffff810008b1>] cpu_idle+0x87/0xd1
 [<ffffffff812de861>] rest_init+0x85/0x89
 [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
 [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
 [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7

this_q == locked_q is possible. There are two problems here:
1. In the UP case, there is a preemption counter issue as spin_trylock always
succeeds.
2. In the SMP case, the loop breaks too early.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Tested-by: Knut Petersen <Knut_Petersen@t-online.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-06 08:57:29 +01:00
Danny Kukawka c88db23325 spi-topcliff-pch: rename pch_spi_pcidev to pch_spi_pcidev_driver
Rename static struct pci_driver pch_spi_pcidev to
pch_spi_pcidev_driver to get rid of warnings from modpost checks.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-02-05 23:24:17 -07:00
Sylwester Nawrocki b1a4874e0d spi: Add spi-s3c64xx driver dependency on ARCH_EXYNOS4
The spi-s3c64xx driver is also used on Exynos4 so update the dependency
to enable build on those platforms.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[grant.likely: relax depends to ARCH_EXYNOS instead of ARCH_EXYNOS4]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-02-05 23:23:43 -07:00
Wu Fengguang 977b7e3a52 writeback: fix dereferencing NULL bdi->dev on trace_writeback_queue
When a SD card is hot removed without umount, del_gendisk() will call
bdi_unregister() without destroying/freeing it. This leaves the bdi in
the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state.

When sync(2) gets the bdi before bdi_unregister() and calls
bdi_queue_work() after the unregister, trace_writeback_queue will be
dereferencing the NULL bdi->dev. Fix it with a simple test for NULL.

LKML-reference: http://lkml.org/lkml/2012/1/18/346
Cc: stable@kernel.org
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2012-02-06 11:17:25 +08:00
David Lv b530b1930b via-velocity: S3 resume fix.
Initially diagnosed on Ubuntu 11.04 with kernel 2.6.38.

velocity_close is not called during a suspend / resume cycle in this
driver and it has no business playing directly with power states.

Signed-off-by: David Lv <DavidLv@viatech.com.cn>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-05 17:47:09 -05:00
Herbert Xu 3a92d687c8 crypto: sha512 - Avoid stack bloat on i386
Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice.  As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).

This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.

While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.

Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.

Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-02-05 15:09:28 +11:00
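The "W reduction" in the sha512 commit above refers to keeping only a 16-word window of the SHA-512 message schedule instead of all 80 words. A minimal sketch of that technique follows, with the update done inside the loop one word at a time. The function names are hypothetical and this is not the kernel's implementation; the sigma functions are the standard SHA-512 ones.

```c
#include <stdint.h>

static inline uint64_t ror64(uint64_t x, unsigned n)
{
	return (x >> n) | (x << (64 - n));
}

/* SHA-512 small sigma functions. */
static inline uint64_t s0(uint64_t x) { return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7); }
static inline uint64_t s1(uint64_t x) { return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6); }

/* Reference: the full 80-word message schedule. */
static void schedule_full(const uint64_t in[16], uint64_t W[80])
{
	for (int i = 0; i < 16; i++)
		W[i] = in[i];
	for (int i = 16; i < 80; i++)
		W[i] = s1(W[i - 2]) + W[i - 7] + s0(W[i - 15]) + W[i - 16];
}

/* Rolling variant: W holds only the last 16 words. Slot i & 15 still
 * contains word i - 16, so adding the other three terms turns it into
 * word i in place. Returns schedule word i. */
static uint64_t schedule_step(uint64_t W[16], int i)
{
	if (i >= 16)
		W[i & 15] += s1(W[(i - 2) & 15]) + W[(i - 7) & 15] +
			     s0(W[(i - 15) & 15]);
	return W[i & 15];
}
```

Calling `schedule_step()` for i = 0..79 on a window initialized with the 16 input words yields the same sequence as `schedule_full()`, using 128 bytes of schedule storage instead of 640.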
Guenter Roeck 585c0fd821 hwmon: (w83627ehf) Fix number of fans for NCT6776F
NCT6776F can select fan input pins for fans 3 to 5 with a secondary set of
chip register bits. Check that second set of bits in addition to the first set
to detect if fans 3..5 are monitored.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org # 3.0+
Acked-by: Jean Delvare <khali@linux-fr.org>
2012-02-04 18:08:23 -08:00
Julian Anastasov e6b45241c5 ipv4: reset flowi parameters on route connect
Eric Dumazet found that commit 813b3b5db8
(ipv4: Use caller's on-stack flowi as-is in output
route lookups.) that comes in 3.0 added a regression.
The problem appears to be that resulting flowi4_oif is
used incorrectly as input parameter to some routing lookups.
The result is that when connecting to local port without
listener if the IP address that is used is not on a loopback
interface we incorrectly assign RTN_UNICAST to the output
route because no route is matched by oif=lo. The RST packet
can not be sent immediately by tcp_v4_send_reset because
it expects RTN_LOCAL.

So, change ip_route_connect and ip_route_newports to
update the flowi4 fields that are input parameters because
we do not want unnecessary binding to oif.

To make it clear which input parameters
can be modified during lookup and to show which fields of
flowi4 are reused, add a new function to update the flowi4
structure: flowi4_update_output.

Thanks to Yurij M. Plotnikov for providing a bug report including a
program to reproduce the problem.

Thanks to Eric Dumazet for tracking the problem down to
tcp_v4_send_reset and providing initial fix.

Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 19:29:48 -05:00
Linus Torvalds 23783f817b Power management fixes for 3.3-rc3
Three power management regression fixes, one for a recent regression introduced
by the freezer changes during the 3.3 merge window and two for regressions
in cpuidle (resulting from PM QoS changes) and in the hibernate user space
interface, both introduced during the 3.2 development cycle.

Merge tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

They include:

* Two hibernate (s2disk) regression fixes from Srivatsa S. Bhat (for
 regressions introduced during the 3.3 merge window and during the 3.2
 development cycle).

* A cpuidle fix from Venki Pallipadi for a regression resulting from PM QoS
 changes during the 3.2 development cycle causing cpuidle to work incorrectly
 for CONFIG_PM unset.

* tag 'pm-fixes-for-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / QoS: CPU C-state breakage with PM Qos change
  PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails
  PM / Hibernate: Thaw kernel threads in SNAPSHOT_CREATE_IMAGE ioctl path
2012-02-04 15:21:39 -08:00
Shawn Lu e2446eaab5 tcp_v4_send_reset: binding oif to iif in no sock case
Bind the outgoing interface of a TCP v4 RST packet to the incoming
interface when there is no socket associated with it.
When sk is not NULL, use sk->sk_bound_dev_if instead
(suggested by Eric Dumazet).

This has a few benefits:
1. tcp_v6_send_reset already does that.
2. It helps TCP connections with SO_BINDTODEVICE set. When the
connection is lost, we are still able to send out the RST using
the same interface.
3. We are sending a reply, so it is most likely to succeed
if iif is used.

Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 18:20:05 -05:00
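The interface selection rule described in the tcp_v4_send_reset commit above reduces to a single conditional. The sketch below uses a hypothetical `struct sock_sketch` and `reset_oif()` helper rather than the kernel's `struct sock`, so it shows only the decision, not the actual patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one socket field that matters here. */
struct sock_sketch {
	int bound_dev_if;  /* interface the socket is bound to, 0 if none */
};

/* Pick the outgoing interface for a reset reply: the socket's bound
 * device when a socket exists, otherwise the incoming interface. */
static int reset_oif(const struct sock_sketch *sk, int iif)
{
	return sk ? sk->bound_dev_if : iif;
}
```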
Neil Horman 5962b35c1d netprio_cgroup: Fix obo in get_prioidx
It was recently pointed out to me that the get_prioidx function sets a bit in
the prioidx map prior to checking to see if the index being set is out of
bounds.  This patch corrects that, avoiding the possibility of us writing beyond
the end of the array.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 16:30:24 -05:00
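The ordering issue fixed in get_prioidx above (check the index against the map size first, set the bit second) can be sketched with a small userspace bitmap. All names here (`MAP_BITS`, `map`, `get_index()`) are hypothetical and the kernel uses its own bitmap helpers; the point is only where the bounds check sits relative to the write.

```c
#include <limits.h>
#include <stddef.h>

#define MAP_BITS 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS ((MAP_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long map[MAP_WORDS];

/* Claim the lowest free index. Returns 0 on success, -1 if the map is full. */
static int get_index(unsigned int *out)
{
	unsigned int idx;

	/* Find the first clear bit, or run off the end at MAP_BITS. */
	for (idx = 0; idx < MAP_BITS; idx++)
		if (!(map[idx / BITS_PER_WORD] & (1UL << (idx % BITS_PER_WORD))))
			break;

	/* Bounds check comes BEFORE the write, so a full map never
	 * causes a store past the end of the array. */
	if (idx >= MAP_BITS)
		return -1;

	map[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
	*out = idx;
	return 0;
}
```

Swapping the last two steps would set a bit at position `MAP_BITS` when the map is full, which is the kind of out-of-bounds write the commit describes.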
Venkatesh Pallipadi d020283dc6 PM / QoS: CPU C-state breakage with PM Qos change
Looks like change "PM QoS: Move and rename the implementation files"
merged during the 3.2 development cycle made PM QoS depend on
CONFIG_PM which depends on (PM_SLEEP || PM_RUNTIME).

That breaks CPU C-states with kernels not having these CONFIGs, causing CPUs
to spend time in the polling idle loop instead of going into deep C-states,
consuming far more power. This happens with either acpi idle or intel idle
enabled.

Either CONFIG_PM should be enabled with any pm_qos users, or
the !CONFIG_PM pm_qos_request() should return sane defaults so as not to break
the existing users. Here is the patch for the latter option.

[rjw: Modified the changelog slightly.]

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
2012-02-04 22:23:17 +01:00
Srivatsa S. Bhat 379e0be812 PM / Freezer: Thaw only kernel threads if freezing of kernel threads fails
If freezing of kernel threads fails, we are expected to automatically
thaw tasks in the error recovery path. However, at times, we encounter
situations in which we would like the automatic error recovery path
to thaw only the kernel threads, because we want to be able to do
some more cleanup before we thaw userspace. Something like:

error = freeze_kernel_threads();
if (error) {
	/* Do some cleanup */

	/* Only then thaw userspace tasks*/
	thaw_processes();
}

An example of such a situation is where we freeze/thaw filesystems
during suspend/hibernation. There, if freezing of kernel threads
fails, we would like to thaw the frozen filesystems before thawing
the userspace tasks.

So, modify freeze_kernel_threads() to thaw only kernel threads in
case of freezing failure. And change suspend_freeze_processes()
accordingly. (At the same time, let us also get rid of the rather
cryptic usage of the conditional operator (?:) in that function.)

[rjw: In fact, this patch fixes a regression introduced during the
 3.3 merge window, because without it thaw_processes() may be called
 before swsusp_free() in some situations and that may lead to massive
 memory allocation failures.]

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-02-04 22:23:05 +01:00
David S. Miller 1715322f3e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2012-02-04 16:10:34 -05:00
Linus Torvalds d9142025f5 arm-soc fixes for 3.3-rc

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

arm-soc fixes for 3.3-rc

* A series of OMAP regression fixes for merge window fallout
* Two patches for Davinci, one removes some misdefined clocks, the other
  is a regression fix for merge window fallout
* Two patches that make Broadcom bcmring build again (and remove a
  bunch of unused code in the process)

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: bcmring: fix build failure in mach-bcmring/arch.c
  ARM: bcmring: remove unused DMA map code
  ARM: davinci: update mdio bus name
  ARM: OMAP2+: arch/arm/mach-omap2/smartreflex.c: add missing iounmap
  ARM: OMAP2+: arch/arm/mach-omap2/devices.c: introduce missing kfree
  ARM: OMAP: fix MMC2 loopback clock handling
  ARM: OMAP: fix erroneous mmc2 clock change on mmc3 setup
  ARM: OMAP2+: GPMC: fix device size setup
  ARM: OMAP2+: timer: Fix crash due to wrong arg to __omap_dm_timer_read_counter
  ARM: OMAP3: hwmod data: register dss hwmods after dss_core
  ARM: OMAP2/3: PRM: fix missing plat/irqs.h build breakage
  ARM: OMAP2+: io: fix compilation breakage on 2420-only configs
  ARM: OMAP4: hwmod data: Add names for DMIC memory address space
  ARM: OMAP3: hwmod data: add SYSC_HAS_ENAWAKEUP for dispc
  ARM: OMAP2+: hwmod data: split omap2/3 dispc hwmod class
  ARM: davinci: DA850: remove non-existing pll1_sysclk4-7 clocks
  ARM: OMAP2: fix regulator warnings
  ARM: OMAP2: fix omap3 touchbook kconfig warning
  i2c: OMAP: Fix OMAP1 build error
2012-02-04 12:11:40 -08:00
Paul Gortmaker ca43784daa ARM: bcmring: fix build failure in mach-bcmring/arch.c
Upstream commit d1fce9c115

   "ARM: restart: bcmring: use new restart hook"

breaks building of this platform, since what used to be the
last field of the MACHINE_START/END block didn't have a
trailing comma.  Once another field was added below, we get:

arch/arm/mach-bcmring/arch.c:198: error: request for member 'restart' in something not a structure or union
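
A minimal sketch of why the missing comma produces exactly this error, assuming a hypothetical `struct model_machine_desc` (the real MACHINE_START/END block expands to an initializer with many more fields):

```c
#include <assert.h>

/* Hypothetical descriptor, much smaller than the real one. */
struct model_machine_desc {
	const char *name;
	int (*init)(void);
	int (*restart)(void);
};

static int model_init(void)    { return 1; }
static int model_restart(void) { return 2; }

static const struct model_machine_desc model_desc = {
	.name    = "model-board",
	.init    = model_init,		/* trailing comma: appending below is safe */
	.restart = model_restart,	/* keep it on the last field as well */
};

/*
 * Without the comma after model_init, the compiler reads
 * "model_init .restart" as a member access on something that is not a
 * structure or union, which is the error quoted above.
 */
static int model_call_restart(void)
{
	assert(model_desc.restart);
	return model_desc.restart();
}
```

Ending every field of a designated initializer with a comma, including the last one, keeps later additions from changing how the preceding line parses.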

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Jiandong Zheng <jdzheng@broadcom.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2012-02-04 11:38:20 -08:00
JD Zheng 864e5e360e ARM: bcmring: remove unused DMA map code
Remove BCMRING DMA map code which is no longer used.

This also fixes a build error with dma.c introduced by
bfcd2ea6a4.

Signed-off-by: Jiandong Zheng <jdzheng@broadcom.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2012-02-04 11:27:13 -08:00
Linus Torvalds 31c150a11c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - add Lenovo Ideapad U455 to 'reset' blacklist
  Input: serio_raw - return proper result when serio_raw_read fails
  Input: document device properties
  Input: twl4030_keypad - fix comment (trivial)
  Input: gpio_keys - fix struct device declared inside parameter list
  Input: evdev - fix variable initialisation
2012-02-04 10:57:42 -08:00
Linus Torvalds 4554c135a0 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  i.MX SDMA: Fix burstsize settings
  ARM: mach-shmobile: both USB DMAC instances on sh7372 are slave-only
  dma: sh_dma: not all SH DMAC implementations support MEMCPY
  at_hdmac: bugfix for enabling channel irq
  dmaengine: fix missing 'cnt' in ?: in dmatest
2012-02-04 10:54:26 -08:00
Linus Torvalds 82bdc843c2 Merge branch 'akpm'
* akpm:
  mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration
  readahead: fix pipeline break caused by block plug
  kprobes: fix a memory leak in function pre_handler_kretprobe()
  drivers/tty/vt/vt_ioctl.c: fix KDFONTOP 32bit compatibility layer
  lkdtm: avoid calling lkdtm_do_action() with spinlock held
  mm/filemap_xip.c: fix race condition in xip_file_fault()
  mm/memcontrol.c: fix warning with CONFIG_NUMA=n
  avr32: select generic atomic64_t support
  mm: postpone migrated page mapping reset
  xtensa: fix memscan()
  MAINTAINERS: update lguest F: patterns
  MAINTAINERS: remove staging sections
  MAINTAINERS: remove iMX5 section
  MAINTAINERS: update partitions block F: patterns
2012-02-04 10:51:54 -08:00
Linus Torvalds 71b1b20b8a - Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1

Merge tag 'for-linus-3.3' of git://git.infradead.org/~dwmw2/mtd-3.3

 - Fix a regression in 16-bit Atmel NAND flash which was introduced in 3.1
 - Fix breakage with MTD suspend caused by the API rework
 - Fix a problem with resetting the MX28 BCH module
 - A couple of other trivial fixes

* tag 'for-linus-3.3-20120204' of git://git.infradead.org/~dwmw2/mtd-3.3:
  Revert "mtd: atmel_nand: optimize read/write buffer functions"
  mtd: fix MTD suspend
  jffs2: do not initialize variable unnecessarily
  mtd: gpmi-nand bugfix: reset the BCH module when it is not MX23
  mtd: nand: fix typo in comment
2012-02-04 07:17:47 -08:00
Artem Bityutskiy 500823195d Revert "mtd: atmel_nand: optimize read/write buffer functions"
This reverts commit fb5427508a.

The reason is that it breaks 16-bit NAND flash, as reported by
Nikolaus Voss and confirmed by Eric Bénard.

Nicolas Ferre <nicolas.ferre@atmel.com> also confirmed:
"After double checking with designers, I must admit that I misunderstood
the way of optimizing accesses to SMC. 16 bit nand is not so common
those days..."

Reported-by: Nikolaus Voss <n.voss@weinmann.de>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org [3.1+]
2012-02-04 08:04:57 +00:00
Linus Torvalds d12566674c Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7314/1: kuser: consistently use usr_ret for returning from helpers
  ARM: 7302/1: Add TLB flushing for both entries in a PMD
  ARM: 7303/1: perf: add empty NODE event definitions for Cortex-A5 and Cortex-A15
  ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers
  ARM: 7307/1: vfp: fix ptrace regset modification race
  ARM: 7306/1: vfp: flush thread hwstate before restoring context from sigframe
  Revert "ARM: 7304/1: ioremap: fix boundary check when reusing static mapping"
2012-02-03 16:57:40 -08:00
Mel Gorman 0bf380bc70 mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration
When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Let's say we have a case
like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages starts, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages, which now extends into the memory hole.  It
checks pfn_valid() on the first PFN but then scans into the hole, where
there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.
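
The idea can be sketched as a user-space model of the scan loop. All constants and `model_*` names here are hypothetical, chosen only to mirror the diagram above; the first PFN is assumed to have been validated by the caller already.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values, not the kernel's real constants. */
#define MODEL_MAX_ORDER_NR_PAGES 16UL
#define MODEL_HOLE_START         32UL
#define MODEL_HOLE_END           48UL	/* exclusive */

/* Stand-in for pfn_valid(): PFNs inside the hole have no struct page. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_HOLE_START || pfn >= MODEL_HOLE_END;
}

/*
 * Walk [low_pfn, end_pfn) and count the PFNs that would be touched.
 * Validity is rechecked whenever the scan enters a new
 * MODEL_MAX_ORDER_NR_PAGES-aligned block; an invalid block is skipped
 * as a whole instead of being dereferenced page by page.
 */
static unsigned long model_scan(unsigned long low_pfn, unsigned long end_pfn)
{
	unsigned long touched = 0;

	assert(low_pfn <= end_pfn);
	while (low_pfn < end_pfn) {
		if ((low_pfn & (MODEL_MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !model_pfn_valid(low_pfn)) {
			low_pfn += MODEL_MAX_ORDER_NR_PAGES;
			continue;
		}
		touched++;
		low_pfn++;
	}
	return touched;
}
```

Starting the model scan just below the hole, as in the diagram, touches only the PFNs on either side of it; none of the PFNs inside the hole are visited.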

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-03 16:16:41 -08:00
Shaohua Li 3deaa7190a readahead: fix pipeline break caused by block plug
Herbert Poetzl reported a performance regression since 2.6.39.  The test
is a simple dd read, but with a big block size.  The reason is:

T1: ra (A, A+128k), (A+128k, A+256k)
T2: lock_page for page A, submit the 256k
T3: hit page A+128k, ra (A+256k, A+384k). The range isn't submitted
because of the plug, and there isn't any lock_page until we hit page A+256k
because all pages from A to A+256k are in memory
T4: hit page A+256k, ra (A+384k, A+512k). Because of the plug, the range isn't
submitted again.
T5: lock_page A+256k, so (A+256k, A+512k) will be submitted. The task is
waiting for (A+256k, A+512k) to finish.

There is no request to disk in T3 and T4, so the readahead pipeline breaks.

We really don't need a block plug in generic_file_aio_read() for buffered
I/O.  The readahead code already has a plug and has fine-grained control over
when I/O should be submitted.  Deleting the plug for buffered I/O fixes the
regression.
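
The effect of the nested plug can be sketched with a simplified user-space model. This is an assumption-laden illustration: the `model_*` names and the depth counter are hypothetical, and the model only captures the observable behaviour that requests queued under an outer plug stay on the plug list until that outer plug is finished or flushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-task plug state. */
struct model_plug_state {
	int depth;	/* number of active nested plugs */
	int pending;	/* requests held back on the plug list */
	int submitted;	/* requests sent to the device queue */
};

static void model_start_plug(struct model_plug_state *p)
{
	p->depth++;
}

/* Only finishing the outermost plug flushes the pending requests. */
static void model_finish_plug(struct model_plug_state *p)
{
	assert(p->depth > 0);
	if (--p->depth == 0) {
		p->submitted += p->pending;
		p->pending = 0;
	}
}

static void model_submit(struct model_plug_state *p)
{
	if (p->depth > 0)
		p->pending++;
	else
		p->submitted++;
}

/* One readahead call, which plugs around its own submission. */
static void model_readahead(struct model_plug_state *p)
{
	model_start_plug(p);
	model_submit(p);
	model_finish_plug(p);
}

/* Requests on the device queue after two readahead calls (T3 and T4). */
static int model_in_flight_after_two_ra(bool outer_plug)
{
	struct model_plug_state p = {0, 0, 0};

	if (outer_plug)
		model_start_plug(&p);	/* the top-level plug being removed */
	model_readahead(&p);
	model_readahead(&p);
	return p.submitted;
}
```

With the outer plug in place, the model has nothing in flight after T3 and T4; without it, both readahead requests reach the device queue as soon as each readahead call finishes.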

One side effect is that the plug makes the request size 256k; the size is
128k without it.  This is only because the default ra size is 128k, and it is
not a reason to keep the plug here.

Vivek said:

: We submit some readahead IO to device request queue but because of nested
: plug, queue never gets unplugged.  When read logic reaches a page which is
: not in page cache, it waits for page to be read from the disk
: (lock_page_killable()) and that time we flush the plug list.
:
: So effectively read ahead logic is kind of broken in parts because of
: nested plugging.  Removing top level plug (generic_file_aio_read()) for
: buffered reads, will allow unplugging queue earlier for readahead.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-03 16:16:41 -08:00