linux_old1/drivers
Douglas Anderson 9ea61cac0b dm bufio: avoid sleeping while holding the dm_bufio lock
We've seen in-field reports showing _lots_ (18 in one case, 41 in
another) of tasks all sitting there blocked on:

  mutex_lock+0x4c/0x68
  dm_bufio_shrink_count+0x38/0x78
  shrink_slab.part.54.constprop.65+0x100/0x464
  shrink_zone+0xa8/0x198

In the two cases analyzed, we see one task that looks like this:

  Workqueue: kverityd verity_prefetch_io

  __switch_to+0x9c/0xa8
  __schedule+0x440/0x6d8
  schedule+0x94/0xb4
  schedule_timeout+0x204/0x27c
  schedule_timeout_uninterruptible+0x44/0x50
  wait_iff_congested+0x9c/0x1f0
  shrink_inactive_list+0x3a0/0x4cc
  shrink_lruvec+0x418/0x5cc
  shrink_zone+0x88/0x198
  try_to_free_pages+0x51c/0x588
  __alloc_pages_nodemask+0x648/0xa88
  __get_free_pages+0x34/0x7c
  alloc_buffer+0xa4/0x144
  __bufio_new+0x84/0x278
  dm_bufio_prefetch+0x9c/0x154
  verity_prefetch_io+0xe8/0x10c
  process_one_work+0x240/0x424
  worker_thread+0x2fc/0x424
  kthread+0x10c/0x114

...and that looks to be the one holding the mutex.

The problem has been reproduced on fairly easily:
0. Be running Chrome OS w/ verity enabled on the root filesystem
1. Pick test patch: http://crosreview.com/412360
2. Install launchBalloons.sh and balloon.arm from
     http://crbug.com/468342
   ...that's just a memory stress test app.
3. On a 4GB rk3399 machine, run
     nice ./launchBalloons.sh 4 900 100000
   ...that tries to eat 4 * 900 MB of memory and keep accessing.
4. Login to the Chrome web browser and restore many tabs

With that, I've seen printouts like:
  DOUG: long bufio 90758 ms
...and stack trace always show's we're in dm_bufio_prefetch().

The problem is that we try to allocate memory with GFP_NOIO while
we're holding the dm_bufio lock.  Instead we should be using
GFP_NOWAIT.  Using GFP_NOIO can cause us to sleep while holding the
lock and that causes the above problems.

The current behavior explained by David Rientjes:

  It will still try reclaim initially because __GFP_WAIT (or
  __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO.  This is the cause of
  contention on dm_bufio_lock() that the thread holds.  You want to
  pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a
  mutex that can be contended by a concurrent slab shrinker (if
  count_objects didn't use a trylock, this pattern would trivially
  deadlock).

This change significantly increases responsiveness of the system while
in this state.  It makes a real difference because it unblocks kswapd.
In the bug report analyzed, kswapd was hung:

   kswapd0         D ffffffc000204fd8     0    72      2 0x00000000
   Call trace:
   [<ffffffc000204fd8>] __switch_to+0x9c/0xa8
   [<ffffffc00090b794>] __schedule+0x440/0x6d8
   [<ffffffc00090bac0>] schedule+0x94/0xb4
   [<ffffffc00090be44>] schedule_preempt_disabled+0x28/0x44
   [<ffffffc00090d900>] __mutex_lock_slowpath+0x120/0x1ac
   [<ffffffc00090d9d8>] mutex_lock+0x4c/0x68
   [<ffffffc000708e7c>] dm_bufio_shrink_count+0x38/0x78
   [<ffffffc00030b268>] shrink_slab.part.54.constprop.65+0x100/0x464
   [<ffffffc00030dbd8>] shrink_zone+0xa8/0x198
   [<ffffffc00030e578>] balance_pgdat+0x328/0x508
   [<ffffffc00030eb7c>] kswapd+0x424/0x51c
   [<ffffffc00023f06c>] kthread+0x10c/0x114
   [<ffffffc000203dd0>] ret_from_fork+0x10/0x40

By unblocking kswapd memory pressure should be reduced.

Suggested-by: David Rientjes <rientjes@google.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:04 -05:00
..
accessibility
acpi Merge branch 'device-properties' 2016-11-11 23:23:02 +01:00
amba
android ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct 2016-10-24 19:37:48 +02:00
ata ahci: fix the single MSI-X case in ahci_init_one 2016-10-25 11:43:07 -04:00
atm
auxdisplay auxdisplay: img-ascii-lcd: driver for simple ASCII LCD displays 2016-10-06 17:03:41 +02:00
base driver core fixes for 4.9-rc5 2016-11-13 10:22:07 -08:00
bcma
block aoe: fix crash in page count manipulation 2016-11-12 08:27:07 -07:00
bluetooth Bluetooth: btwilink: Fix probe return value 2016-10-20 10:14:49 +02:00
bus bus: qcom-ebi2: depend on ARCH_QCOM or COMPILE_TEST 2016-10-17 13:46:09 -07:00
cdrom
char char/misc fixes for 4.9-rc5 2016-11-13 10:24:08 -08:00
clk clk: mmp: pxa910: fix return value check in pxa910_clk_init() 2016-11-01 17:41:20 -07:00
clocksource Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init" 2016-10-20 21:58:58 +02:00
connector
cpufreq Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes' 2016-10-29 01:29:17 +02:00
cpuidle Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-10-15 09:26:12 -07:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-10-10 14:04:16 -07:00
dax device-dax: fix percpu_ref_exit ordering 2016-10-27 17:04:05 -07:00
dca
devfreq PM / devfreq: Skip status update on uninitialized previous_freq 2016-10-11 00:01:20 +02:00
dio
dma dmaengine updates for 4.8-rc1 2016-10-06 17:13:54 -07:00
dma-buf Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux 2016-10-11 18:12:22 -07:00
edac * Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO 2016-10-04 12:06:26 -07:00
eisa
extcon extcon: qcom-spmi-misc: Sync the extcon state on interrupt 2016-10-26 16:04:29 +09:00
firewire firewire: net: fix fragmented datagram_size off-by-one 2016-11-03 14:46:39 +01:00
firmware efi/arm: Fix absolute relocation detection for older toolchains 2016-10-19 14:49:44 +02:00
fmc
fpga
gpio gpio/mvebu: Use irq_domain_add_linear 2016-11-01 19:31:49 +01:00
gpu imx-drm: fix possible hangup when disabling crtcs 2016-11-11 09:09:57 +10:00
hid HID: sensor: fix attributes in HID sensor interface 2016-11-05 16:56:09 +01:00
hsi
hv vmbus: make sysfs names consistent with PCI 2016-11-01 09:07:13 -06:00
hwmon hwmon: (core) fix resource leak on devm_kcalloc failure 2016-10-24 06:05:13 -07:00
hwspinlock
hwtracing
i2c i2c: core: fix NULL pointer dereference under race condition 2016-11-04 20:36:58 +01:00
ide
idle nmi_backtrace: generate one-line reports for idle cpus 2016-10-07 18:46:30 -07:00
iio iio: maxim_thermocouple: detect invalid storage size in read() 2016-11-13 10:08:32 +01:00
infiniband infiniband: shut up a maybe-uninitialized warning 2016-11-11 08:45:08 -08:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2016-11-05 11:26:11 -07:00
iommu iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path 2016-11-08 15:08:26 +01:00
ipack ipack: print a hex number after a 0x prefix 2016-10-27 18:43:43 -07:00
irqchip GIC updates for Linux 4.9-rc2 2016-10-21 21:40:29 +02:00
isdn
leds
lguest
lightnvm Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block 2016-10-07 14:42:05 -07:00
macintosh
mailbox Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration 2016-10-06 17:36:53 -07:00
mcb mcb: Add a dma_device to mcb_device 2016-09-27 12:33:47 +02:00
md dm bufio: avoid sleeping while holding the dm_bufio lock 2016-12-08 14:13:04 -05:00
media gp8psk: Fix DVB frontend attach 2016-11-13 10:02:22 -08:00
memory ARM: SoC driver updates for v4.9 2016-10-07 21:23:40 -07:00
memstick memstick: rtsx_usb_ms: Manage runtime PM when accessing the device 2016-10-17 15:43:05 +02:00
message
mfd - Core Frameworks 2016-10-07 08:35:35 -07:00
misc mei: bus: fix received data size check in NFC fixup 2016-10-31 10:25:22 -06:00
mmc mmc: mxs: Initialize the spinlock prior to using it 2016-11-07 13:30:08 +01:00
mtd MTD updates for 4.9-rc4: 2016-11-05 10:52:29 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-29 20:33:20 -07:00
nfc mei: bus: fix received data size check in NFC fixup 2016-10-31 10:25:22 -06:00
ntb
nubus
nvdimm nvdimm: make CONFIG_NVDIMM_DAX 'bool' 2016-10-27 16:16:21 -07:00
nvme lightnvm: invalid offset calculation for lba_shift 2016-11-11 18:27:32 -07:00
nvmem ARM: SoC driver updates for v4.9 2016-10-07 21:23:40 -07:00
of Revert "console: don't prefer first registered if DT specifies stdout-path" 2016-11-11 08:12:37 -08:00
oprofile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
parisc
parport
pci pci-v4.9-fixes-3 2016-11-11 16:38:26 -08:00
pcmcia pcmcia: fix return value of soc_pcmcia_regulator_set 2016-11-11 08:45:08 -08:00
perf perf: xgene: Remove bogus IS_ERR() check 2016-10-17 15:50:07 +01:00
phy phy: sun4i: check PMU presence when poking unknown bit of pmu 2016-11-05 13:45:02 +05:30
pinctrl pinctrl-aspeed-g5: Never set SCU90[6] 2016-11-07 10:31:33 +01:00
platform ACPI fix for v4.9-rc5 2016-11-11 17:02:01 -08:00
pnp
power power supply and reset changes for the v4.9 series 2016-10-06 18:21:15 -07:00
powercap
pps pps: kc: fix non-tickless system config dependency 2016-10-11 15:06:32 -07:00
ps3
ptp drivers/ptp: Fix kernel memory disclosure 2016-10-13 10:20:06 -04:00
pwm
rapidio mm: replace get_user_pages() write/force parameters with gup_flags 2016-10-19 08:11:43 -07:00
ras
regulator regulator: core: silence warning: "VDD1: ramp_delay not set" 2016-10-28 18:22:40 +01:00
remoteproc rpmsg updates for v4.9 2016-10-06 17:03:49 -07:00
reset reset: uniphier: rename MIO reset to SD reset for Pro5, PXs2, LD20 SoCs 2016-10-22 18:31:42 +09:00
rpmsg
rtc RTC for 4.9 2016-10-14 13:13:44 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2016-10-27 14:16:30 -07:00
sbus
scsi SCSI fixes on 20161111 2016-11-13 10:07:08 -08:00
sfi
sh
sn
soc powerpc updates for 4.9 #2 2016-10-14 11:07:42 -07:00
spi Merge remote-tracking branches 'spi/fix/dt', 'spi/fix/fsl-dspi' and 'spi/fix/fsl-espi' into spi-linus 2016-10-29 12:51:55 -06:00
spmi spmi: pmic-arb: Return an error code if sanity check fails 2016-09-27 12:43:34 +02:00
ssb
staging Staging/IIO fixes for 4.9-rc5 2016-11-13 10:13:33 -08:00
target target/tcm_fc: use CPU affinity for responses 2016-10-21 01:19:44 -07:00
tc
thermal thermal/powerclamp: correct cpu support check 2016-10-20 14:15:44 +08:00
thunderbolt
tty tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup 2016-10-28 08:13:07 -04:00
uio
usb USB: cdc-acm: fix TIOCMIWAIT 2016-11-10 13:12:59 +01:00
uwb uwb: fix device reference leaks 2016-11-01 09:04:04 -06:00
vfio vfio/pci: Fix integer overflows, bitmask check 2016-10-26 13:49:29 -06:00
vhost
video Merge branch 'gup_flag-cleanups' 2016-10-19 08:39:47 -07:00
virt mm: replace get_user_pages() write/force parameters with gup_flags 2016-10-19 08:11:43 -07:00
virtio virtio_ring: mark vring_dma_dev inline 2016-10-31 00:40:08 +02:00
vlynq
vme vme: vme_get_size potentially returning incorrect value on failure 2016-10-28 08:25:18 -04:00
w1
watchdog Merge branches 'acpi-wdat' and 'acpi-cppc' 2016-10-21 22:24:23 +02:00
xen xen: fixes for 4.9-rc2 2016-10-24 19:52:24 -07:00
zorro
Kconfig
Makefile A small bug fix and a new driver for acting as an IPMI device. 2016-10-23 15:56:23 -07:00