Commit Graph

71345 Commits

Author SHA1 Message Date
Michael S. Tsirkin 5eea84f478 if_tun: add TUNSETVNETLE/TUNGETVNETLE
ifreq flags field is only 16 bit wide, so setting IFF_VNET_LE there has
no effect:
doesn't fit in two bytes.

The tests passed apparently because they have an even number of bugs,
all cancelling out.

Luckily we didn't release a kernel with this flag, so it's
not too late to fix this.

Add TUNSETVNETLE/TUNGETVNETLE to really achieve the purpose
of IFF_VNET_LE.

This has an added benefit that if we ever want a BE flag,
we won't have to deal with weird configurations like
setting both LE and BE at the same time.

IFF_VNET_LE will be dropped in a follow-up patch.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 11:19:41 -05:00
Linus Torvalds 67e2c38838 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "In terms of changes, there's general maintenance to the Smack,
  SELinux, and integrity code.

  The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
  which allows IMA appraisal to require signatures.  Support for reading
  keys from rootfs before init is call is also added"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: Remove security_ops extern
  security: smack: fix out-of-bounds access in smk_parse_smack()
  VFS: refactor vfs_read()
  ima: require signature based appraisal
  integrity: provide a hook to load keys when rootfs is ready
  ima: load x509 certificate from the kernel
  integrity: provide a function to load x509 certificate from the kernel
  integrity: define a new function integrity_read_file()
  Security: smack: replace kzalloc with kmem_cache for inode_smack
  Smack: Lock mode for the floor and hat labels
  ima: added support for new kernel cmdline parameter ima_template_fmt
  ima: allocate field pointers array on demand in template_desc_init_fields()
  ima: don't allocate a copy of template_fmt in template_desc_init_fields()
  ima: display template format in meas. list if template name length is zero
  ima: added error messages to template-related functions
  ima: use atomic bit operations to protect policy update interface
  ima: ignore empty and with whitespaces policy lines
  ima: no need to allocate entry for comment
  ima: report policy load status
  ima: use path names cache
  ...
2014-12-14 20:36:37 -08:00
Linus Torvalds 6ae840e7cc Char/Misc driver patches for 3.19-rc1
Here's the big char/misc driver update for 3.19-rc1
 
 Lots of little things all over the place in different drivers, and a new
 subsystem, "coresight" has been added.  Full details are in the
 shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSODosACgkQMUfUDdst+ykSNwCfcqx1Z3rQzbLwSrR2sa1fV3Zb
 yEAAniJoLZ4ZkoQK4/1ozsFc31q+gXNm
 =/epr
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here's the big char/misc driver update for 3.19-rc1

  Lots of little things all over the place in different drivers, and a
  new subsystem, "coresight" has been added.  Full details are in the
  shortlog"

* tag 'char-misc-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (73 commits)
  parport: parport_pc, do not remove parent devices early
  spmi: Remove shutdown/suspend/resume kernel-doc
  carma-fpga-program: drop videobuf dependency
  carma-fpga: drop videobuf dependency
  carma-fpga-program.c: fix compile errors
  i8k: Fix temperature bug handling in i8k_get_temp()
  cxl: Name interrupts in /proc/interrupt
  CXL: Return error to PSL if IRQ demultiplexing fails & print clearer warning
  coresight-replicator: remove .owner field for driver
  coresight: fixed comments in coresight.h
  coresight: fix typo in comment in coresight-priv.h
  coresight: bindings for coresight drivers
  coresight: Adding ABI documentation
  w1: support auto-load of w1_bq27000 module.
  w1: avoid potential u16 overflow
  cn: verify msg->len before making callback
  mei: export fw status registers through sysfs
  mei: read and print all six FW status registers
  mei: txe: add cherrytrail device id
  mei: kill cached host and me csr values
  ...
2014-12-14 16:43:47 -08:00
Linus Torvalds e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds 37da7bbbe8 TTY/Serial driver patches for 3.19-rc1
Here's the big tty/serial driver update for 3.19-rc1.
 
 There are a number of TTY core changes/fixes in here from Peter Hurley
 that have all been teted in linux-next for a long time now.  There are
 also the normal serial driver updates as well, full details in the
 changelog below.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD/MACgkQMUfUDdst+ymW+wCfbSzoYMRObIImMPWfoQtxkvvN
 rpkAnAtyEP/zZIfkQIuKTSH6FJxocF8V
 =WZt3
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here's the big tty/serial driver update for 3.19-rc1.

  There are a number of TTY core changes/fixes in here from Peter Hurley
  that have all been teted in linux-next for a long time now.  There are
  also the normal serial driver updates as well, full details in the
  changelog below"

* tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits)
  serial: pxa: hold port.lock when reporting modem line changes
  tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put"
  tty: Deletion of unnecessary checks before two function calls
  n_tty: Fix read_buf race condition, increment read_head after pushing data
  serial: of-serial: add PM suspend/resume support
  Revert "serial: of-serial: add PM suspend/resume support"
  Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type"
  serial: 8250: don't attempt a trylock if in sysrq
  serial: core: Add big-endian iotype
  serial: samsung: use port->fifosize instead of hardcoded values
  serial: samsung: prefer to use fifosize from driver data
  serial: samsung: fix style problems
  serial: samsung: wait for transfer completion before clock disable
  serial: icom: fix error return code
  serial: tegra: clean up tty-flag assignments
  serial: Fix io address assign flow with Fintek PCI-to-UART Product
  serial: mxs-auart: fix tx_empty against shift register
  serial: mxs-auart: fix gpio change detection on interrupt
  serial: mxs-auart: Fix mxs_auart_set_ldisc()
  serial: 8250_dw: Use 64-bit access for OCTEON.
  ...
2014-12-14 15:23:32 -08:00
Linus Torvalds e7cf773d43 USB patches for 3.19-rc1
Here's the big set of USB and PHY patches for 3.19-rc1.
 
 The normal churn in the USB gadget area is in here, as well as xhci and
 other individual USB driver updates.  The PHY tree is also in here, as
 there were dependancies on the USB tree.
 
 All of these have been in linux-next.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOEHcACgkQMUfUDdst+ykziQCgsm1D/af2nac6CTF2pov8VMIY
 ywgAnRi8LtZ2WassrwTNxY86Avaqryis
 =UVp8
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB updates from Greg KH:
 "Here's the big set of USB and PHY patches for 3.19-rc1.

  The normal churn in the USB gadget area is in here, as well as xhci
  and other individual USB driver updates.  The PHY tree is also in
  here, as there were dependancies on the USB tree.

  All of these have been in linux-next"

* tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (351 commits)
  arm: omap3: twl: remove usb phy init data
  usbip: fix error handling in stub_probe()
  usb: gadget: udc: missing curly braces
  USB: mos7720: delete some unneeded code
  wusb: replace memset by memzero_explicit
  usbip: remove unneeded structure
  usb: xhci: fix comment for PORT_DEV_REMOVE
  xhci: don't use the same variable for stopped and halted rings current TD
  xhci: clear extra bits from slot context when setting max exit latency
  xhci: cleanup finish_td function
  USB: adutux: NULL dereferences on disconnect
  usb: chipidea: fix platform_no_drv_owner.cocci warnings
  usb: chipidea: Fixed a few typos in comments
  Documentation: bindings: add doc for the USB2 ChipIdea USB driver
  usb: chipidea: add a usb2 driver for ci13xxx
  usb: chipidea: fix phy handling
  usb: chipidea: remove duplicate dev_set_drvdata for host_start
  usb: chipidea: parameter 'mode' isn't needed for hw_device_reset
  usb: chipidea: add controller reset API
  usb: chipidea: remove flag CI_HDRC_REQUIRE_TRANSCEIVER
  ...
2014-12-14 14:57:16 -08:00
Linus Torvalds 980f3c344f This is the bulk of GPIO changes for the v3.19 series:
- A new API that allows setting more than one GPIO at the
   time. This is implemented for the new descriptor-based
   API only and makes it possible to e.g. toggle a clock and
   data line at the same time, if the hardware can do this
   with a single register write. Both consumers and drivers
   need new calls, and the core will fall back to driving
   individual lines where needed. Implemented for the MPC8xxx
   driver initially.
 - Patched the mdio-mux-gpio and the serial mctrl driver
   that drives modems to use the new multiple-setting API
   to set several signals simultaneously.
 - Get rid of the global GPIO descriptor array, and instead
   allocate descriptors dynamically for each GPIO on a certain
   GPIO chip. This moves us closer to getting rid of the
   limitation of using the global, static GPIO numberspace.
 - New driver and device tree bindings for 74xx ICs.
 - New driver and device tree bindings for the VF610 Vybrid.
 - Support the RCAR r8a7793 and r8a7794.
 - Guidelines for GPIO device tree bindings trying to get
   things a bit more strict with the advent of combined
   device properties.
 - Suspend/resume support for the MVEBU driver.
 - A slew of minor fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUjgQ7AAoJEEEQszewGV1zuJ8P+wamlDNhJbsgqXPcSCZZFgeP
 1O22VRYqoo/i8mAzNCRi2h6NogO9Da6rCRhHdH35TsuNzIbusHE+btMukj248qJ7
 WYOf25I0ImyUP8kulogW4/+7lYibRLHnN2BSLuAkApofmxDvODPS1KNWHulcOcxl
 VaVsA8wvFzQO1s1Wjv94ctVfs5rqk7mBfPwk61zHuLeETecmKg0e52p0Uzqlq6gi
 UKi9uK3sjQ7kI/+xa+qDrF9GRwRR22oJfD/9zNj8g94iU9iMs5Oh+Zp3RJcvYUSD
 y5BIb+IY2ATy20ZkijWmeP8LJz6pja+C9Ne7lKM0jkv7geGeHGAoavz0n3oUq4oz
 IvUNz6hCAP9PcxWc5a9FFqqORLWrRew6GmZmJvIkmC9K+3UQcWhkzO3vLpfl6Q9h
 S728XexkIlhxG9NcER21bFXV2dw3z/X9dm5mQ473TqJm+wQmRuYcPRg053NbqMcx
 juvkweCksx8qlpnjo/1QXQcVwFM8kuR7xAlVo7zdMDOU5F8pdxRnsTl0cUdx5cPv
 DKeMRg8+FYcHmIoe/EodemIh7cAZtEpijZNNAr9cDmAjifeBjWhCb+zri5SIc96x
 0jKVTXyY4jnHXBVoA0FIl1d2t54yVjh3PYiu0MjeLJ9tyB+Px/nOxW8FrdlFnPJ/
 oP5WK13c8h3bMkxUzsvL
 =ZAhA
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull take two of the GPIO updates:
 "Same stuff as last time, now with a fixup patch for the previous
  compile error plus I ran a few extra rounds of compile-testing.

  This is the bulk of GPIO changes for the v3.19 series:

   - A new API that allows setting more than one GPIO at the time.  This
     is implemented for the new descriptor-based API only and makes it
     possible to e.g. toggle a clock and data line at the same time, if
     the hardware can do this with a single register write.  Both
     consumers and drivers need new calls, and the core will fall back
     to driving individual lines where needed.  Implemented for the
     MPC8xxx driver initially

   - Patched the mdio-mux-gpio and the serial mctrl driver that drives
     modems to use the new multiple-setting API to set several signals
     simultaneously

   - Get rid of the global GPIO descriptor array, and instead allocate
     descriptors dynamically for each GPIO on a certain GPIO chip.  This
     moves us closer to getting rid of the limitation of using the
     global, static GPIO numberspace

   - New driver and device tree bindings for 74xx ICs

   - New driver and device tree bindings for the VF610 Vybrid

   - Support the RCAR r8a7793 and r8a7794

   - Guidelines for GPIO device tree bindings trying to get things a bit
     more strict with the advent of combined device properties

   - Suspend/resume support for the MVEBU driver

   - A slew of minor fixes and improvements"

* tag 'gpio-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (33 commits)
  gpio: mcp23s08: fix up compilation error
  gpio: pl061: document gpio-ranges property for bindings file
  gpio: pl061: hook request if gpio-ranges avaiable
  gpio: mcp23s08: Add option to configure IRQ output polarity as active high
  gpio: fix deferred probe detection for legacy API
  serial: mctrl_gpio: use gpiod_set_array function
  mdio-mux-gpio: Use GPIO descriptor interface and new gpiod_set_array function
  gpio: remove const modifier from gpiod_get_direction()
  gpio: remove gpio_descs global array
  gpio: mxs: implement get_direction callback
  gpio: em: Use dynamic allocation of GPIOs
  gpio: Check if base is positive before calling gpio_is_valid()
  gpio: mcp23s08: Add simple IRQ support for SPI devices
  gpio: mcp23s08: request a shared interrupt
  gpio: mcp23s08: Do not free unrequested interrupt
  gpio: rcar: Add r8a7793 and r8a7794 support
  gpio-mpc8xxx: add mpc8xxx_gpio_set_multiple function
  gpiolib: allow simultaneous setting of multiple GPIO outputs
  gpio: mvebu: add suspend/resume support
  gpio: gpio-davinci: remove duplicate check on resource
  ..
2014-12-14 14:05:05 -08:00
Linus Torvalds 7d22286ff7 Merge git://git.kvack.org/~bcrl/aio-next
Pull aio updates from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: Skip timer for io_getevents if timeout=0
  aio: Make it possible to remap aio ring
2014-12-14 13:36:57 -08:00
Linus Torvalds 96895199c8 Merge branch 'i2c/for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "For 3.19, the I2C subsystem has to offer special candy this time.
  Right in time for Christmas :)

   - I2C slave framework: finally, a generic mechanism for Linux being
     an I2C slave (if the bus driver supports that).  Docs are still
     missing but will come later this cycle, the code is good enough to
     go.
   - I2C muxes represent their topology in sysfs much more detailed.
     This will help users to navigate around much easier.
   - irq population of i2c clients is now done at probe time, not device
     creation time, to have better support for deferred probing.
   - new drivers for Imagination SCB, Amlogic Meson
   - DMA support added for Freescale IMX, Renesas SHMobile
   - slightly bigger driver updates to OMAP, i801, AT91, and rk3x
     (mostly quirk handling, timing updates, and using better kernel
     interfaces)
   - eeprom driver can now write with byte-access (very slow, but OK to
     have)
   - and the bunch of smaller fixes, cleanups, ID updates..."

* 'i2c/for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (56 commits)
  i2c: sh_mobile: remove unneeded DMA mask
  i2c: rcar: add slave support
  i2c: slave-eeprom: add eeprom simulator driver
  i2c: core changes for slave support
  MAINTAINERS: add I2C dt bindings also to I2C realm
  i2c: designware: Fix falling time bindings doc
  i2c: davinci: switch to use platform_get_irq
  Documentation: i2c: Use PM ops instead of legacy suspend/resume
  i2c: sh_mobile: optimize irq entry
  i2c: pxa: add support for SCCB devices
  omap: i2c: don't check bus state IP rev3.3 and earlier
  i2c: s3c2410: Handle i2c sys_cfg register in i2c driver
  i2c: rk3x: add Kconfig dependency on COMMON_CLK
  i2c: omap: add notes related to i2c multimaster mode
  i2c: omap: don't reset controller if Arbitration Lost detected
  i2c: omap: implement workaround for handling invalid BB-bit values
  i2c: omap: cleanup register definitions
  i2c: rk3x: handle dynamic clock rate changes correctly
  i2c: at91: enable probe deferring on dma channel request
  i2c: at91: remove legacy DMA support
  ...
2014-12-14 12:54:40 -08:00
Pavel Emelyanov e4a0d3e720 aio: Make it possible to remap aio ring
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.

So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).

To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.

So, to make restore work we need to make sure that

a) ring is mapped at desired virtual address
b) kioctx->user_id matches this value

Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.

Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.

What do you think?

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-12-13 17:49:50 -05:00
Linus Torvalds 9ea18f8cab Merge branch 'for-3.19/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver updates from Jens Axboe:

 - NVMe updates:
        - The blk-mq conversion from Matias (and others)

        - A stack of NVMe bug fixes from the nvme tree, mostly from Keith.

        - Various bug fixes from me, fixing issues in both the blk-mq
          conversion and generic bugs.

        - Abort and CPU online fix from Sam.

        - Hot add/remove fix from Indraneel.

 - A couple of drbd fixes from the drbd team (Andreas, Lars, Philipp)

 - With the generic IO stat accounting from 3.19/core, converting md,
   bcache, and rsxx to use those.  From Gu Zheng.

 - Boundary check for queue/irq mode for null_blk from Matias.  Fixes
   cases where invalid values could be given, causing the device to hang.

 - The xen blkfront pull request, with two bug fixes from Vitaly.

* 'for-3.19/drivers' of git://git.kernel.dk/linux-block: (56 commits)
  NVMe: fix race condition in nvme_submit_sync_cmd()
  NVMe: fix retry/error logic in nvme_queue_rq()
  NVMe: Fix FS mount issue (hot-remove followed by hot-add)
  NVMe: fix error return checking from blk_mq_alloc_request()
  NVMe: fix freeing of wrong request in abort path
  xen/blkfront: remove redundant flush_op
  xen/blkfront: improve protection against issuing unsupported REQ_FUA
  NVMe: Fix command setup on IO retry
  null_blk: boundary check queue_mode and irqmode
  block/rsxx: use generic io stats accounting functions to simplify io stat accounting
  md: use generic io stats accounting functions to simplify io stat accounting
  drbd: use generic io stats accounting functions to simplify io stat accounting
  md/bcache: use generic io stats accounting functions to simplify io stat accounting
  NVMe: Update module version major number
  NVMe: fail pci initialization if the device doesn't have any BARs
  NVMe: add ->exit_hctx() hook
  NVMe: make setup work for devices that don't do INTx
  NVMe: enable IO stats by default
  NVMe: nvme_submit_async_admin_req() must use atomic rq allocation
  NVMe: replace blk_put_request() with blk_mq_free_request()
  ...
2014-12-13 14:22:26 -08:00
Linus Torvalds caf292ae5b Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block
Pull block driver core update from Jens Axboe:
 "This is the pull request for the core block IO changes for 3.19.  Not
  a huge round this time, mostly lots of little good fixes:

   - Fix a bug in sysfs blktrace interface causing a NULL pointer
     dereference, when enabled/disabled through that API.  From Arianna
     Avanzini.

   - Various updates/fixes/improvements for blk-mq:

        - A set of updates from Bart, mostly fixing buts in the tag
          handling.

        - Cleanup/code consolidation from Christoph.

        - Extend queue_rq API to be able to handle batching issues of IO
          requests. NVMe will utilize this shortly. From me.

        - A few tag and request handling updates from me.

        - Cleanup of the preempt handling for running queues from Paolo.

        - Prevent running of unmapped hardware queues from Ming Lei.

        - Move the kdump memory limiting check to be in the correct
          location, from Shaohua.

        - Initialize all software queues at init time from Takashi. This
          prevents a kobject warning when CPUs are brought online that
          weren't online when a queue was registered.

   - Single writeback fix for I_DIRTY clearing from Tejun.  Queued with
     the core IO changes, since it's just a single fix.

   - Version X of the __bio_add_page() segment addition retry from
     Maurizio.  Hope the Xth time is the charm.

   - Documentation fixup for IO scheduler merging from Jan.

   - Introduce (and use) generic IO stat accounting helpers for non-rq
     drivers, from Gu Zheng.

   - Kill off artificial limiting of max sectors in a request from
     Christoph"

* 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
  bio: modify __bio_add_page() to accept pages that don't start a new segment
  blk-mq: Fix uninitialized kobject at CPU hotplugging
  blktrace: don't let the sysfs interface remove trace from running list
  blk-mq: Use all available hardware queues
  blk-mq: Micro-optimize bt_get()
  blk-mq: Fix a race between bt_clear_tag() and bt_get()
  blk-mq: Avoid that __bt_get_word() wraps multiple times
  blk-mq: Fix a use-after-free
  blk-mq: prevent unmapped hw queue from being scheduled
  blk-mq: re-check for available tags after running the hardware queue
  blk-mq: fix hang in bt_get()
  blk-mq: move the kdump check to blk_mq_alloc_tag_set
  blk-mq: cleanup tag free handling
  blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map
  blk: introduce generic io stat accounting help function
  blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
  genhd: check for int overflow in disk_expand_part_tbl()
  blk-mq: add blk_mq_free_hctx_request()
  blk-mq: export blk_mq_free_request()
  blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
  ...
2014-12-13 14:14:23 -08:00
Linus Torvalds a99abce2d9 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Two small patches from the audit next branch; only one of which has
  any real significant code changes, the other is simply a MAINTAINERS
  update for audit.

  The single code patch is pretty small and rather straightforward, it
  changes the audit "version" number reported to userspace from an
  integer to a bitmap which is used to indicate the functionality of the
  running kernel.  This really doesn't have much impact on the kernel,
  but it will make life easier for the audit userspace folks.

  Thankfully we were still on a version number which allowed us to do
  this without breaking userspace"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: convert status version to a feature bitmap
  audit: add Paul Moore to the MAINTAINERS entry
2014-12-13 13:41:28 -08:00
Linus Torvalds e3aa91a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - The crypto API is now documented :)
 - Disallow arbitrary module loading through crypto API.
 - Allow get request with empty driver name through crypto_user.
 - Allow speed testing of arbitrary hash functions.
 - Add caam support for ctr(aes), gcm(aes) and their derivatives.
 - nx now supports concurrent hashing properly.
 - Add sahara support for SHA1/256.
 - Add ARM64 version of CRC32.
 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: tcrypt - Allow speed testing of arbitrary hash functions
  crypto: af_alg - add user space interface for AEAD
  crypto: qat - fix problem with coalescing enable logic
  crypto: sahara - add support for SHA1/256
  crypto: sahara - replace tasklets with kthread
  crypto: sahara - add support for i.MX53
  crypto: sahara - fix spinlock initialization
  crypto: arm - replace memset by memzero_explicit
  crypto: powerpc - replace memset by memzero_explicit
  crypto: sha - replace memset by memzero_explicit
  crypto: sparc - replace memset by memzero_explicit
  crypto: algif_skcipher - initialize upon init request
  crypto: algif_skcipher - removed unneeded code
  crypto: algif_skcipher - Fixed blocking recvmsg
  crypto: drbg - use memzero_explicit() for clearing sensitive data
  crypto: drbg - use MODULE_ALIAS_CRYPTO
  crypto: include crypto- module prefix in template
  crypto: user - add MODULE_ALIAS
  crypto: sha-mb - remove a bogus NULL check
  crytpo: qat - Fix 64 bytes requests
  ...
2014-12-13 13:33:26 -08:00
Linus Torvalds 78a45c6f06 Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
2014-12-13 13:00:36 -08:00
Christoph Lameter 6c51ec4d18 percpu: remove __get_cpu_var and __raw_get_cpu_var macros
No user is left in the kernel source tree.  Therefore we can drop the
definitions.

This is the final merge of the transition away from __get_cpu_var.  After
this patch the kernel will not build if anyone uses __get_cpu_var.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Jan Kara 37d469e767 fsnotify: remove destroy_list from fsnotify_mark
destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed.  However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark->g_list for queueing into destroy_list.  This saves
two pointers for each fsnotify_mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Jan Kara 0809ab69a2 fsnotify: unify inode and mount marks handling
There's a lot of common code in inode and mount marks handling.  Factor it
out to a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Manfred Spraul 0050ee059f ipc/msg: increase MSGMNI, remove scaling
SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Manfred Spraul e843e7d2c8 ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
a)

SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: Increase the sysv sem limits so that all known applications
will work with these defaults.

b)

With regards to the maximum supported:
Some of the specified hard limits are not correct anymore, therefore the
patch updates the documentation.

- SEMMNI must stay below IPCMNI, which is 32768.
  As for SHMMAX: Stay a bit below this limit.

- SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array
  was limited to 16 kB (order=2)

  This doesn't apply anymore:
   - the allocation size isn't sizeof(short)*nsems anymore.
   - ipc_alloc falls back to vmalloc

- SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an
  order=1 allocation.
  Therefore: Leave it at 500 (order=0 allocation).

Note:
If an administrator must limit the memory allocations, then he can set the
values as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Dmitry Monakhov 6adc4a22f2 fault-inject: add ratelimit option
Current debug levels are not optimal.  Especially if one want to provoke
big numbers of faults(broken device simulator) then any verbose level will
produce giant numbers of identical logging messages.  Let's add ratelimit
parameter for that purpose.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Dmitry Monakhov 89e3f90995 ratelimit: add initialization macro
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
David Drysdale 51f39a1f0c syscalls: implement execveat() system call
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
Vladimir Davydov 8135be5a80 memcg: fix possible use-after-free in memcg_kmem_get_cache()
Suppose task @t that belongs to a memory cgroup @memcg is going to
allocate an object from a kmem cache @c.  The copy of @c corresponding to
@memcg, @mc, is empty.  Then if kmem_cache_alloc races with the memory
cgroup destruction we can access the memory cgroup's copy of the cache
after it was destroyed:

CPU0				CPU1
----				----
[ current=@t
  @mc->memcg_params->nr_pages=0 ]

kmem_cache_alloc(@c):
  call memcg_kmem_get_cache(@c);
  proceed to allocation from @mc:
    alloc a page for @mc:
      ...

				move @t from @memcg
				destroy @memcg:
				  mem_cgroup_css_offline(@memcg):
				    memcg_unregister_all_caches(@memcg):
				      kmem_cache_destroy(@mc)

    add page to @mc

We could fix this issue by taking a reference to a per-memcg cache, but
that would require adding a per-cpu reference counter to per-memcg caches,
which would look cumbersome.

Instead, let's take a reference to a memory cgroup, which already has a
per-cpu reference counter, in the beginning of kmem_cache_alloc to be
dropped in the end, and move per memcg caches destruction from css offline
to css free.  As a side effect, per-memcg caches will be destroyed not one
by one, but all at once when the last page accounted to the memory cgroup
is freed.  This doesn't sound as a high price for code readability though.

Note, this patch does add some overhead to the kmem_cache_alloc hot path,
but it is pretty negligible - it's just a function call plus a per cpu
counter decrement, which is comparable to what we already have in
memcg_kmem_get_cache.  Besides, it's only relevant if there are memory
cgroups with kmem accounting enabled.  I don't think we can find a way to
handle this race w/o it, because alloc_page called from kmem_cache_alloc
may sleep so we can't flush all pending kmallocs w/o reference counting.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
Oleg Nesterov d003f371b2 oom: don't assume that a coredumping thread will exit soon
oom_kill.c assumes that PF_EXITING task should exit and free the memory
soon.  This is wrong in many ways and one important case is the coredump.
A task can sleep in exit_mm() "forever" while the coredumping sub-thread
can need more memory.

Change the PF_EXITING checks to take SIGNAL_GROUP_COREDUMP into account,
we add the new trivial helper for that.

Note: this is only the first step, this patch doesn't try to solve other
problems.  The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can
participate in coredump after it was already observed in PF_EXITING state,
so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set.
fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
 And even the name/usage of the new helper is confusing, an exiting thread
can only free its ->mm if it is the only/last task in thread group.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
Johannes Weiner 6b4f7799c6 mm: vmscan: invoke slab shrinkers from shrink_zone()
The slab shrinkers are currently invoked from the zonelist walkers in
kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
eligible LRU pages and assemble a nodemask to pass to NUMA-aware
shrinkers, which then again have to walk over the nodemask.  This is
redundant code, extra runtime work, and fairly inaccurate when it comes to
the estimation of actually scannable LRU pages.  The code duplication will
only get worse when making the shrinkers cgroup-aware and requiring them
to have out-of-band cgroup hierarchy walks as well.

Instead, invoke the shrinkers from shrink_zone(), which is where all
reclaimers end up, to avoid this duplication.

Take the count for eligible LRU pages out of get_scan_count(), which
considers many more factors than just the availability of swap space, like
zone_reclaimable_pages() currently does.  Accumulate the number over all
visited lruvecs to get the per-zone value.

Some nodes have multiple zones due to memory addressing restrictions.  To
avoid putting too much pressure on the shrinkers, only invoke them once
for each such node, using the class zone of the allocation as the pivot
zone.

For now, this integrates the slab shrinking better into the reclaim logic
and gets rid of duplicative invocations from kswapd, direct reclaim, and
zone reclaim.  It also prepares for cgroup-awareness, allowing
memcg-capable shrinkers to be added at the lruvec level without much
duplication of both code and runtime work.

This changes kswapd behavior, which used to invoke the shrinkers for each
zone, but with scan ratios gathered from the entire node, resulting in
meaningless pressure quantities on multi-zone nodes.

Zone reclaim behavior also changes.  It used to shrink slabs until the
same amount of pages were shrunk as were reclaimed from the LRUs.  Now it
merely invokes the shrinkers once with the zone's scan ratio, which makes
the shrinkers go easier on caches that implement aging and would prefer
feeding back pressure from recently used slab objects to unused LRU pages.

[vdavydov@parallels.com: assure class zone is populated]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Davidlohr Bueso f5f302e212 mm,vmacache: count number of system-wide flushes
These flushes deal with sequence number overflows, such as for long lived
threads.  These are rare, but interesting from a debugging PoV.  As such,
display the number of flushes when vmacache debugging is enabled.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 48c96a3685 mm/page_owner: keep track of page owners
This is the page owner tracking code which is introduced so far ago.  It
is resident on Andrew's tree, though, nobody tried to upstream so it
remain as is.  Our company uses this feature actively to debug memory leak
or to find a memory hogger so I decide to upstream this feature.

This functionality help us to know who allocates the page.  When
allocating a page, we store some information about allocation in extra
memory.  Later, if we need to know status of all pages, we can get and
analyze it from this stored information.

In previous version of this feature, extra memory is statically defined in
struct page, but, in this version, extra memory is allocated outside of
struct page.  It enables us to turn on/off this feature at boottime
without considerable memory waste.

Although we already have tracepoint for tracing page allocation/free,
using it to analyze page owner is rather complex.  We need to enlarge the
trace buffer for preventing overlapping until userspace program launched.
And, launched program continually dump out the trace buffer for later
analysis and it would change system behaviour with more possibility rather
than just keeping it in memory, so bad for debug.

Moreover, we can use page_owner feature further for various purposes.  For
example, we can use it for fragmentation statistics implemented in this
patch.  And, I also plan to implement some CMA failure debugging feature
using this interface.

I'd like to give the credit for all developers contributed this feature,
but, it's not easy because I don't know exact history.  Sorry about that.
Below is people who has "Signed-off-by" in the patches in Andrew's tree.

Contributor:
Alexander Nyberg <alexn@dsv.su.se>
Mel Gorman <mgorman@suse.de>
Dave Hansen <dave@linux.vnet.ibm.com>
Minchan Kim <minchan@kernel.org>
Michal Nazarewicz <mina86@mina86.com>
Andrew Morton <akpm@linux-foundation.org>
Jungsoo Son <jungsoo.son@lge.com>

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 9a92a6ce6f stacktrace: introduce snprint_stack_trace for buffer output
Current stacktrace only have the function for console output.  page_owner
that will be introduced in following patch needs to print the output of
stacktrace into the buffer for our own output format so so new function,
snprint_stack_trace(), is needed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 031bc5743f mm/debug-pagealloc: make debug-pagealloc boottime configurable
Now, we have prepared to avoid using debug-pagealloc in boottime.  So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.

Only non-intuitive part is change of guard page functions.  Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim e30825f186 mm/debug-pagealloc: prepare boottime configurable on/off
Until now, debug-pagealloc needs extra flags in struct page, so we need to
recompile whole source code when we decide to use it.  This is really
painful, because it takes some time to recompile and sometimes rebuild is
not possible due to third party module depending on struct page.  So, we
can't use this good feature in many cases.

Now, we have the page extension feature that allows us to insert extra
flags to outside of struct page.  This gets rid of third party module
issue mentioned above.  And, this allows us to determine if we need extra
memory for this page extension in boottime.  With these property, we can
avoid using debug-pagealloc in boottime with low computational overhead in
the kernel built with CONFIG_DEBUG_PAGEALLOC.  This will help our
development process greatly.

This patch is the preparation step to achive above goal.  debug-pagealloc
originally uses extra field of struct page, but, after this patch, it will
use field of struct page_ext.  Because memory for page_ext is allocated
later than initialization of page allocator in CONFIG_SPARSEMEM, we should
disable debug-pagealloc feature temporarily until initialization of
page_ext.  This patch implements this.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim eefa864b70 mm/page_ext: resurrect struct page extending code for debugging
When we debug something, we'd like to insert some information to every
page.  For this purpose, we sometimes modify struct page itself.  But,
this has drawbacks.  First, it requires re-compile.  This makes us
hesitate to use the powerful debug feature so development process is
slowed down.  And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency.  At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel.  Keeping this as it is would be better to reproduce errornous
situation.

This feature is intended to overcome above mentioned problems.  This
feature allocates memory for extended data per page in certain place
rather than the struct page itself.  This memory can be accessed by the
accessor functions provided by this code.  During the boot process, it
checks whether allocation of huge chunk of memory is needed or not.  If
not, it avoids allocating memory at all.  With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.

Until now, memcg uses this technique.  But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed.  I'd like to use this code to develop debug feature, so
this patch resurrect it.

To help these things to work well, this patch introduces two callbacks for
clients.  One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time.  The other is optional, init
callback, which is used to do proper initialization after memory is
allocated.  Detailed explanation about purpose of these functions is in
code comment.  Please refer it.

Others are completely same with previous extension code in memcg.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Jianyu Zhan 2d48366b3f mm, gfp: escalatedly define GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE
GFP_USER, GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE are escalatedly confined
defined, also implied by their names:

GFP_USER                                  = GFP_USER
GFP_USER + __GFP_HIGHMEM                  = GFP_HIGHUSER
GFP_USER + __GFP_HIGHMEM + __GFP_MOVABLE  = GFP_HIGHUSER_MOVABLE

So just make GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE escalatedly defined to
reflect this fact.  It also makes the definition clear and texturally warn
on any furture break-up of this escalated relastionship.

Signed-off-by: Jianyu Zhan <jianyu.zhan@emc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Andrew Morton 66f2ca7e3f include/linux/kmemleak.h: needs slab.h
include/linux/kmemleak.h: In function 'kmemleak_alloc_recursive':
include/linux/kmemleak.h:43: error: 'SLAB_NOLEAKTRACE' undeclared (first use in this function)

Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Zhang Zhen 056b7ccef4 mm/memcontrol.c: remove the unused arg in __memcg_kmem_get_cache()
The gfp was passed in but never used in this function.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Tejun Heo bd6dace78b mm: move swp_entry_t definition to include/linux/mm_types.h
swp_entry_t being defined in include/linux/swap.h instead of
include/linux/mm_types.h causes cyclic include dependency later when
include/linux/page_cgroup.h is included from writeback path.  Move the
definition to include/linux/mm_types.h.

While at it, reformat the comment above it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Vladimir Davydov 6f185c290e memcg: turn memcg_kmem_skip_account into a bit field
It isn't supposed to stack, so turn it into a bit-field to save 4 bytes on
the task_struct.

Also, remove the memcg_stop/resume_kmem_account helpers - it is clearer to
set/clear the flag inline.  Regarding the overwhelming comment to the
helpers, which is removed by this patch too, we already have a compact yet
accurate explanation in memcg_schedule_cache_create, no need in yet
another one.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Michal Nazarewicz 5e19b013f5 lib: bitmap: add alignment offset for bitmap_find_next_zero_area()
Add a bitmap_find_next_zero_area_off() function which works like
bitmap_find_next_zero_area() function except it allows an offset to be
specified when alignment is checked.  This lets caller request a bit such
that its number plus the offset is aligned according to the mask.

[gregory.0xf0@gmail.com: Retrieved from https://patchwork.linuxtv.org/patch/6254/ and updated documentation]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:46 -08:00
Davidlohr Bueso 3dec0ba0be mm/rmap: share the i_mmap_rwsem
Similarly to the anon memory counterpart, we can share the mapping's lock
ownership as the interval tree is not modified when doing doing the walk,
only the file page.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Davidlohr Bueso c8c06efa8b mm: convert i_mmap_mutex to rwsem
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory.  To
this end, this lock can also be a rwsem.  In addition, there are some
important opportunities to share the lock when there are no tree
modifications.

This conversion is straightforward.  For now, all users take the write
lock.

[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Davidlohr Bueso 8b28f621be mm,fs: introduce helpers around the i_mmap_mutex
This series is a continuation of the conversion of the i_mmap_mutex to
rwsem, following what we have for the anon memory counterpart.  With
Hugh's feedback from the first iteration.

Ultimately, the most obvious paths that require exclusive ownership of the
lock is when we modify the VMA interval tree, via
vma_interval_tree_insert() and vma_interval_tree_remove() families.  Cases
such as unmapping, where the ptes content is changed but the tree remains
untouched should make it safe to share the i_mmap_rwsem.

As such, the code of course is straightforward, however the devil is very
much in the details.  While its been tested on a number of workloads
without anything exploding, I would not be surprised if there are some
less documented/known assumptions about the lock that could suffer from
these changes.  Or maybe I'm just missing something, but either way I
believe its at the point where it could use more eyes and hopefully some
time in linux-next.

Because the lock type conversion is the heart of this patchset,
its worth noting a few comparisons between mutex vs rwsem (xadd):

  (i) Same size, no extra footprint.

  (ii) Both have CONFIG_XXX_SPIN_ON_OWNER capabilities for
       exclusive lock ownership.

  (iii) Both can be slightly unfair wrt exclusive ownership, with
        writer lock stealing properties, not necessarily respecting
        FIFO order for granting the lock when contended.

  (iv) Mutexes can be slightly faster than rwsems when
       the lock is non-contended.

  (v) Both suck at performance for debug (slowpaths), which
      shouldn't matter anyway.

Sharing the lock is obviously beneficial, and sem writer ownership is
close enough to mutexes.  The biggest winner of these changes is
migration.

As for concrete numbers, the following performance results are for a
4-socket 60-core IvyBridge-EX with 130Gb of RAM.

Both alltests and disk (xfs+ramdisk) workloads of aim7 suite do quite well
with this set, with a steady ~60% throughput (jpm) increase for alltests
and up to ~30% for disk for high amounts of concurrency.  Lower counts of
workload users (< 100) does not show much difference at all, so at least
no regressions.

                    3.18-rc1            3.18-rc1-i_mmap_rwsem
alltests-100     17918.72 (  0.00%)    28417.97 ( 58.59%)
alltests-200     16529.39 (  0.00%)    26807.92 ( 62.18%)
alltests-300     16591.17 (  0.00%)    26878.08 ( 62.00%)
alltests-400     16490.37 (  0.00%)    26664.63 ( 61.70%)
alltests-500     16593.17 (  0.00%)    26433.72 ( 59.30%)
alltests-600     16508.56 (  0.00%)    26409.20 ( 59.97%)
alltests-700     16508.19 (  0.00%)    26298.58 ( 59.31%)
alltests-800     16437.58 (  0.00%)    26433.02 ( 60.81%)
alltests-900     16418.35 (  0.00%)    26241.61 ( 59.83%)
alltests-1000    16369.00 (  0.00%)    26195.76 ( 60.03%)
alltests-1100    16330.11 (  0.00%)    26133.46 ( 60.03%)
alltests-1200    16341.30 (  0.00%)    26084.03 ( 59.62%)
alltests-1300    16304.75 (  0.00%)    26024.74 ( 59.61%)
alltests-1400    16231.08 (  0.00%)    25952.35 ( 59.89%)
alltests-1500    16168.06 (  0.00%)    25850.58 ( 59.89%)
alltests-1600    16142.56 (  0.00%)    25767.42 ( 59.62%)
alltests-1700    16118.91 (  0.00%)    25689.58 ( 59.38%)
alltests-1800    16068.06 (  0.00%)    25599.71 ( 59.32%)
alltests-1900    16046.94 (  0.00%)    25525.92 ( 59.07%)
alltests-2000    16007.26 (  0.00%)    25513.07 ( 59.38%)

disk-100          7582.14 (  0.00%)     7257.48 ( -4.28%)
disk-200          6962.44 (  0.00%)     7109.15 (  2.11%)
disk-300          6435.93 (  0.00%)     6904.75 (  7.28%)
disk-400          6370.84 (  0.00%)     6861.26 (  7.70%)
disk-500          6353.42 (  0.00%)     6846.71 (  7.76%)
disk-600          6368.82 (  0.00%)     6806.75 (  6.88%)
disk-700          6331.37 (  0.00%)     6796.01 (  7.34%)
disk-800          6324.22 (  0.00%)     6788.00 (  7.33%)
disk-900          6253.52 (  0.00%)     6750.43 (  7.95%)
disk-1000         6242.53 (  0.00%)     6855.11 (  9.81%)
disk-1100         6234.75 (  0.00%)     6858.47 ( 10.00%)
disk-1200         6312.76 (  0.00%)     6845.13 (  8.43%)
disk-1300         6309.95 (  0.00%)     6834.51 (  8.31%)
disk-1400         6171.76 (  0.00%)     6787.09 (  9.97%)
disk-1500         6139.81 (  0.00%)     6761.09 ( 10.12%)
disk-1600         4807.12 (  0.00%)     6725.33 ( 39.90%)
disk-1700         4669.50 (  0.00%)     5985.38 ( 28.18%)
disk-1800         4663.51 (  0.00%)     5972.99 ( 28.08%)
disk-1900         4674.31 (  0.00%)     5949.94 ( 27.29%)
disk-2000         4668.36 (  0.00%)     5834.93 ( 24.99%)

In addition, a 67.5% increase in successfully migrated NUMA pages, thus
improving node locality.

The patch layout is simple but designed for bisection (in case reversion
is needed if the changes break upstream) and easier review:

o Patches 1-4 convert the i_mmap lock from mutex to rwsem.
o Patches 5-10 share the lock in specific paths, each patch
  details the rationale behind why it should be safe.

This patchset has been tested with: postgres 9.4 (with brand new hugetlb
support), hugetlbfs test suite (all tests pass, in fact more tests pass
with these changes than with an upstream kernel), ltp, aim7 benchmarks,
memcached and iozone with the -B option for mmap'ing.  *Untested* paths
are nommu, memory-failure, uprobes and xip.

This patch (of 8):

Various parts of the kernel acquire and release this mutex, so add
i_mmap_lock_write() and immap_unlock_write() helper functions that will
encapsulate this logic.  The next patch will make use of these.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Linus Torvalds f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Linus Torvalds 26ceb127f7 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "The major updates included in this update are:

   - Clang compatible stack pointer accesses by Behan Webster.
   - SA11x0 updates from Dmitry Eremin-Solenikov.
   - kgdb handling of breakpoints with read-only text/modules
   - Support for Privileged-no-execute feature on ARMv7 to prevent
     userspace code execution by the kernel.
   - AMBA primecell bus handling of irq-safe runtime PM
   - Unwinding support for memset/memzero/memmove/memcpy functions
   - VFP fixes for Krait CPUs and improvements in detecting the VFP
     architecture
   - A number of code cleanups (using pr_*, removing or reducing the
     severity of a couple of kernel messages, splitting ftrace asm code
     out to a separate file, etc.)
   - Add machine name to stack dump output"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (62 commits)
  ARM: 8247/2: pcmcia: sa1100: make use of device clock
  ARM: 8246/2: pcmcia: sa1111: provide device clock
  ARM: 8245/1: pcmcia: soc-common: enable/disable socket clocks
  ARM: 8244/1: fbdev: sa1100fb: make use of device clock
  ARM: 8243/1: sa1100: add a clock alias for sa1111 pcmcia device
  ARM: 8242/1: sa1100: add cpu clock
  ARM: 8221/1: PJ4: allow building in Thumb-2 mode
  ARM: 8234/1: sa1100: reorder IRQ handling code
  ARM: 8233/1: sa1100: switch to hwirq usage
  ARM: 8232/1: sa1100: merge GPIO multiplexer IRQ to "normal" irq domain
  ARM: 8231/1: sa1100: introduce irqdomains support
  ARM: 8230/1: sa1100: shift IRQs by one
  ARM: 8229/1: sa1100: replace irq numbers with names in irq driver
  ARM: 8228/1: sa1100: drop entry-macro.S
  ARM: 8227/1: sa1100: switch to MULTI_IRQ_HANDLER
  ARM: 8241/1: Update processor_modes for hyp and monitor mode
  ARM: 8240/1: MCPM: document mcpm_sync_init()
  ARM: 8239/1: Introduce {set,clear}_pte_bit
  ARM: 8238/1: mm: Refine set_memory_* functions
  ARM: 8237/1: fix flush_pfn_alias
  ...
2014-12-12 15:26:48 -08:00
Linus Torvalds 8d14066755 IOMMU Updates for Linux v3.19
This time with:
 
 	* A new IOMMU-API call: iommu_map_sg() to map multiple
 	  non-contiguous pages into an IO address space with only one
 	  API call. This allows certain optimizations in the IOMMU
 	  driver.
 
 	* DMAR device hotplug in the Intel VT-d driver. It is now
 	  possible to hotplug the IOMMU itself.
 
 	* A new IOMMU driver for the Rockchip ARM platform.
 
 	* Couple of cleanups and improvements in the OMAP IOMMU driver.
 
 	* Nesting support for the ARM-SMMU driver.
 
 	* Various other small cleanups and improvements.
 
 Please note that this time some branches were also pulled into other
 trees, like the DRI and the Tegra tree. The VT-d branch was also pulled
 into tip/x86/apic.
 Some patches for the AMD IOMMUv2 driver are not in the IOMMU tree but
 were merged by Andrew (or finally ended up in the DRI tree).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUiwktAAoJECvwRC2XARrjweAP/Rr4igltfUFjfSp7iQBoQxQP
 qmFv3/A6f0gxfJ2IhA8ZnmJwoNAxYnqCEl+vA0FYDnXxO6CCHXWkeO5NprwU+fuG
 BZDn4lmg+GhpTCXS5668l+MZxtqZCMaCQbK5pm+b+uk8qkXznlVs2Nb00BBL/TTo
 4WhyLPbNnAfQkBGTFf47QqSi5YYPmJ44TvIcHPsFBz0dTesO7JKg9c4HpyxUmnMs
 7VDPWJmiuTJ+/UISkFzxHVw9GcL6XXhdu70XEFrVo6wnXRDTqbDtBUP8vnghnqwG
 kOIYrY+7HO3iEDaiiSGqxeMH5Uac2yIlQi4W4TXWg7WpKenZPqnwZusD50vO3HyB
 9L4VIL14gbgV6s+/9YNDYR494d9+xRjGMO4FAIIWnVFmE98+zqHpc8OaF0azY4N7
 3vxlM6VGvWytTmTpU/J/VMhwFPo/6QHdGpBa0k0+ACMQ8LTPt+Q7o+fTBcDCgnJW
 SAa5AGruMBFgZ2RtN9bgAGxAhuSPmh7pD+va5vUzvAngY0o6tamblh9G6Sk26nka
 y0RZVJ+vplP3jvLXMy9SByxGgBv3/iNF8YUd7Zp0rpZyW8KElRm4m9xuvxzu9cIW
 3SR+/pEhvqqXo48XS040FScD65QjGbKWGuiH+Zs+6Ij4MY6xxCAlf7EV+8DIiTO5
 LcI66Fk6oFzFKz2BRBpn
 =x7Ei
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This time with:

   - A new IOMMU-API call: iommu_map_sg() to map multiple non-contiguous
     pages into an IO address space with only one API call.  This allows
     certain optimizations in the IOMMU driver.

   - DMAR device hotplug in the Intel VT-d driver.  It is now possible
     to hotplug the IOMMU itself.

   - A new IOMMU driver for the Rockchip ARM platform.

   - Couple of cleanups and improvements in the OMAP IOMMU driver.

   - Nesting support for the ARM-SMMU driver.

   - Various other small cleanups and improvements.

  Please note that this time some branches were also pulled into other
  trees, like the DRI and the Tegra tree.  The VT-d branch was also
  pulled into tip/x86/apic.

  Some patches for the AMD IOMMUv2 driver are not in the IOMMU tree but
  were merged by Andrew (or finally ended up in the DRI tree)"

* tag 'iommu-updates-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
  iommu: Decouple iommu_map_sg from CPU page size
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug
  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
  iommu/vt-d: Enhance error recovery in function intel_enable_irq_remapping()
  iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit hotplug
  iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
  iommu/vt-d: Implement DMAR unit hotplug framework
  iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
  iommu/vt-d: Introduce helper function dmar_walk_resources()
  iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute
  iommu/arm-smmu: Play nice on non-ARM/SMMU systems
  iommu/amd: remove compiler warning due to IOMMU_CAP_NOEXEC
  iommu/arm-smmu: add IOMMU_CAP_NOEXEC to the ARM SMMU driver
  iommu: add capability IOMMU_CAP_NOEXEC
  iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC
  iommu/amd: Fix accounting of device_state
  x86/vt-d: Fix incorrect bit operations in setting values
  iommu/rockchip: Allow to compile with COMPILE_TEST
  iommu/ipmmu-vmsa: Return proper error if devm_request_irq fails
  ...
2014-12-12 15:10:34 -08:00
Linus Torvalds 87c779baab Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
 "Main features this time are:

   - BAM v1.3.0 support form qcom bam dma
   - support for Allwinner sun8i dma
   - atmels eXtended DMA Controller driver
   - chancnt cleanup by Maxime
   - fixes spread over drivers"

* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (56 commits)
  dmaenegine: Delete a check before free_percpu()
  dmaengine: ioatdma: fix dma mapping errors
  dma: cppi41: add a delay while setting the TD bit
  dma: cppi41: wait longer for the HW to return the descriptor
  dmaengine: fsl-edma: fixup reg offset and hw S/G support in big-endian model
  dmaengine: fsl-edma: fix calculation of remaining bytes
  drivers/dma/pch_dma: declare pch_dma_id_table as static
  dmaengine: ste_dma40: fix error return code
  dma: imx-sdma: clarify about firmware not found error
  Documentation: devicetree: Fix Xilinx VDMA specification
  dmaengine: pl330: update author info
  dmaengine: clarify the issue_pending expectations
  dmaengine: at_xdmac: Add DMA_PRIVATE
  ARM: dts: at_xdmac: fix bad value of dma-cells in documentation
  dmaengine: at_xdmac: fix missing spin_unlock
  dmaengine: at_xdmac: fix a bug in transfer residue computation
  dmaengine: at_xdmac: fix software lockup at_xdmac_tx_status()
  dmaengine: at_xdmac: remove chancnt affectation
  dmaengine: at_xdmac: prefer usage of readl/writel_relaxed
  dmaengine: xdmac: fix print warning on dma_addr_t variable
  ...
2014-12-12 14:59:53 -08:00
Linus Torvalds eea0cf3fcd IPMI Driver updates for 3.19
For the following changes:
   - Quite a few bug fixes
   - A new driver for the powernv
   - A new driver for the SMBus interface from the IPMI 2.0 specification
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlSKGRsACgkQIXnXXONXEReJngCeJsMEdY9pBJ9GEppkiSv0HG74
 VR8AoJb3PQ2SfqmAbT0RgACWEkSuWCdj
 =9NNo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.code.sf.net/p/openipmi/linux-ipmi

Pull IPMI driver updates from Corey Minyard:
  - Quite a few bug fixes
  - A new driver for the powernv
  - A new driver for the SMBus interface from the IPMI 2.0 specification

* tag 'for-linus' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: Check the BT interrupt enable periodically
  ipmi: Fix attention handling for system interfaces
  ipmi: Periodically check to see if irqs and messages are set right
  drivers/char/ipmi: Add powernv IPMI driver
  ipmi: Add SMBus interface driver (SSIF)
  ipmi: Remove the now unused priority from SMI sender
  ipmi: Remove the now unnecessary message queue
  ipmi: Make the message handler easier to use for SMI interfaces
  ipmi: Move message sending into its own function
  ipmi: rename waiting_msgs to waiting_rcv_msgs
  ipmi: Fix handling of BMC flags
  ipmi: Initialize BMC device attributes
  ipmi: Unregister previously registered driver in error case
  ipmi: Use the proper type for acpi_handle
  ipmi: Fix a bug in hot add/remove
  ipmi: Remove useless sysfs_name parameters
  ipmi: clean up the device handling for the bmc device
  ipmi: Move the address source to string to ipmi-generic code
  ipmi: Ignore SSIF in the PNP handling
2014-12-12 14:49:56 -08:00
Linus Torvalds 823e334ecd Docs changes for the 3.19 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUiaPRAAoJEI3ONVYwIuV68qQP/2j93CDahLEXlpbimdHCOyKO
 xLKYwSIOxpzCTv8mQouNH4czVDheSSrq+xDUafioaMRlM80fb1EdhC9Nhr/xT/O+
 fCsgfLETlpxAS4j6gjn60H8QYdnBgcz0p2hOgvysf8WHZ0PHzSnfE78BGS1Mnugs
 gjF97y5pa3p+uTNNxHA7aB+Rg3/JbsuDXH0CtdzjpsS0bM/MBnGV0AQLLZ3qdzvf
 6bSwb8LyV54cRjYrs0SZT0OWrPXlUc0AxjrQjq2IcyL+s5Uwu/Eb1RoZ69C14ohC
 h5A1C87v4asqrzsC0VCqG48wi36+2k7zfVT7wSkHfERXWGtjpd7Uy3lccMMPLg/H
 Mnt6EKDVn4L2b8VOM1wl8pj0kIbln/g2whJ6x40Q3R4uO8o94Vy0HpeTTeNxdwyY
 iERVkHPuRAZ2SLrz4BheiGRgcePgocMrofyGcJ3/ezMGcq3sUZnAJYmJYF+S5jkZ
 Q94ee8jgypgEvuQHX52n3AiqJspNpqAGqXE5ZY4PgLt4KlwbT1yQ5hXUX34ou+3u
 sLkFO8vNUQa4o1TTfwCzkPun/Aoy+Rb8BCzgbml/FlPgidXpNcofO/g5BWRjXwOX
 mpUCNPiMhryx0aTrEgl1GChholqToakS6RTvqdxUKC3honKWBQ93zsWSDkEzS+3h
 +qxzXDCN3u3Py9r/XvPK
 =Pq+D
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation update from Jonathan Corbet:
 "Here's my set of accumulated documentation changes for 3.19.

  It includes a couple of additions to the coding style document, some
  fixes for minor build problems within the documentation tree, the
  relocation of the kselftest docs, and various tweaks and additions.

  A couple of changes reach outside of Documentation/; they only make
  trivial comment changes and I did my best to get the required acks.

  Complete with a shiny signed tag this time around"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
  kobject: grammar fix
  Input: xpad - update docs to reflect current state
  Documentation: Build mic/mpssd only for x86_64
  cgroups: Documentation: fix wrong cgroupfs paths
  Documentation/email-clients.txt: add info about Claws Mail
  CodingStyle: add some more error handling guidelines
  kselftest: Move the docs to the Documentation dir
  Documentation: fix formatting to make 's' happy
  Documentation: power: Fix typo in Documentation/power
  Documentation: vm: Add 1GB large page support information
  ipv4: add kernel parameter tcpmhash_entries
  Documentation: Fix a typo in mailbox.txt
  treewide: Fix typo in Documentation/DocBook/device-drivers
  CodingStyle: Add a chapter on conditional compilation
2014-12-12 14:42:48 -08:00
Quentin Lambert 7142637dd9 linux/interrupt.h: remove the definition of unused tasklet_hi_enable
Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-12 15:15:41 -05:00
Linus Torvalds 6ce4436c9c Couple of pstore-ram enhancements to allow use of different memory attributes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUi0B6AAoJEKurIx+X31iByH8P/jfMgzyUO+KpJMA1DbgCAG7x
 WPJgbMUyPwB63DH09RyMEmiwf61Rl1klXTPVNY0Dnj7qRJOmpB9U3vGIfO4HpD84
 5IZMBlc+Jl+kJCxSAJYbTJTZLsIMjFGOfuVTvlY+HnMBitQVBumKptmC0DoBBqgz
 yYy5MHRMaVoHcogyMyBiknmxdxu6/ruUKY+6yyvdUESt0SCcJG8V6Qik7TMmnx47
 NvIIPzfibvvLLnd8IOEj2fwh8XMtJdfcCxPpAEvEaNq0jZEDF9K22jttTQvl9r92
 NQf7JKQQrNfzloRZ3flKax5ZMGi9RkcirTLLdJ4I2xMGVHOA4XUAjsSCYR6INuuJ
 Ox00FnuiIrADNw37m52Y+ujPTF1C2PQUNK69gwsLd84MSjy+95F2dlC5cC3Yt4N5
 rpstXxWELZTqjMGD8GTPOpv6zlg799IbFexr4H6KTc+47EX0MNayJiI6L597gYnq
 gIiPmDnnz6WlWp4HHgBIwjNAH3Tbf/uU3MlgzqS3Ftd7YkYmLnxvClhrwgErviFn
 Nfnz2LtGuMxMHSt0uSWxODVEaR4reKRVJBvhRSGWL1PufylEyt0YWayiqpohuKD9
 6X/RufWK5qdCBHytoGyMUZ57oqxth9QSVG4RBkGPmaZgMq/5DdyOhBfW0yInjMuo
 AuDMmqrU5yFTitLMGcsG
 =kcmD
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore update #2 from Tony Luck:
 "Couple of pstore-ram enhancements to allow use of different memory
  attributes"

* tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
2014-12-12 11:34:13 -08:00
Linus Torvalds bdeb03cada Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "From a feature point of view, most of the code here comes from Miao
  Xie and others at Fujitsu to implement scrubbing and replacing devices
  on raid56.  This has been in development for a while, and it's a big
  improvement.

  Filipe and Josef have a great assortment of fixes, many of which solve
  problems corruptions either after a crash or in error conditions.  I
  still have a round two from Filipe for next week that solves
  corruptions with discard and block group removal"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
  Btrfs: make get_caching_control unconditionally return the ctl
  Btrfs: fix unprotected deletion from pending_chunks list
  Btrfs: fix fs mapping extent map leak
  Btrfs: fix memory leak after block remove + trimming
  Btrfs: make btrfs_abort_transaction consider existence of new block groups
  Btrfs: fix race between writing free space cache and trimming
  Btrfs: fix race between fs trimming and block group remove/allocation
  Btrfs, replace: enable dev-replace for raid56
  Btrfs: fix freeing used extents after removing empty block group
  Btrfs: fix crash caused by block group removal
  Btrfs: fix invalid block group rbtree access after bg is removed
  Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56
  Btrfs, replace: write raid56 parity into the replace target device
  Btrfs, replace: write dirty pages into the replace target device
  Btrfs, raid56: support parity scrub on raid56
  Btrfs, raid56: use a variant to record the operation type
  Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted
  Btrfs, raid56: don't change bbio and raid_map
  Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block
  Btrfs: remove noused bbio_ret in __btrfs_map_block in condition
  ...
2014-12-12 11:15:23 -08:00