Add devname:mapper/control and MAPPER_CTRL_MINOR module alias
to support dm-mod module autoloading.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
'target_request_nr' is a more generic name that reflects the fact that
it will be used for both flush and discard support.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Determine whether a mapped device is bio-based or request-based when
loading its first (inactive) table and don't allow that to be changed
later.
This patch performs different device initialisation in each of the two
cases. (We don't think it's necessary to add code to support changing
between the two types.)
Allowed md->type transitions:
DM_TYPE_NONE to DM_TYPE_BIO_BASED
DM_TYPE_NONE to DM_TYPE_REQUEST_BASED
We now prevent table_load from replacing the inactive table with a
conflicting type of table even after an explicit table_clear.
Introduce 'type_lock' into the struct mapped_device to protect md->type
and to prepare for the next patch that will change the queue
initialization and allocate memory while md->type_lock is held.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
drivers/md/dm-ioctl.c | 15 +++++++++++++++
drivers/md/dm.c | 37 ++++++++++++++++++++++++++++++-------
drivers/md/dm.h | 5 +++++
include/linux/dm-ioctl.h | 4 ++--
4 files changed, 52 insertions(+), 9 deletions(-)
nouveau starting using these APIs, the first on non-x86 hw, and this
include isn't required on anything with real amounts of vmalloc space.
this fixes a build problem on powerpc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't actually need this include on any platform.
built on powerpc + x86, reported on m68k.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
isofs: Fix lseek() to position beyond 4 GB
vfs: remove unused MNT_STRICTATIME
vfs: show unreachable paths in getcwd and proc
vfs: only add " (deleted)" where necessary
vfs: add prepend_path() helper
vfs: __d_path: dont prepend the name of the root dentry
ia64: perfmon: add d_dname method
vfs: add helpers to get root and pwd
cachefiles: use path_get instead of lone dget
fs/sysv/super.c: add support for non-PDP11 v7 filesystems
V7: Adjust sanity checks for some volumes
Add v7 alias
v9fs: fixup for inode_setattr being removed
Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.
Add multiplexed bus core support. I2C multiplexer and switches
like pca954x get instantiated as new adapters per port.
Signed-off-by: Michael Lawnick <ml.lawnick@gmx.de>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Moving userspace-instantiated clients to separate lists wasn't nearly
enough to avoid deadlocks in multiplexed bus cases. We also want to
have a dedicated mutex to protect each list.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Michael Lawnick <ml.lawnick@gmx.de>
Uninline i2c adapter locking helper functions, move them to i2c-core,
and use them in i2c-core itself. The functions are still exported for
external users. This makes future updates to the locking model (which
will be needed for multiplexing support) possible and transparent.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Michael Lawnick <ml.lawnick@gmx.de>
Now that i2c-core offers the possibility to provide custom probing
function for I2C devices, let's make use of it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The probe method used by i2c_new_probed_device() may not be suitable
for all cases. Let the caller provide its own, optional probe
function.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (226 commits)
ARM: 6323/1: cam60: don't use __init for cam60_spi_{flash_platform_data,partitions}
ARM: 6324/1: cam60: move cam60_spi_devices to .init.data
ARM: 6322/1: imx/pca100: Fix name of spi platform data
ARM: 6321/1: fix syntax error in main Kconfig file
ARM: 6297/1: move U300 timer to dynamic clock lookup
ARM: 6296/1: clock U300 intcon and timer properly
ARM: 6295/1: fix U300 apb_pclk split
ARM: 6306/1: fix inverted MMC card detect in U300
ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID
ARM: 6294/1: etm: do a dummy read from OSSRR during initialization
ARM: 6292/1: coresight: add ETM management registers
ARM: 6288/1: ftrace: document mcount formats
ARM: 6287/1: ftrace: clean up mcount assembly indentation
ARM: 6286/1: fix Thumb-2 decompressor broken by "Auto calculate ZRELADDR"
ARM: 6281/1: video/imxfb.c: allow usage without BACKLIGHT_CLASS_DEVICE
ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h>
ARM: S5PV210: Fix on missing s3c-sdhci card detection method for hsmmc3
ARM: S5P: Fix on missing S5P_DEV_FIMC in plat-s5p/Kconfig
ARM: S5PV210: Override FIMC driver name on Aquila board
ARM: S5PC100: enable FIMC on SMDKC100
...
Fix up conflicts in arch/arm/mach-{s5pc100,s5pv210}/cpu.c due to
different subsystem 'setname' calls, and trivial port types in
include/linux/serial_core.h
Simply replace the whole kfifo.c and kfifo.h files with the new generic
version and fix the kerneldoc API template file.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the new version of the kfifo API files kfifo.c and kfifo.h.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For consistency with other kfifo routines, return bool, not int.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly mkfs-ed filesystems from Seventh Edition have last modification time
set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.
This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.
The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.
Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Architectures implement dma_is_consistent() in different ways (some
misinterpret the definition of API in DMA-API.txt). So it hasn't been so
useful for drivers. We have only one user of the API in tree. Unlikely
out-of-tree drivers use the API.
Even if we fix dma_is_consistent() in some architectures, it doesn't look
useful at all. It was invented long ago for some old systems that can't
allocate coherent memory at all. It's better to export only APIs that are
definitely necessary for drivers.
Let's remove this API.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_get_cache_alignment returns the minimum DMA alignment. Architectures
defines it as ARCH_DMA_MINALIGN (formally ARCH_KMALLOC_MINALIGN). So we
can unify dma_get_cache_alignment implementations.
Note that some architectures implement dma_get_cache_alignment wrongly.
dma_get_cache_alignment() should return the minimum DMA alignment. So
fully-coherent architectures should return 1. This patch also fixes this
issue.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now each architecture has the own dma_get_cache_alignment implementation.
dma_get_cache_alignment returns the minimum DMA alignment. Architectures
define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that malloc'ed
buffer is DMA-safe; the buffer doesn't share a cache with the others). So
we can unify dma_get_cache_alignment implementations.
This patch:
dma_get_cache_alignment() needs to know if an architecture defines
ARCH_KMALLOC_MINALIGN or not (needs to know if architecture has DMA
alignment restriction). However, slab.h define ARCH_KMALLOC_MINALIGN if
architectures doesn't define it.
Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
(except for crypto).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_cgroup_soft_limit_reclaim() has zone, nid and zid argument. but nid
and zid can be calculated from zone. So remove it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently mem_cgroup_shrink_node_zone() call shrink_zone() directly. thus
it doesn't need to initialize sc.nodemask because shrink_zone() doesn't
use it at all.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Nishimura Daisuke <d-nishimura@mtf.biglobe.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the OOM killer scans task, it check a task is under memcg or
not when it's called via memcg's context.
But, as Oleg pointed out, a thread group leader may have NULL ->mm
and task_in_mem_cgroup() may do wrong decision. We have to use
find_lock_task_mm() in memcg as generic OOM-Killer does.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gpios on the max730x chips have support for internal pullups while in
input mode.
This patch adds support for configuring these pullups via platform data.
A new member ("input_pullup_active") to the platform data struct is
introduced. A set bit in this variable activates the pullups while the
respective port is in input mode. This is a compatible enhancement since
unset bits lead to disables pullups which was the default in the original
driver.
_Note_: the 4 lowest bits in "input_pullup_active" are unused because the
first 4 ports of the controller are not used, too.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are some chips (like TI WL12xx series) that can be interfaced over
SDIO but don't support the SDIO specification, meaning that they are
missing CIA (Common I/O Area) with all it's registers. Current Linux SDIO
implementation relies on those registers to identify and configure the
card, so non-standard cards can not function and cause lots of warnings
from the core when it reads invalid data from non-existent registers.
After this patch, init_card() host callback can now set new quirk
MMC_QUIRK_NONSTD_SDIO, which means that SDIO core should not try to access
any standard SDIO registers and rely on init_card() to fill all SDIO
structures instead. As those cards are usually embedded chips, all the
required information can be obtained from machine board files by the host
driver when it's called through init_card() callback.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Adrian Hunter <adrian.hunter@nokia.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Kalle Valo <kvalo@adurom.com>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Kishore Kadiyala <kishore.kadiyala@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If you don't use CONFIG_MMC_UNSAFE_RESUME, as soon as you attempt to
suspend, the card will be removed, therefore this patch doesn't change the
behavior of this option.
However the removal will be done by pm notifier, which runs while
userspace is still not frozen and thus can freely use del_gendisk, without
the risk of deadlock which would happen otherwise.
Card detect workqueue is now disabled while userspace is frozen, Therefore
if you do use CONFIG_MMC_UNSAFE_RESUME, and remove the card during
suspend, the removal will be detected as soon as userspace is unfrozen,
again at the moment it is safe to call del_gendisk.
Tested with and without CONFIG_MMC_UNSAFE_RESUME with suspend and hibernate.
[akpm@linux-foundation.org: clean up function prototype]
[akpm@linux-foundation.org: fix CONFIG_PM-n linkage, small cleanups]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The eMMC spec 4.4 and 4.3 + additional feature chips has CSD structure
version 3 and version 3 have to check the CSD_STRUCTURE byte in the
EXT_CSD register.
Also fix EXT_CSD revision message.
[akpm@linux-foundation.org: fix comment, per Chris Ball]
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Adrian Hunter <adrian.hunter@nokia.com>
Cc: Chris Ball <cjb@laptop.org>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The O_* bit numbers are defined in 20+ arch/*, and can silently overlap.
Add a compile time check to ensure the uniqueness as suggested by David
Miller.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add <linux/types.h> to <linux/virtio_9p.h> so that types are explicitly
defined:
linux/virtio_9p.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9e4f5e29 ("FC Pass Thru support") exported a number of header files
in include/scsi to user space, but didn't change the uX types to the
userspace-compatible __uX types. Without that you'll get compile errors
when including them - E.G.:
include/scsi/scsi.h:145: error: expected specifier-qualifier-list before `u8'
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Smart <james.smart@emulex.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One straggler which was missed due to merge ordering issues.
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc allows this when arg is a function, but sparse complains:
drivers/char/ipmi/ipmi_watchdog.c:303:1: error: cannot dereference this type
drivers/char/ipmi/ipmi_watchdog.c:307:1: error: cannot dereference this type
drivers/char/ipmi/ipmi_watchdog.c:311:1: error: cannot dereference this type
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Also reorders the macros with the most common ones at the top.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
There may be cases (most obviously, sysfs-writable charp parameters) where
a module needs to prevent sysfs access to parameters.
Rather than express this in terms of a big lock, the functions are
expressed in terms of what they protect against. This is clearer, esp.
if the implementation changes to a module-level or even param-level lock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Since this section can be read-only (they're in .rodata), they should
always have been const. Minor flow-through various functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This allows us to generalize the KPARAM_KMALLOCED flag, by calling a function
on every parameter when a module is unloaded.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.
The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).
Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).
To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch adds voltage regulator driver for Maxim 8998 chip. This chip
is used on Samsung Aquila and GONI boards and provides following
functionalities:
- 4 BUCK voltage converters, 17 LDO power regulators and 5 other power
controllers
- battery charger
This patch adds basic driver for voltage regulators and MAX 8998 MFD core.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
If error hugepage is not in-use, we can fully recovery from error
by dequeuing it from freelist, so return RECOVERY.
Otherwise whether or not we can recovery depends on user processes,
so return DELAYED.
Dependency:
"HWPOISON, hugetlb: enable error handling path for hugepage"
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
This patch adds reverse mapping feature for hugepage by introducing
mapcount for shared/private-mapped hugepage and anon_vma for
private-mapped hugepage.
While hugepage is not currently swappable, reverse mapping can be useful
for memory error handler.
Without this patch, memory error handler cannot identify processes
using the bad hugepage nor unmap it from them. That is:
- for shared hugepage:
we can collect processes using a hugepage through pagecache,
but can not unmap the hugepage because of the lack of mapcount.
- for privately mapped hugepage:
we can neither collect processes nor unmap the hugepage.
This patch solves these problems.
This patch include the bug fix given by commit 23be7468e8, so reverts it.
Dependency:
"hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"
ChangeLog since May 24.
- create hugetlb_inline.h and move is_vm_hugetlb_index() in it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.
ChangeLog since May 13.
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 24be7468 completely
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
is_vm_hugetlb_page() is a widely used inline function to insert hooks
into hugetlb code.
But we can't use it in pagemap.h because of circular dependency of
the header files. This patch removes this limitation.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Commit d0adde574b added MNT_STRICTATIME
but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
MNT_NOATIME rather than setting any mount flag).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Prepend "(unreachable)" to path strings if the path is not reachable
from the current root.
Two places updated are
- the return string from getcwd()
- and symlinks under /proc/$PID.
Other uses of d_path() are left unchanged (we know that some old
software crashes if /proc/mounts is changed).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.
get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Newly mkfs-ed filesystems from Seventh Edition have last modification
time set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
different bytesex than we're using.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
"netpoll: Use 'bool' for netpoll_rx() return type." missed the case when
CONFIG_NETPOLL is disabled.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warnings in linux/i2c.h:
Warning(include/linux/i2c.h:176): No description found for parameter 'alert'
Warning(include/linux/i2c.h:259): No description found for parameter 'of_node'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (94 commits)
V4L/DVB: tvp7002: fix write to H-PLL Feedback Divider LSB register
V4L/DVB: dvb: siano: free spinlock before schedule()
V4L/DVB: media: video: pvrusb2: remove custom hex_to_bin()
V4L/DVB: drivers: usbvideo: remove custom implementation of hex_to_bin()
V4L/DVB: Report supported QAM modes on bt8xx
V4L/DVB: media: ir-keytable: null dereference in debug code
V4L/DVB: ivtv: convert to the new control framework
V4L/DVB: ivtv: convert gpio subdev to new control framework
V4L/DVB: wm8739: convert to the new control framework
V4L/DVB: cs53l32a: convert to new control framework
V4L/DVB: wm8775: convert to the new control framework
V4L/DVB: cx2341x: convert to the control framework
V4L/DVB: cx25840: convert to the new control framework
V4L/DVB: cx25840/ivtv: replace ugly priv control with s_config
V4L/DVB: saa717x: convert to the new control framework
V4L/DVB: msp3400: convert to the new control framework
V4L/DVB: saa7115: convert to the new control framework
V4L/DVB: v4l2: hook up the new control framework into the core framework
V4L/DVB: Documentation: add v4l2-controls.txt documenting the new controls API
V4L/DVB: v4l2-ctrls: Whitespace cleanups
...
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
watchdog: hpwdt: formatting of pointers in printk()
watchdog: Adding support for ARM Primecell SP805 Watchdog
watchdog: f71808e_wdt: new watchdog driver for Fintek F71808E and F71882FG
watchdog: sch311x_wdt.c: set parent before registeriing the misc device in probe() function
watchdog: wdt_pci.c: move ids to pci_ids.h
watchdog: s3c2410_wdt - Fix removing of platform device
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add USB-ID for PL-3601 Xbox 360 pad
Input: cy8ctmg100_ts - signedness bug
Input: elantech - report position also with 3 fingers
Input: elantech - discard the first 2 positions on some firmwares
Input: adxl34x - do not mark device as disabled on startup
Input: gpio_keys - add hooks to enable/disable device
Input: evdev - rearrange ioctl handling
Input: dynamically allocate ABS information
Input: switch to input_abs_*() access functions
Input: add static inline accessors for ABS properties
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (68 commits)
U6715 16550A serial driver support
Char: nozomi, set tty->driver_data appropriately
Char: nozomi, fix tty->count counting
serial: max3107: Fix gpiolib support
hsu: call PCI pm hooks in suspend/resume function
hsu: some code cleanup
hsu: add a periodic timer to check dma rx channel
hsu: driver for Medfield High Speed UART device
mxser: remove unnesesary NULL check
serial: add support for OX16PCI958 card
serial: 68328serial.c: remove dead (ALMA_ANS | DRAGONIXVZ | M68EZ328ADS)
timbuart: use __devinit and __devexit macros for probe and remove
serial: MMIO32 support for 8250_early.c
serial: mcf: don't take spinlocks in already protected functions
serial: general fixes in the serial_rs485 structure
serial: fix missing bit coverage of ASYNC_FLAGS
serial: "altera_uart: simplify altera_uart_console_putc()" checkpatch fixes
serial: crisv10: formatting of pointers in printk()
vt: Fix warning: statement with no effect due to vt_kern.h
tty_io: remove casts from void*
...
Fix kernel-doc warnings in linux/usb.h:
Warning(include/linux/usb.h:185): No description found for parameter 'resetting_device'
Warning(include/linux/usb.h:1212): No description found for parameter 'stream_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Logitech Harmony 700 series needs an extra delay during
initialization. This patch adds a USB quirk which enables such a delay
and adds the device to the quirks list.
Signed-off-by: Phil Dibowitz <phil@ipom.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1) Introduce ulpi specific flags for control of the ulpi phy
2) Extend the generic ulpi driver with support for Function and
Interface control of upli phy
3) Update the platforms using the generic ulpi driver with new ulpi
flags
4) Remove the otg control flags not in use
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fixes below compilation warning from ulpi.h
include/linux/usb/ulpi.h:145:
warning: 'struct otg_io_access_ops' declared inside parameter list
include/linux/usb/ulpi.h:145:
warning: its scope is only this definition or declaration,
which is probably not what you want
Signed-off-by: Ajay Kumar Gupta <ajay.gupta@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1395) adds code to hcd_pci_suspend() for handling wakeup
races. This is another general race pattern, similar to the "open
vs. unregister" race we're all familiar with. Here, the race is
between suspending a device and receiving a wakeup request from one of
the device's suspended children.
In particular, if a root-hub wakeup is requested at about the same
time as the corresponding USB controller is suspended, and if the
controller is enabled for wakeup, then the controller should either
fail to suspend or else wake right back up again.
During system sleep this won't happen very much, especially since host
controllers generally aren't enabled for wakeup during sleep. However
it is definitely an issue for runtime PM. Something like this will be
needed to prevent the controller from autosuspending while waiting for
a root-hub resume to take place. (That is, in fact, the common case,
for which there is an extra test.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1385) adds a "do_wakeup" parameter to the pci_suspend
method used by PCI-based host controller drivers. ehci-hcd in
particular needs to know whether or not to enable wakeup when
suspending a controller. Although that information is currently
available through device_may_wakeup(), when support is added for
runtime suspend this will no longer be true.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1393) converts several of the single-bit fields in
struct usb_hcd to atomic flags. This is for safety's sake; not all
CPUs can update bitfield values atomically, and these flags are used
in multiple contexts.
The flag fields that are set only during registration or removal can
remain as they are, since non-atomic accesses at those times will not
cause any problems.
(Strictly speaking, the authorized_default flag should become atomic
as well. I didn't bother with it because it gets changed only via
sysfs. It can be done later, if anyone wants.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Added a disconnect() callback to composite devices which
is called by composite glue when its disconnect callback
is called by gadget.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
usb_string_ids_tab() and usb_string_ids_n() functions added to
the composite framework. The first accepts an array of
usb_string object and for each registeres a string id and the
second registeres a given number of ids and returns the first.
This may simplify string ids registration since gadgets and
composite functions won't have to call usb_string_id() several
times and each time check for errer status -- all this will be
done with a single call.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
FunctionFS had a bit unique name for function used to add it
to USB configuration. Renamed as to match naming convention
of other functions.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
And audit all the users. None needed the BKL. That was easy
because there was only very few around.
Tested with allmodconfig build on x86-64
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
From: Andi Kleen <ak@linux.intel.com>
With this patch, the LPM capable EHCI host controller can put device
into L1 sleep state which is a mode that can enter/exit quickly, and
reduce power consumption.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
EHCI 1.1 addendum introduced several energy efficiency extensions for
EHCI USB host controllers:
1. LPM (link power management)
2. Per-port change
3. Shorter periodic frame list
4. Hardware prefetching
This patch is intended to define the HW bits and debug interface for
EHCI 1.1 addendum. The LPM and Per-port change patches will be sent out
after this patch.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
otg_io_write() function does not follow the declaration of
struct otg_io_access_ops.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1390) fixes a problem that crops up when a UHCI host
controller is unbound from uhci-hcd while there are still some active
URBs. The URBs have to be unlinked when the root hub is unregistered,
and uhci-hcd relies upon root-hub status polls as part of its
unlinking procedure. But usb_hcd_poll_rh_status() won't make those
status calls if hcd->rh_registered is clear, and the flag is cleared
_before_ the unregistration takes place.
Since hcd->rh_registered is used for other things and needs to be
cleared early, the solution is to add a new flag (rh_pollable) and use
it instead. It gets cleared _after_ the root hub is unregistered.
Now that the status polls don't end too soon, we have to make sure
they also don't occur too late -- after the root hub's usb_device
structure or the HCD's private structures are deallocated. Therefore
the patch adds usb_get_device() and usb_put_device() calls to protect
the root hub structure, and it adds an extra del_timer_sync() to
prevent the root-hub timer from causing an unexpected status poll.
This additional complexity would not be needed if the HCD framework
had provided separate stop() and release() callbacks instead of just
stop(). This lack could be fixed at some future time (although it
would require changes to every host controller driver); when that
happens this patch won't be needed any more.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
UART Features extract from STEricsson U6715 data-sheet (arm926 SoC for mobile phone):
* Fully compatible with industry standard 16C550 and 16C450 from various
manufacturers
* RX and TX 64 byte FIFO reduces CPU interrupts
* Full double buffering
* Modem control signals include CTS, RTS, (and DSR, DTR on UART1 only)
* Automatic baud rate selection
* Manual or automatic RTS/CTS smart hardware flow control
* Programmable serial characteristics:
– Baud rate generation (50 to 3.25M baud)
– 5, 6, 7 or 8-bit characters
– Even, odd or no-parity bit generation and detection
– 1, 1.5 or 2 stop bit generation
* Independent control of transmit, receive, line status, data set interrupts and FIFOs
* Full status-reporting capabilities
* Separate DMA signaling for RX and TX
* Timed interrupt to spread receive interrupt on known duration
* DMA time-out interrupt to allow detection of end of reception
* Carkit pulse coding and decoding compliant with USB carkit control interface [40]
In 16550A auto-configuration, if the fifo size is 64 then it's an U6 16550A port
Add set_termios hook & export serial8250_do_set_termios to change uart
clock following baudrate
Signed-off-by: Philippe Langlais <philippe.langlais@stericsson.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a PCI & UART driver, which suppors both PIO and DMA mode
UART operation. It has 3 identical UART ports and one internal
DMA controller.
Current FW will export 4 pci devices for hsu: 3 uart ports and 1
dma controller, each has one IRQ line. And we need to discuss the
device model, one PCI device covering whole HSU should be a better
model, but there is a problem of how to export the 4 IRQs info
Current driver set the highest baud rate to 2746800bps, which is
easy to scale down to 115200/230400.... To suport higher baud rate,
we need add special process, change DLAB/DLH/PS/DIV/MUL registers
all together.
921600 is the highest baud rate that has been tested with Bluetooth
modem connected to HSU port 0. Will test more when there is right
BT firmware.
Current version contains several work around for A0's Silicon bugs
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix several issues related to the RS485 interface:
- It adds the flag SER_RS485_RTS_BEFORE_SEND that was missing from the
serial_rs485 structure (even if "delay_rts_before_send" was existing)
- It adds a further "delay_rts_after_send" field for those drivers that
can have a delay after send (e.g., atmel_serial)
- It fixes the usage of the structure in the atmel_serial driver (where
"delay_rts_before_send" should be used instead of "delay_rts_after_send").
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Bernhard Roth <br@pwrnet.de>
Cc: Philippe De Muyter <phdm@macqel.be>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It seems that currently ASYNC_FLAGS is one bit short of covering all the
bits of the ASYNC user flags. In particular it does not cover the
ASYNC_AUTOPROBE bit.
ASYNCB_LAST_USER and ASYNCB_AUTOPROBE are both equal to 15.
Therefore:
ASYNC_AUTOPROBE = 1000 0000 0000 0000
ASYNC_FLAGS = 0111 1111 1111 1111
So ASYNC_FLAGS is not covering the ASYNC_AUTOPROBE bit.
This patch fixes the issue and with the patch the values will be:
ASYNC_AUTOPROBE = 1000 0000 0000 0000
ASYNC_FLAGS = 1111 1111 1111 1111
As a side note, doing a "git grep" I didn't find any use of
ASYNC_AUTOPROBE or ASYNCB_AUTOPROBE in the kernel, besides this include
file.
Signed-off-by: John Villalovos <john.l.villalovos@intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Using:
gcc (GCC) 4.5.0 20100610 (prerelease)
with CONFIG_CONSOLE_TRANSLATIONS=n, the following warnings are seen:
drivers/char/vt_ioctl.c: In function ‘vt_ioctl’:
drivers/char/vt_ioctl.c:1309:4: warning: statement with no effect
drivers/char/vt.c: In function ‘vc_allocate’:
drivers/char/vt.c:774:3: warning: statement with no effect
drivers/video/console/vgacon.c: In function ‘vgacon_init’:
drivers/video/console/vgacon.c:587:3: warning: statement with no effect
drivers/video/console/vgacon.c: In function ‘vgacon_deinit’:
drivers/video/console/vgacon.c:606:2: warning: statement with no effect
drivers/video/console/fbcon.c: In function ‘fbcon_init’:
drivers/video/console/fbcon.c:1087:3: warning: statement with no effect
drivers/video/console/fbcon.c:1089:3: warning: statement with no effect
drivers/video/console/fbcon.c: In function ‘fbcon_set_disp’:
drivers/video/console/fbcon.c:1369:3: warning: statement with no effect
drivers/video/console/fbcon.c:1371:3: warning: statement with no effect
This is because several functions in include/linux/vt_kern.h are
defined to (0). Convert them to static inline functions to
silence the warnings and gain a bit of type safety.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At the moment there is only one platform type supported and there is is
hard wired, but with these changes the infrastructure is now there for
anyone else to provide methods for their hardware.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The tty locking now follows the rules for mutexes, so
we can replace the BKL usage with a new subsystem
wide mutex.
Using a regular mutex here will change the behaviour
when blocked on the BTM from spinning to sleeping,
but that should not be visible to the user.
Using the mutex also means that all the BTM is now
covered by lockdep.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This changes all remaining users of tty_lock_nested
to be non-recursive, which lets us kill this function.
As a consequence, we won't need to keep the lock count
any more, which allows more simplifications later.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Calling wait_event_interruptible implicitly
releases the BKL when it sleeps, but we need
to do this explcitly when we have converted
it to a mutex.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As a preparation for replacing the big kernel lock
in the TTY layer, wrap all the callers in new
macros tty_lock, tty_lock_nested and tty_unlock.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This takes all the tty references through the expected interface points so
we can refcount them.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The vt layer isn't safely handling reference counts to tty object on the input
side. Add a tty port structure to the vt layer in order to implement this using
the standard helpers.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pass down the ldisc number so that the drivers don't have to peek into the
tty object themselves. This lets us get rid of another case of back referencing
port to tty which we don't want (because of races versus hangup/close).
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This lets us avoid problems with races on the flag changes
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jesse's initial patch commit said:
"At panic time (i.e. when oops_in_progress is set) we should try a bit
harder to update the screen and make sure output gets to the VT, since
some drivers are capable of flipping back to it.
So make sure we try to unblank and update the display if called from a
panic context."
I've enhanced this to add a flag to the vc that console layer can set to
indicate they want this behaviour to occur. This also adds support to
fbcon for that flag and adds an fb flag for drivers to indicate they want
to use the support. It enables this for KMS drivers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: James Simmons <jsimmons@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch is against the 2.6.34 source.
Paraphrased from the 1989 BSD patch by David Borman @ cray.com:
These are the changes needed for the kernel to support
LINEMODE in the server.
There is a new bit in the termios local flag word, EXTPROC.
When this bit is set, several aspects of the terminal driver
are disabled. Input line editing, character echo, and mapping
of signals are all disabled. This allows the telnetd to turn
off these functions when in linemode, but still keep track of
what state the user wants the terminal to be in.
New ioctl:
TIOCSIG Generate a signal to processes in the
current process group of the pty.
There is a new mode for packet driver, the TIOCPKT_IOCTL bit.
When packet mode is turned on in the pty, and the EXTPROC bit
is set, then whenever the state of the pty is changed, the
next read on the master side of the pty will have the TIOCPKT_IOCTL
bit set. This allows the process on the server side of the pty
to know when the state of the terminal has changed; it can then
issue the appropriate ioctl to retrieve the new state.
Since the original BSD patches accompanied the source code for telnet
I've left that reference here, but obviously the feature is useful for
any remote terminal protocol, including ssh.
The corresponding feature has existed in the BSD tty driver since 1989.
For historical reference, a good copy of the relevant files can be found
here:
http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741
Signed-off-by: Howard Chu <hyc@symas.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Remove Hayes ESP ioctls
The Hayes ESP driver has been removed from the tree:
commit f53a2ade0b
("tty: esp: remove broken driver")
so its ioctls aren't needed any more.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit
Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.
* git://git.infradead.org/mtd-2.6: (79 commits)
mtd: Remove obsolete <mtd/compatmac.h> include
mtd: Update copyright notices
jffs2: Update copyright notices
mtd-physmap: add support users can assign the probe type in board files
mtd: remove redwood map driver
mxc_nand: Add v3 (i.MX51) Support
mxc_nand: support 8bit ecc
mxc_nand: fix correct_data function
mxc_nand: add V1_V2 namespace to registers
mxc_nand: factor out a check_int function
mxc_nand: make some internally used functions overwriteable
mxc_nand: rework get_dev_status
mxc_nand: remove 0xe00 offset from registers
mtd: denali: Add multi connected NAND support
mtd: denali: Remove set_ecc_config function
mtd: denali: Remove unuseful code in get_xx_nand_para functions
mtd: denali: Remove device_info_tag structure
mtd: m25p80: add support for the Winbond W25Q32 SPI flash chip
mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips
mtd: m25p80: add support for the EON EN25P{32, 64} SPI flash chips
...
Fix up trivial conflicts in drivers/mtd/maps/{Kconfig,redwood.c} due to
redwood driver removal.
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
fanotify: use both marks when possible
fsnotify: pass both the vfsmount mark and inode mark
fsnotify: walk the inode and vfsmount lists simultaneously
fsnotify: rework ignored mark flushing
fsnotify: remove global fsnotify groups lists
fsnotify: remove group->mask
fsnotify: remove the global masks
fsnotify: cleanup should_send_event
fanotify: use the mark in handler functions
audit: use the mark in handler functions
dnotify: use the mark in handler functions
inotify: use the mark in handler functions
fsnotify: send fsnotify_mark to groups in event handling functions
fsnotify: Exchange list heads instead of moving elements
fsnotify: srcu to protect read side of inode and vfsmount locks
fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
fsnotify: use _rcu functions for mark list traversal
fsnotify: place marks on object in order of group memory address
vfs/fsnotify: fsnotify_close can delay the final work in fput
fsnotify: store struct file not struct path
...
Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
These form the basis of the basic WRITE etc primitives, so we
need them to be always visible. Otherwise we see errors like:
mm/filemap.c:2164: error: 'REQ_WRITE' undeclared
fs/read_write.c:362: error: 'REQ_WRITE' undeclared
fs/splice.c:1108: error: 'REQ_WRITE' undeclared
fs/aio.c:1496: error: 'REQ_WRITE' undeclared
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The previous value of 672 for L2CAP_DEFAULT_MAX_PDU_SIZE is based on
the default L2CAP MTU. That default MTU is calculated from the size
of two DH5 packets, minus ACL and L2CAP b-frame header overhead.
ERTM is used with newer basebands that typically support larger 3-DH5
packets, and i-frames and s-frames have more header overhead. With
clean RF conditions, basebands will typically attempt to use 1021-byte
3-DH5 packets for maximum throughput. Adjusting for 2 bytes of ACL
headers plus 10 bytes of worst-case L2CAP headers yields 1009 bytes
of payload.
This PDU size imposes less overhead for header bytes and gives the
baseband the option to choose 3-DH5 packets, but is small enough for
ERTM traffic to interleave well with other L2CAP or SCO data.
672-byte payloads do not allow the most efficient over-the-air
packet choice, and cannot achieve maximum throughput over BR/EDR.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The L2CAP specification requires that the ERTM retransmit timeout be at
least 2 seconds for BR/EDR connections.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add missing kernel-doc notation to struct sock:
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_pid'
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_cred'
Warning(include/net/sock.h:324): No description found for parameter 'sk_classid'
Warning(include/net/sock.h:324): Excess struct/union/enum/typedef member 'sk_peercred' description in 'sock'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix etherdevice.h parameter name typo in kernel-doc:
Warning(include/linux/etherdevice.h:138): No description found for parameter 'hwaddr'
Warning(include/linux/etherdevice.h:138): Excess function parameter 'addr' description in 'dev_hw_addr_random'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (59 commits)
igbvf.txt: Add igbvf Documentation
igb.txt: Add igb documentation
e100/e1000*/igb*/ixgb*: Add missing read memory barrier
ixgbe: fix build error with FCOE_CONFIG without DCB_CONFIG
netxen: protect tx timeout recovery by rtnl lock
isdn: gigaset: use after free
isdn: gigaset: add missing unlock
solos-pci: Fix race condition in tasklet RX handling
pkt_sched: Fix sch_sfq vs tcf_bind_filter oops
net: disable preemption before call smp_processor_id()
tcp: no md5sig option size check bug
iwlwifi: fix locking assertions
iwlwifi: fix TX tracer
isdn: fix information leak
net: Fix napi_gro_frags vs netpoll path
usbnet: remove noisy and hardly useful printk
rtl8180: avoid potential NULL deref in rtl8180_beacon_work
ath9k: Remove myself from the MAINTAINERS list
libertas: scan before assocation if no BSSID was given
libertas: fix association with some APs by using extended rates
...
Getting and putting arrays of pointers with flex arrays is a PITA. You
have to remember to pass &ptr to the _put and you have to do weird and
wacky casting to get the ptr back from the _get. Add two functions
flex_array_get_ptr() and flex_array_put_ptr() to handle all of the magic.
[akpm@linux-foundation.org: simplification suggested by Joe]
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: James Morris <jmorris@namei.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A profile of a network benchmark showed iommu_num_pages rather high up:
0.52% iommu_num_pages
Looking at the profile, an integer divide is taking almost all of the time:
%
: c000000000376ea4 <.iommu_num_pages>:
1.93 : c000000000376ea4: fb e1 ff f8 std r31,-8(r1)
0.00 : c000000000376ea8: f8 21 ff c1 stdu r1,-64(r1)
0.00 : c000000000376eac: 7c 3f 0b 78 mr r31,r1
3.86 : c000000000376eb0: 38 84 ff ff addi r4,r4,-1
0.00 : c000000000376eb4: 38 05 ff ff addi r0,r5,-1
0.00 : c000000000376eb8: 7c 84 2a 14 add r4,r4,r5
46.95 : c000000000376ebc: 7c 00 18 38 and r0,r0,r3
45.66 : c000000000376ec0: 7c 84 02 14 add r4,r4,r0
0.00 : c000000000376ec4: 7c 64 2b 92 divdu r3,r4,r5
0.00 : c000000000376ec8: 38 3f 00 40 addi r1,r31,64
0.00 : c000000000376ecc: eb e1 ff f8 ld r31,-8(r1)
1.61 : c000000000376ed0: 4e 80 00 20 blr
Since every caller of iommu_num_pages passes in a constant power of two
we can inline this such that the divide is replaced by a shift. The
entire function is only a few instructions once optimised, so it is
a good candidate for inlining overall.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no more uses of NIPQUAD or NIPQUAD_FMT. Remove the definitions.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should use the __same_type() helper in __must_be_array().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic/iomap.h provides these functions already, but the
non-generic fallback defines do not.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some SoC chips, HW resources may be in use during any particular idle
period. As a consequence, the cpuidle states that the SoC is safe to
enter can change from idle period to idle period. In addition, the
latency and threshold of each cpuidle state can vary, depending on the
operating condition when the CPU becomes idle, e.g. the current cpu
frequency, the current state of the HW blocks, etc.
cpuidle core and the menu governor, in the current form, are geared
towards cpuidle states that are static, i.e. the availabiltiy of the
states, their latencies, their thresholds are non-changing during run
time. cpuidle does not provide any hook that cpuidle drivers can use to
adjust those values on the fly for the current idle period before the menu
governor selects the target cpuidle state.
This patch extends cpuidle core and the menu governor to handle states
that are dynamic. There are three additions in the patch and the patch
maintains backwards-compatibility with existing cpuidle drivers.
1) add prepare() to struct cpuidle_device. A cpuidle driver can hook
into the callback and cpuidle will call prepare() before calling the
governor's select function. The callback gives the cpuidle driver a
chance to update the dynamic information of the cpuidle states for the
current idle period, e.g. state availability, latencies, thresholds,
power values, etc.
2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare()
function, a cpuidle driver can set/clear the flag to indicate to the
menu governor whether a cpuidle state should be ignored, i.e. not
available, during the current idle period.
3) add power_specified bit to struct cpuidle_device. The menu governor
currently assumes that the cpuidle states are arranged in the order of
increasing latency, threshold, and power savings. This is true or can
be made true for static states. Once the state parameters are dynamic,
the latencies, thresholds, and power savings for the cpuidle states can
increase or decrease by different amounts from idle period to idle
period. So the assumption of increasing latency, threshold, and power
savings from Cn to C(n+1) can no longer be guaranteed.
It can be straightforward to calculate the power consumption of each
available state and to specify it in power_usage for the idle period.
Using the power_usage fields, the menu governor then selects the state
that has the lowest power consumption and that still satisfies all other
critieria. The power_specified bit defaults to 0. For existing cpuidle
drivers, cpuidle detects that power_specified is 0 and fills in a dummy
set of power_usage values.
Signed-off-by: Ai Li <aili@codeaurora.org>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When taking a memory snapshot in hibernate_snapshot(), all (directly
called) memory allocations use GFP_ATOMIC. Hence swap misusage during
hibernation never occurs.
But from a pessimistic point of view, there is no guarantee that no page
allcation has __GFP_WAIT. It is better to have a global indication "we
enter hibernation, don't use swap!".
This patch tries to freeze new-swap-allocation during hibernation. (All
user processes are frozenm so swapin is not a concern).
This way, no updates will happen to swap_map[] between
hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
is called. We can be assured that swap corruption will not occur.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memcg also need to trace page isolation information as global reclaim.
This patch does it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman recently added some vmscan tracepoints. Unfortunately they are
covered only global reclaim. But we want to trace memcg reclaim too.
Thus, this patch convert them to DEFINE_TRACE macro. it help to reuse
tracepoint definition for other similar usage (i.e. memcg). This patch
have no functionally change.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memcg also need to trace reclaim progress as direct reclaim. This patch
add it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman recently added some vmscan tracepoints. Unfortunately they are
covered only global reclaim. But we want to trace memcg reclaim too.
Thus, this patch convert them to DEFINE_TRACE macro. it help to reuse
tracepoint definition for other similar usage (i.e. memcg). This patch
have no functionally change.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On swapin it is fairly common for a page to be owned exclusively by one
process. In that case we want to add the page to the anon_vma of that
process's VMA, instead of to the root anon_vma.
This will reduce the amount of rmap searching that the swapout code needs
to do.
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pid/oom_adj is now deprecated so that that it may eventually be
removed. The target date for removal is August 2012.
A warning will be printed to the kernel log if a task attempts to use this
interface. Future warning will be suppressed until the kernel is rebooted
to prevent spamming the kernel log.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions. The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.
Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead. This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits. This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.
The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory. "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit. The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.
The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.
Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs. In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.
Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it. It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability. Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered. The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.
/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa. Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning. Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity. This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since 2.6.28 zone->prev_priority is unused. Then it can be removed
safely. It reduce stack usage slightly.
Now I have to say that I'm sorry. 2 years ago, I thought prev_priority
can be integrate again, it's useful. but four (or more) times trying
haven't got good performance number. Thus I give up such approach.
The rest of this changelog is notes on prev_priority and why it existed in
the first place and why it might be not necessary any more. This information
is based heavily on discussions between Andrew Morton, Rik van Riel and
Kosaki Motohiro who is heavily quotes from.
Historically prev_priority was important because it determined when the VM
would start unmapping PTE pages. i.e. there are no balances of note within
the VM, Anon vs File and Mapped vs Unmapped. Without prev_priority, there
is a potential risk of unnecessarily increasing minor faults as a large
amount of read activity of use-once pages could push mapped pages to the
end of the LRU and get unmapped.
There is no proof this is still a problem but currently it is not considered
to be. Active files are not deactivated if the active file list is smaller
than the inactive list reducing the liklihood that file-mapped pages are
being pushed off the LRU and referenced executable pages are kept on the
active list to avoid them getting pushed out by read activity.
Even if it is a problem, prev_priority prev_priority wouldn't works
nowadays. First of all, current vmscan still a lot of UP centric code. it
expose some weakness on some dozens CPUs machine. I think we need more and
more improvement.
The problem is, current vmscan mix up per-system-pressure, per-zone-pressure
and per-task-pressure a bit. example, prev_priority try to boost priority to
other concurrent priority. but if the another task have mempolicy restriction,
it is unnecessary, but also makes wrong big latency and exceeding reclaim.
per-task based priority + prev_priority adjustment make the emulation of
per-system pressure. but it have two issue 1) too rough and brutal emulation
2) we need per-zone pressure, not per-system.
Another example, currently DEF_PRIORITY is 12. it mean the lru rotate about
2 cycle (1/4096 + 1/2048 + 1/1024 + .. + 1) before invoking OOM-Killer.
but if 10,0000 thrreads enter DEF_PRIORITY reclaim at the same time, the
system have higher memory pressure than priority==0 (1/4096*10,000 > 2).
prev_priority can't solve such multithreads workload issue. In other word,
prev_priority concept assume the sysmtem don't have lots threads."
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a trace event for when page reclaim queues a page for IO and records
whether it is synchronous or asynchronous. Excessive synchronous IO for a
process can result in noticeable stalls during direct reclaim. Excessive
IO from page reclaim may indicate that the system is seriously under
provisioned for the amount of dirty pages that exist.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an event for when pages are isolated en-masse from the LRU lists.
This event augments the information available on LRU traffic and can be
used to evaluate lumpy reclaim.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add two trace events for kswapd waking up and going asleep for the
purposes of tracking kswapd activity and two trace events for direct
reclaim beginning and ending. The information can be used to work out how
much time a process or the system is spending on the reclamation of pages
and in the case of direct reclaim, how many pages were reclaimed for that
process. High frequency triggering of these events could point to memory
pressure problems.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We try to avoid livelocks of writeback when some steadily creates dirty
pages in a mapping we are writing out. For memory-cleaning writeback,
using nr_to_write works reasonably well but we cannot really use it for
data integrity writeback. This patch tries to solve the problem.
The idea is simple: Tag all pages that should be written back with a
special tag (TOWRITE) in the radix tree. This can be done rather quickly
and thus livelocks should not happen in practice. Then we start doing the
hard work of locking pages and sending them to disk only for those pages
that have TOWRITE tag set.
Note: Adding new radix tree tag grows radix tree node from 288 to 296
bytes for 32-bit archs and from 552 to 560 bytes for 64-bit archs.
However, the number of slab/slub items per page remains the same (13 and 7
respectively).
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement function for setting one tag if another tag is set for each item
in given range.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new anon-vma code, was suboptimal and it lead to erratic invocation of
ksm_does_need_to_copy. That leads to host hangs or guest vnc lockup, or
weird behavior. It's unclear why ksm_does_need_to_copy is unstable but
the point is that when KSM is not in use, ksm_does_need_to_copy must never
run or we bounce pages for no good reason. I suspect the same hangs will
happen with KVM swaps. But this at least fixes the regression in the
new-anon-vma code and it only let KSM bugs triggers when KSM is in use.
The code in do_swap_page likely doesn't cope well with a not-swapcache,
especially the memcg code.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Izik Eidus <ieidus@yahoo.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementation of tmpfs is not scalable. We found that
stat_lock is contended by multiple threads when we need to get a new page,
leading to useless spinning inside this spin lock.
This patch makes use of the percpu_counter library to maintain local count
of used blocks to speed up getting and returning of pages. So the
acquisition of stat_lock is unnecessary for getting and returning blocks,
improving the performance of tmpfs on system with large number of cpus.
On a 4 socket 32 core NHM-EX system, we saw improvement of 270%.
The implementation below has a slight chance of race between threads
causing a slight overshoot of the maximum configured blocks. However, any
overshoot is small, and is bounded by the number of cpus. This happens
when the number of used blocks is slightly below the maximum configured
blocks when a thread checks the used block count, and another thread
allocates the last block before the current thread does. This should not
be a problem for tmpfs, as the overshoot is most likely to be a few blocks
and bounded. If a strict limit is really desired, then configured the max
blocks to be the limit less the number of cpus in system.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add percpu_counter_compare that allows for a quick but accurate comparison
of percpu_counter with a given value.
A rough count is provided by the count field in percpu_counter structure,
without accounting for the other values stored in individual cpu counters.
The actual count is a sum of count and the cpu counters. However, count
field is never different from the actual value by a factor of
batch*num_online_cpu. We do not need to get actual count for comparison
if count is different from the given value by this factor and allows for
quick comparison without summing up all the per cpu counters.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No real bugs, just some dead code and some fixups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid quite a lot of warnings in header files in a gcc 4.6 -Wall builds
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define stubs for the numa_*_id() generic percpu related functions for
non-NUMA configurations in <asm-generic/topology.h> where the other
non-numa stubs live.
Fixes ia64 !NUMA build breakage -- e.g., tiger_defconfig
Back out now unneeded '#ifndef CONFIG_NUMA' guards from ia64 smpboot.c
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have been used naming try_set_zone_oom and clear_zonelist_oom.
The role of functions is to lock of zonelist for preventing parallel
OOM. So clear_zonelist_oom makes sense but try_set_zone_oome is rather
awkward and unmatched with clear_zonelist_oom.
Let's change it with try_set_zonelist_oom.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The three oom killer sysctl variables (sysctl_oom_dump_tasks,
sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better
declared in include/linux/oom.h rather than kernel/sysctl.c.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are various points in the oom killer where the kernel must determine
whether to panic or not. It's better to extract this to a helper function
to remove all the confusion as to its semantics.
Also fix a call to dump_header() where tasklist_lock is not read- locked,
as required.
There's no functional change with this patch.
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oom killer presently kills current whenever there is no more memory
free or reclaimable on its mempolicy's nodes. There is no guarantee that
current is a memory-hogging task or that killing it will free any
substantial amount of memory, however.
In such situations, it is better to scan the tasklist for nodes that are
allowed to allocate on current's set of nodes and kill the task with the
highest badness() score. This ensures that the most memory-hogging task,
or the one configured by the user with /proc/pid/oom_adj, is always
selected in such scenarios.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment suggests that when b_count equals zero it is calling
__wait_no_buffer to trigger some debug, but as there is no debug in
__wait_on_buffer the whole thing is redundant.
AFAICT from the git log this has been the case for at least 5 years, so
it seems safe just to remove this.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM reference counts can cause an anon_vma to exist after the processe it
belongs to have already exited. Because the anon_vma lock now lives in
the root anon_vma, we need to ensure that the root anon_vma stays around
until after all the "child" anon_vmas have been freed.
The obvious way to do this is to have a "child" anon_vma take a reference
to the root in anon_vma_fork. When the anon_vma is freed at munmap or
process exit, we drop the refcount in anon_vma_unlink and possibly free
the root anon_vma.
The KSM anon_vma reference count function also needs to be modified to
deal with the possibility of freeing 2 levels of anon_vma. The easiest
way to do this is to break out the KSM magic and make it generic.
When compiling without CONFIG_KSM, this code is compiled out.
Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Always (and only) lock the root (oldest) anon_vma whenever we do something
in an anon_vma. The recently introduced anon_vma scalability is due to
the rmap code scanning only the VMAs that need to be scanned. Many common
operations still took the anon_vma lock on the root anon_vma, so always
taking that lock is not expected to introduce any scalability issues.
However, always taking the same lock does mean we only need to take one
lock, which means rmap_walk on pages from any anon_vma in the vma is
excluded from occurring during an munmap, expand_stack or other operation
that needs to exclude rmap_walk and similar functions.
Also add the proper locking to vma_adjust.
Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Track the root (oldest) anon_vma in each anon_vma tree. Because we only
take the lock on the root anon_vma, we cannot use the lock on higher-up
anon_vmas to lock anything. This makes it impossible to do an indirect
lookup of the root anon_vma, since the data structures could go away from
under us.
However, a direct pointer is safe because the root anon_vma is always the
last one that gets freed on munmap or exit, by virtue of the same_vma list
order and unlink_anon_vmas walking the list forward.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline
function doing exactly the same.
This makes it easier to do the substitution to the root anon_vma lock in a
following patch.
We will deal with the handful of special locks (nested, dec_and_lock, etc)
separately.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style
used in page_lock_anon_vma and will come in really handy further down in
this patch series.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].
kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself. This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).
Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong"). This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.
The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).
The previous version of this patch was compile tested on x86-64.
[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
share a common ancestor with both; there seems to be quite some
degree of copy-and-paste coding here. The include/asm/highmem.h file
for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
the first parameter of kunmap_atomic() instead of void *?
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The start/stop_critical_timing functions for preemptirqsoff, preemptoff
and irqsoff tracers contain atomic_inc() and atomic_dec() operations.
Atomic operations use local_irq_save/restore macros to ensure atomic
access but they are traced by the same function which is causing recursion
problem.
The reason is when these tracers are turn ON then the
local_irq_save/restore macros are changed in include/linux/irqflags.h to
call trace_hardirqs_on/off which call start/stop_critical_timing.
Microblaze was affected because it uses generic atomic implementation.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expand the crtc_gamma_set function to accept a starting offset. The
reason for this is to eventually use this function for setcolreg from
drm_fb_helper.c. The fbdev colormap function can start at any offset in
the color map.
Signed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(For some reason I thought that went in ages ago ...)
This fixes support for PCI domains in what should hopefully be a backward
compatible way along with a change to libdrm.
When the interface version is set to 1.4, we assume userspace understands
domains and the world is at peace. We thus pass proper domain numbers
instead of 0 to userspace.
The newer libdrm will then try 1.4 first, and fallback to 1.1, along with
ignoring domains in the later case (well, except on alpha of course)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount. That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock. deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.
What we need is a way to tell if superblock has been successfully
set up. Unfortunately, neither ->s_root nor the check for
MS_ACTIVE quite fit. Cheap and easy way, suitable for backport:
new flag set by the (only) caller of ->get_sb(). If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).
Longer term we want to set that flag in ->get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking ->s_root as we do now).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
The mbcache code was written to support a variable number of indexes,
but all the existing users use exactly one index. Simplify to code to
support only that case.
There are also no users of the cache entry free operation, and none of
the users keep extra data in cache entries. Remove those features as
well.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a flags field to help glibc implementing statvfs(3) efficiently.
We copy the flag values from glibc, and add a new ST_VALID flag to
denote that f_flags is implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:
- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a caller to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
doesn't won't be able to fill out the flags field later on.
In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
builds path relative to fs root, called under dcache_lock,
doesn't append any nonsense to unlinked ones.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Essentially, the minimal variant of ->evict_inode(). It's
a trimmed-down clear_inode(), sans any fs callbacks. Once
it returns we know that no async writeback will be happening;
every ->evict_inode() instance should do that once and do that
before doing anything ->write_inode() could interfere with
(e.g. freeing the on-disk inode).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hybrid of ->clear_inode() and ->delete_inode(); if present, does
all fs work to be done when in-core inode is about to be gone,
for whatever reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok. Also clean up and document inode_change_ok
to make this obvious.
As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error. This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere. Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.
Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.
Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.
In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:
spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Despite its name it's now a generic implementation of ->setattr, but
rather a helper to copy attributes from a struct iattr to the inode.
Rename it to setattr_copy to reflect this fact.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.
While we're at it also remove several unused arguments to block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated. Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to cont_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the only
remaining caller and rename the non-truncating version to nobh_write_begin.
Get rid of the superflous file argument to it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) count file openers correctly; i_count use was completely wrong
b) use new mutex for exclusion between final close/open/truncate,
to protect tailpacking logics. i_mutex use was wrong and resulted
in deadlocks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kmem_cache->cpu_slab is a percpu pointer but was missing __percpu
markup. Add it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Since this module is also used by drivers that are not yet converted, the old
and new code have to co-exist.
The source is split into three parts: a common part at the top, which is used
by both old and new code, then the old code followed by the new control
framework implementation. This new code is much more readable (and shorter!)
than the original code.
Once all bridge drivers that use this are converted the old code can be
deleted.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The cx25840 used a private control CX25840_CID_ENABLE_PVR150_WORKAROUND
to be told whether to enable a workaround for certain pvr150 cards.
This is really config data that it needs to get at load time.
Implemented this in cx25840 and ivtv.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a new framework to handle controls which makes life for driver
developers much easier.
Note that this patch moves some of the control support that used to be in
v4l2-common.c to v4l2-ctrls.c. The tables were copied unchanged. The body
of v4l2_ctrl_query_fill() was copied to a new v4l2_ctrl_fill() function
in v4l2-ctrls.c. This new function doesn't use the v4l2_queryctrl
struct anymore, which makes it more general.
The remainder of v4l2-ctrls.c is all new. Highlights include:
- No need to implement VIDIOC_QUERYCTRL, QUERYMENU, S_CTRL, G_CTRL,
S_EXT_CTRLS, G_EXT_CTRLS or TRY_EXT_CTRLS in either bridge drivers
or subdevs. New wrapper functions are provided that can just be plugged in.
Once everything has been converted these wrapper functions can be removed as well.
- When subdevices are added their controls can be automatically merged
with the bridge driver's controls.
- Most drivers just need to implement s_ctrl to set the controls.
The framework handles the locking and tries to be as 'atomic' as possible.
- Ready for the subdev device nodes: the same mechanism applies to subdevs
and their device nodes as well. Sub-device drivers can make controls
local, preventing them from being merged with bridge drivers.
- Takes care of backwards compatibility handling of VIDIOC_S_CTRL and
VIDIOC_G_CTRL. Handling of V4L2_CID_PRIVATE_BASE is fully transparent.
CTRL_CLASS controls are automatically added.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This driver exports a video device node per each camera interface/
video postprocessor (FIMC) device contained in Samsung S5P SoC series.
The driver is based on v4l2-mem2mem framework.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Pawel Osciak <p.osciak@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
LIRC: add new IOCTL that enables learning mode (wide band receiver)
Still missing features: carrier report & timeout reports.
Will need to pack these into ir_raw_event
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Some ir input devices have small buffer, and interrupt the host
each time it is full (or half full)
Add a helper that automaticly handles timeouts, and also
automaticly merges samples of same time (space-space)
Such samples might be placed by hardware because size of
sample in the buffer is small (a byte for example).
Also remove constness from ir_dev_props, because it now contains timeout
settings that driver might want to change
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, ir device registration fails if keymap requested by driver is not found.
Fix that by always compiling in the empty keymap, and using it as a failback.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This ports lirc_streamzap.c over to ir-core in-place, to be followed by
a patch moving the driver over to drivers/media/IR/streamzap.c and
enabling the proper Kconfig bits.
Presently, the in-kernel keymap doesn't work, as the stock Streamzap
remote uses an RC-5-like, but not-quite-RC-5 protocol, which the
in-kernel RC-5 decoder doesn't cope with. The remote can be used right
now with the lirc bridge driver though, and other remotes (at least an
RC-6(A) MCE remote) work perfectly with the driver.
I'll take a look at making the existing RC-5 decoder cope with this odd
duck, possibly implement another standalone decoder engine, or just
throw up my hands and say "meh, use lirc"... But the driver itself
should be perfectly sound.
Remaining items on the streamzap TODO list:
- add LIRC_SET_REC_TIMEOUT-alike support
- add LIRC_GET_M{AX,IN}_TIMEOUT-alike support
- add LIRC_GET_REC_RESOLUTION-alike support
All of the above should be trivial to add. There are patches pending to
add this support to ir-core from Maxim Levitsky, and I'll take care of
these once his patches get integrated. None of them are currently
essential though.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The CX2584x and related cores are multifunction subdevices with a number
of internal blocks that act as interrupt sources. Move the v4L2_subdev
interrupt_service_routine callback from v4l_subdev_ir_ops to
v4l2_subdev_core_ops, as the video and audio blocks of a CX2584x and
related cores can generate interrupts along with the IR block. This
change also makes sense for other subdev's that generate interrupts and
do not have an IR block.
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There is a distinction on IR Tx for the CX2388[578] chips of carrier
sense inversion (space is a carrier burst and mark is no burst) and I/O
pin level inversion (0 is high output level, 1 is low output level).
Allow the caller to set these parameters distinctly as v4l2_subdevice
IR parameters. This permits the IR device to be configured and enabled
without the IR Tx LED being on during idle/space time due to an external
hardware level inversion
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add s_io_pin_config core subdev op for the CX2388[578] AV cores.
This is complete for IR_RX, IR_TX, GPIOs 16,19-23, and IRQ_N.
It likely needs work for the I2S signal direction.
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add a method to v4l2_sudev_core_ops to allow bridge drivers to
manage what signal pads/functions are routed out to multiplexed IO pins on a
pin by pin basis. The interface also allows specifying initial output settings
for pins and disabling an IO pin altogether.
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Handling of autofs ioctl numbers does not need to be generic
and can easily be done directly in autofs itself.
This also pushes the BKL into autofs and autofs4 ioctl
methods.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>
Cc: Autofs <autofs@linux.kernel.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the VENDOR/DEVICE ids to pci_ids.h.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: check kmalloc() result
arch/tile: catch up on various minor cleanups.
arch/tile: avoid erroneous error return for PTRACE_POKEUSR.
tile: set ARCH_KMALLOC_MINALIGN
tile: remove homegrown L1_CACHE_ALIGN macro
arch/tile: Miscellaneous cleanup changes.
arch/tile: Split the icache flush code off to a generic <arch> header.
arch/tile: Fix bug in support for atomic64_xx() ops.
arch/tile: Shrink the tile-opcode files considerably.
arch/tile: Add driver to enable access to the user dynamic network.
arch/tile: Enable more sophisticated IRQ model for 32-bit chips.
Move list types from <linux/list.h> to <linux/types.h>.
Add wait4() back to the set of <asm-generic/unistd.h> syscalls.
Revert adding some arch-specific signal syscalls to <linux/syscalls.h>.
arch/tile: Do not use GFP_KERNEL for dma_alloc_coherent(). Feedback from fujita.tomonori@lab.ntt.co.jp.
arch/tile: core support for Tilera 32-bit chips.
Fix up the "generic" unistd.h ABI to be more useful.
There are three reasons to add this support:
1. users probably know the interface type of their flashs, then probe
can be faster if they give the right type in platform data since wrong
types will not be detected.
2. sometimes, detecting can cause destory to system. For example, for
kernel XIP, detecting can cause NOR enter a mode instructions can not
be fetched right, which will make kernel crash.
3. For a new probe which is not listed in the rom_probe_types, if users
assign it in board files, physmap can still probe it.
Signed-off-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (82 commits)
firewire: core: add forgotten dummy driver methods, remove unused ones
firewire: add isochronous multichannel reception
firewire: core: small clarifications in core-cdev
firewire: core: remove unused code
firewire: ohci: release channel in error path
firewire: ohci: use memory barriers to order descriptor updates
tools/firewire: nosy-dump: increment program version
tools/firewire: nosy-dump: remove unused code
tools/firewire: nosy-dump: use linux/firewire-constants.h
tools/firewire: nosy-dump: break up a deeply nested function
tools/firewire: nosy-dump: make some symbols static or const
tools/firewire: nosy-dump: change to kernel coding style
tools/firewire: nosy-dump: work around segfault in decode_fcp
tools/firewire: nosy-dump: fix it on x86-64
tools/firewire: add userspace front-end of nosy
firewire: nosy: note ioctls in ioctl-number.txt
firewire: nosy: use generic printk macros
firewire: nosy: endianess fixes and annotations
firewire: nosy: annotate __user pointers and __iomem pointers
firewire: nosy: fix device shutdown with active client
...
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
ACPI / ACPICA: Simplify acpi_ev_initialize_gpe_block()
ACPI / ACPICA: Fail acpi_gpe_wakeup() if ACPI_GPE_CAN_WAKE is unset
ACPI / ACPICA: Do not execute _PRW methods during initialization
ACPI: Fix bogus GPE test in acpi_bus_set_run_wake_flags()
ACPICA: Update version to 20100702
ACPICA: Fix for Alias references within Package objects
ACPICA: Fix lint warning for 64-bit constant
ACPICA: Remove obsolete GPE function
ACPICA: Update debug output components
ACPICA: Add support for WDDT - Watchdog Descriptor Table
ACPICA: Drop acpi_set_gpe
ACPICA: Use low-level GPE enable during GPE block initialization
ACPI / EC: Do not use acpi_set_gpe
ACPI / EC: Drop suspend and resume routines
ACPICA: Remove wakeup GPE reference counting which is not used
ACPICA: Introduce acpi_gpe_wakeup()
ACPICA: Rename acpi_hw_gpe_register_bit
ACPICA: Update version to 20100528
ACPICA: Add signatures for undefined tables: ATKG, GSCI, IEIT
ACPICA: Optimization: Reduce the number of namespace walks
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
IB/qib: Add missing <linux/slab.h> include
IB/ehca: Drop unnecessary NULL test
RDMA/nes: Fix confusing if statement indentation
IB/ehca: Init irq tasklet before irq can happen
RDMA/nes: Fix misindented code
RDMA/nes: Fix showing wqm_quanta
RDMA/nes: Get rid of "set but not used" variables
RDMA/nes: Read firmware version from correct place
IB/srp: Export req_lim via sysfs
IB/srp: Make receive buffer handling more robust
IB/srp: Use print_hex_dump()
IB: Rename RAW_ETY to RAW_ETHERTYPE
RDMA/nes: Fix two sparse warnings
RDMA/cxgb3: Make needlessly global iwch_l2t_send() static
IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
RDMA/cxgb4: Add timeouts when waiting for FW responses
IB/qib: Fix race between qib_error_qp() and receive packet processing
IB/qib: Limit the number of packets processed per interrupt
IB/qib: Allow writes to the diag_counters to be able to clear them
IB/qib: Set cfgctxts to number of CPUs by default
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (214 commits)
ALSA: hda - Add pin-fix for HP dc5750
ALSA: als4000: Fix potentially invalid DMA mode setup
ALSA: als4000: enable burst mode
ALSA: hda - Fix initial capsrc selection in patch_alc269()
ASoC: TWL4030: Capture route runtime DAPM ordering fix
ALSA: hda - Add PC-beep whitelist for an Intel board
ALSA: hda - More relax for pending period handling
ALSA: hda - Define AC_FMT_* constants
ALSA: hda - Fix beep frequency on IDT 92HD73xx and 92HD71Bxx codecs
ALSA: hda - Add support for HDMI HBR passthrough
ALSA: hda - Set Stream Type in Stream Format according to AES0
ALSA: hda - Fix Thinkpad X300 so SPDIF is not exposed
ALSA: hda - FIX to not expose SPDIF on Thinkpad X301, since it does not have the ability to use SPDIF
ASoC: wm9081: fix resource reclaim in wm9081_register error path
ASoC: wm8978: fix a memory leak if a wm8978_register fail
ASoC: wm8974: fix a memory leak if another WM8974 is registered
ASoC: wm8961: fix resource reclaim in wm8961_register error path
ASoC: wm8955: fix resource reclaim in wm8955_register error path
ASoC: wm8940: fix a memory leak if wm8940_register return error
ASoC: wm8904: fix resource reclaim in wm8904_register error path
...
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd4: fix file open accounting for RDWR opens
nfsd: don't allow setting maxblksize after svc created
nfsd: initialize nfsd versions before creating svc
net: sunrpc: removed duplicated #include
nfsd41: Fix a crash when a callback is retried
nfsd: fix startup/shutdown order bug
nfsd: minor nfsd read api cleanup
gcc-4.6: nfsd: fix initialized but not read warnings
nfsd4: share file descriptors between stateid's
nfsd4: fix openmode checking on IO using lock stateid
nfsd4: miscellaneous process_open2 cleanup
nfsd4: don't pretend to support write delegations
nfsd: bypass readahead cache when have struct file
nfsd: minor nfsd_svc() cleanup
nfsd: move more into nfsd_startup()
nfsd: just keep single lockd reference for nfsd
nfsd: clean up nfsd_create_serv error handling
nfsd: fix error handling in __write_ports_addxprt
nfsd: fix error handling when starting nfsd with rpcbind down
nfsd4: fix v4 state shutdown error paths
...
* 'nfs-for-2.6.36' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (42 commits)
NFS: NFSv4.1 is no longer a "developer only" feature
NFS: NFS_V4 is no longer an EXPERIMENTAL feature
NFS: Fix /proc/mount for legacy binary interface
NFS: Fix the locking in nfs4_callback_getattr
SUNRPC: Defer deleting the security context until gss_do_free_ctx()
SUNRPC: prevent task_cleanup running on freed xprt
SUNRPC: Reduce asynchronous RPC task stack usage
SUNRPC: Move the bound cred to struct rpc_rqst
SUNRPC: Clean up of rpc_bindcred()
SUNRPC: Move remaining RPC client related task initialisation into clnt.c
SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task
SUNRPC: Make the credential cache hashtable size configurable
SUNRPC: Store the hashtable size in struct rpc_cred_cache
NFS: Ensure the AUTH_UNIX credcache is allocated dynamically
NFS: Fix the NFS users of rpc_restart_call()
SUNRPC: The function rpc_restart_call() should return success/failure
NFSv4: Get rid of the bogus RPC_ASSASSINATED(task) checks
NFSv4: Clean up the process of renewing the NFSv4 lease
NFSv4.1: Handle NFS4ERR_DELAY on SEQUENCE correctly
NFS: nfs_rename() should not have to flush out writebacks
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits)
nilfs2: reject filesystem with unsupported block size
nilfs2: avoid rec_len overflow with 64KB block size
nilfs2: simplify nilfs_get_page function
nilfs2: reject incompatible filesystem
nilfs2: add feature set fields to super block
nilfs2: clarify byte offset in super block format
nilfs2: apply read-ahead for nilfs_btree_lookup_contig
nilfs2: introduce check flag to btree node buffer
nilfs2: add btree get block function with readahead option
nilfs2: add read ahead mode to nilfs_btnode_submit_block
nilfs2: fix buffer head leak in nilfs_btnode_submit_block
nilfs2: eliminate inline keywords in btree implementation
nilfs2: get maximum number of child nodes from bmap object
nilfs2: reduce repetitive calculation of max number of child nodes
nilfs2: optimize calculation of min/max number of btree node children
nilfs2: remove redundant pointer checks in bmap lookup functions
nilfs2: get rid of nilfs_bmap_union
nilfs2: unify bmap set_target_v operations
nilfs2: get rid of nilfs_btree uses
nilfs2: get rid of nilfs_direct uses
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits)
ext4: Adding error check after calling ext4_mb_regular_allocator()
ext4: Fix dirtying of journalled buffers in data=journal mode
ext4: re-inline ext4_rec_len_(to|from)_disk functions
jbd2: Remove t_handle_lock from start_this_handle()
jbd2: Change j_state_lock to be a rwlock_t
jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
ext4: Add mount options in superblock
ext4: force block allocation on quota_off
ext4: fix freeze deadlock under IO
ext4: drop inode from orphan list if ext4_delete_inode() fails
ext4: check to make make sure bd_dev is set before dereferencing it
jbd2: Make barrier messages less scary
ext4: don't print scary messages for allocation failures post-abort
ext4: fix EFBIG edge case when writing to large non-extent file
ext4: fix ext4_get_blocks references
ext4: Always journal quota file modifications
ext4: Fix potential memory leak in ext4_fill_super
ext4: Don't error out the fs if the user tries to make a file too big
ext4: allocate stripe-multiple IOs on stripe boundaries
ext4: move aio completion after unwritten extent conversion
...
Fix up conflicts in fs/ext4/inode.c as per Ted.
Fix up xfs conflicts as per earlier xfs merge.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Fix dirtying of journalled buffers in data=journal mode
ext3: default to ordered mode
quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
quota: Change quota error message to print out disk and function name
MAINTAINERS: Update entries of ext2 and ext3
MAINTAINERS: Update address of Andreas Dilger
ext3: Avoid filesystem corruption after a crash under heavy delete load
ext3: remove vestiges of nobh support
ext3: Fix set but unused variables
quota: clean up quota active checks
quota: Clean up the namespace in dqblk_xfs.h
quota: check quota reservation on remove_dquot_ref
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[DNS RESOLVER] Minor typo correction
DNS: Fixes for the DNS query module
cifs: Include linux/err.h for IS_ERR and PTR_ERR
DNS: Make AFS go to the DNS for AFSDB records for unknown cells
DNS: Separate out CIFS DNS Resolver code
cifs: account for new creduid=0x%x parameter in spnego upcall string
cifs: reduce false positives with inode aliasing serverino autodisable
CIFS: Make cifs_convert_address() take a const src pointer and a length
cifs: show features compiled in as part of DebugData
cifs: update README
Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
Stephen reports:
After merging the block tree, today's linux-next build (x86_64
allmodconfig) failed like this:
usr/include/linux/fs.h:11: included file 'linux/blk_types.h' is not exported
Caused by commit 9d3dbbcd9a84518ff5e32ffe671d06a48cf84fd9 ("bio, fs:
separate out bio_types.h and define READ/WRITE constants in terms of
BIO_RW_* flags").
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It was a now abandoned attempt to throttle resync bandwidth
based on the delay it causes on the bulk data socket.
It has no userbase yet, and has been disabled by
9173465ccb51c09cc3102a10af93e9f469a0af6f already.
This removes the now unused code.
The basic feature, namely using up "idle" bandwith
of network and disk IO subsystem, with minimal impact
to application IO, is being reimplemented differently.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Add 2 new trace points to the periodic write-back wake up case, just like we do
in the 'bdi_queue_work()' function. Namely, introduce:
1. trace_writeback_wake_thread(bdi)
2. trace_writeback_wake_forker_thread(bdi)
The first event is triggered every time we wake up a bdi thread to start
periodic background write-out. The second event is triggered only when the bdi
thread does not exist and should be created by the forker thread.
This patch was suggested by Dave Chinner and Christoph Hellwig.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.
This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.
This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.
We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.
Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
this change as well.
This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch simplifies bdi code a little by removing the 'pending_list' which is
redundant. Indeed, currently the forker thread ('bdi_forker_thread()') is
working like this:
1. In a loop, fetch all bdi's which have works but have no writeback thread and
move them to the 'pending_list'.
2. If the list is empty, sleep for 5 sec.
3. Otherwise, take one bdi from the list, fork the writeback thread for this
bdi, and repeat the loop.
IOW, it first moves everything to the 'pending_list', then process only one
element, and so on. This patch simplifies the algorithm, which is now as
follows.
1. Find the first bdi which has a work and remove it from the global list of
bdi's (bdi_list).
2. If there was not such bdi, sleep 5 sec.
3. Fork the writeback thread for this bdi and repeat the loop.
IOW, now we find the first bdi to process, process it, and so on. This is
simpler and involves less lists.
The bonus now is that we can get rid of a couple of functions, as well as
remove complications which involve 'rcu_call()' and 'bdi->rcu_head'.
This patch also makes sure we use 'list_add_tail_rcu()', instead of plain
'list_add_tail()', but this piece of code is going to be removed in the next
patch anyway.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The write-back code mixes words "thread" and "task" for the same things. This
is not a big deal, but still an inconsistency.
hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else. This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer
This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()' -> 'bdi_forker_thread()'
because bdi threads are 'bdi_writeback_thread()', so these names are more
consistent.
This patch also amends commentaries and makes them refer the forker and bdi
threads as "thread", not "task".
Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_*
flags. This is fragile and caused breakage during BIO_RW_* flag
rearrangement. The hardcoding is to avoid include dependency hell.
Create linux/bio_types.h which contatins definitions for bio data
structures and flags and include it from bio.h and fs.h, and make fs.h
define all READ/WRITE related constants in terms of BIO_RW_* flags.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Commit a82afdf (block: use the same failfast bits for bio and request)
moved BIO_RW_* bits around such that they match up with REQ_* bits.
Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
bits. READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
instead of bit 1, breaking RWA_MASK, READA and SWRITE.
This patch updates RWA_MASK, READA and SWRITE such that they match the
BIO_RW_* bits again. A follow up patch will update the definitions to
directly use BIO_RW_* bits so that this kind of breakage won't happen
again.
Neil also spotted missing RWA_MASK conversion.
Stable: The offending commit a82afdf was released with v2.6.32, so
this patch should be applied to all kernels since then but it must
_NOT_ be applied to kernels earlier than that.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-bisected-by: Vladislav Bolkhovitin <vst@vlnb.net>
Root-caused-by: Neil Brown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Filesystems can call sb_issue_discard on a memory reclaim path
(e.g. ext4 calls sb_issue_discard during journal commit).
Use GFP_NOFS in sb_issue_discard to avoid recursing back into the FS.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
include/trace/events/writeback.h uses dev_name(), so it needs to
include linux/device.h.
include/trace/events/writeback.h:12: error: implicit declaration of function 'dev_name'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
block/compat_ioctl.c: In function 'compat_blkdev_ioctl':
block/compat_ioctl.c:754: error: 'BLKTRACESETUP32' undeclared (first use in this function)
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.
It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Add a trace event to the ->writepage loop in write_cache_pages to give
visibility into how the ->writepage call is changing variables within the
writeback control structure. Of most interest is how wbc->nr_to_write changes
from call to call, especially with filesystems that write multiple pages
in ->writepage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tracing high level background writeback events is good, but it doesn't
give the entire picture. Add visibility into write throttling to catch IO
dispatched by foreground throttling of processing dirtying lots of pages.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:
sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.
The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
SCSI-ml needs a way to mark a request as flush request in
q->prepare_flush_fn because it needs to identify them later (e.g. in
q->request_fn or prep_rq_fn).
queue_flush sets REQ_HARDBARRIER in rq->cmd_flags however the block
layer also sends normal REQ_TYPE_FS requests with REQ_HARDBARRIER. So
SCSI-ml can't use REQ_HARDBARRIER to identify flush requests.
We could change the block layer to clear REQ_HARDBARRIER bit before
sending non flush requests to the lower layers. However, intorudcing
the new flag looks cleaner (surely easier).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
No real bugs I believe, just some dead code, and some
shut up code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Allocating a fixed payload for discard requests always was a horrible hack,
and it's not coming to byte us when adding support for discard in DM/MD.
So change the code to leave the allocation of a payload to the lowlevel
driver. Unfortunately that means we'll need another hack, which allows
us to update the various block layer length fields indicating that we
have a payload. Instead of hiding this in sd.c, which we already partially
do for UNMAP support add a documented helper in the core block layer for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Move all code for the writeback thread into fs/fs-writeback.c instead of
splitting it over two functions in two files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The wb_list member of struct backing_device_info always has exactly one
element. Just use the direct bdi->wb pointer instead and simplify some
code.
Also remove bdi_task_init which is now trivial to prepare for the next
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
block uses ISA_DMA_THRESHOLD for BLK_BOUNCE_ISA. Only SCSI uses
ISA_DMA_THRESHOLD for ancient drivers with non-zero
unchecked_isa_dma. Nowadays drivers (and subsystems) use dma_mask
properly instead of ISA_DMA_THRESHOLD.
Documentation/scsi/scsi_mid_low_api.txt says:
unchecked_isa_dma - 1=>only use bottom 16 MB of ram (ISA DMA addressing
restriction), 0=>can use full 32 bit (or better) DMA
address space
So block simply uses DMA_BIT_MASK(24) for BLK_BOUNCE_ISA for SCSI.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
A barrier request should by defintion have priority in get_request
and let the queue be unplugged immediately as it's blocking all forward
progress due to the queue draining.
Most filesystems already get this implicitly by the way how submit_bh
treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs
and XFS are still forgetting to set the flag, as is blkdev_issue_flush
and some places in DM/MD.
For XFS on metadata heavy workloads this gives a consistent speedup
in the 2-3% range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
There are two reasons for doing this:
- On SSD disks, the completion times aren't as random as they
are for rotational drives. So it's questionable whether they
should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
This adds /sys/block/<dev>/queue/add_random that will allow you to
switch off on a per-device basis. The default setting is on, so there
should be no functional changes from this patch.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
UP accessors didn't take care of __percpu notations leading to a lot
of spurious sparse warnings on UP configurations. Fix it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A regression of building without CONFIG_HW_CONSOLE was introduced with
commit b45cfba4e9 (vt,console,kdb:
implement atomic console enter/leave functions).
ERROR: "con_debug_enter" [drivers/serial/kgdboc.ko] undefined!
ERROR: "vc_cons" [drivers/serial/kgdboc.ko] undefined!
ERROR: "fg_console" [drivers/serial/kgdboc.ko] undefined!
ERROR: "con_debug_leave" [drivers/serial/kgdboc.ko] undefined!
When there is no HW console the con_debug_enter and con_debug_leave
functions should have no code.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: Do not suspend IPI IRQs.
powerpc: Use IRQF_NO_SUSPEND not IRQF_TIMER for non-timer interrupts
ixp4xx-beeper: Use IRQF_NO_SUSPEND not IRQF_TIMER for non-timer interrupt
irq: Add new IRQ flag IRQF_NO_SUSPEND
* 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um: Fix read_persistent_clock fallout
kgdb: Do not access xtime directly
powerpc: Clean up obsolete code relating to decrementer and timebase
powerpc: Rework VDSO gettimeofday to prevent time going backwards
clocksource: Add __clocksource_updatefreq_hz/khz methods
x86: Convert common clocksources to use clocksource_register_hz/khz
timekeeping: Make xtime and wall_to_monotonic static
hrtimer: Cleanup direct access to wall_to_monotonic
um: Convert to use read_persistent_clock
timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
powerpc: Cleanup xtime usage
powerpc: Simplify update_vsyscall
time: Kill off CONFIG_GENERIC_TIME
time: Implement timespec_add
x86: Fix vtime/file timestamp inconsistencies
Trivial conflicts in Documentation/feature-removal-schedule.txt
Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
per Thomas' earlier merge commit 47916be4e2 ("Merge branch
'powerpc.cherry-picks' into timers/clocksource")