Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.
Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.
The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.
Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
early_init_irq_lock_class() is called way before anything touches the
irq descriptors. In case of SPARSE_IRQ=y this is a NOP operation
because the radix tree is empty at this point. For the SPARSE_IRQ=n
case it's sufficient to set the lock class in early_init_irq().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
The generic irq Kconfig options are copied around all archs. Provide a
generic Kconfig file which can be included.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100927121843.217333624@linutronix.de>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
The size of a built-in initramfs is calculated in init/initramfs.c by
"__initramfs_end - __initramfs_start". Those symbols are defined in the
linker script include/asm-generic/vmlinux.lds.h:
#define INIT_RAM_FS \
. = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__initramfs_start) = .; \
*(.init.ramfs) \
VMLINUX_SYMBOL(__initramfs_end) = .;
If the initramfs file has an odd number of bytes, the "__initramfs_end"
symbol points to an odd address, for example, the symbols in the
System.map might look like:
0000000000572000 T __initramfs_start
00000000005bcd05 T __initramfs_end <-- odd address
At least on s390 this causes a problem:
Certain s390 instructions, especially instructions for loading addresses
(larl) or branch addresses must be on even addresses. The compiler loads
the symbol addresses with the "larl" instruction. This instruction sets
the last bit to 0 and, therefore, for odd size files, the calculated size
is one byte less than it should be:
0000000000540a9c <populate_rootfs>:
540a9c: eb cf f0 78 00 24 stmg %r12,%r15,120(%r15),
540aa2: c0 10 00 01 8a af larl %r1,572000 <__initramfs_start>
540aa8: c0 c0 00 03 e1 2e larl %r12,5bcd04 <initramfs_end>
(Instead of 5bcd05)
...
540abe: 1b c1 sr %r12,%r1
To fix the problem, this patch introduces the global variable
__initramfs_size, which is calculated in the "usr/initramfs_data.S" file.
The populate_rootfs() function can then use the start marker of the
.init.ramfs section and the value of __initramfs_size for loading the
initramfs. Because the start marker and size is sufficient, the
__initramfs_end symbol is no longer needed and is removed.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Replace duplicate code in NFSROOT for mounting an NFS server on '/'
with logic that uses the existing mainline text-based logic in the NFS
client.
Add documenting comments where appropriate.
Note that this means NFSROOT mounts now use the same default settings
as v2/v3 mounts done via mount(2) from user space.
vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default>
As before, however, no version/protocol negotiation with the server is
done.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When CONFIG_BLOCK is not enabled:
init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part'
init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast
init/do_mounts.c:73: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:76: error: dereferencing pointer to incomplete type
init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid'
init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function)
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
o Actual implementation of throttling policy in block layer. Currently it
implements READ and WRITE bytes per second throttling logic. IOPS throttling
comes in later patches.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It is also called outside the scope of init functions. Stephen
reports:
WARNING: init/mounts.o(.text+0x21a): Section mismatch in reference from the function name_to_dev_t() to the function .init.text:match_dev_by_uuid()
The function name_to_dev_t() references
the function __init match_dev_by_uuid().
This is often because name_to_dev_t lacks a __init
annotation or the annotation of match_dev_by_uuid is wrong.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This is the third patch in a series which adds support for
storing partition metadata, optionally, off of the hd_struct.
One major use for that data is being able to resolve partition
by other identities than just the index on a block device. Device
enumeration varies by platform and there's a benefit to being able
to use something like EFI GPT's GUIDs to determine the correct
block device and partition to mount as the root.
This change adds that support to root= by adding support for
the following syntax:
root=PARTUUID=hex-uuid
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.
The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.
This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.
Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).
As can be seen below, CONFIG_TINY_PREEMPT_RCU represents almost 5Kbyte
savings compared to CONFIG_TREE_PREEMPT_RCU. Of course, for non-real-time
workloads, CONFIG_TINY_RCU is even better.
CONFIG_TREE_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
6170 825 28 7023 kernel/rcutree.o
----
7026 Total
CONFIG_TINY_PREEMPT_RCU
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
2081 81 8 2170 kernel/rcutiny.o
----
2183 Total
CONFIG_TINY_RCU (non-preemptible)
text data bss dec filename
13 0 0 13 kernel/rcupdate.o
719 25 0 744 kernel/rcutiny.o
---
757 Total
Requested-by: Loïc Minier <loic.minier@canonical.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit cf244dc01b added a fourth level to the TREE_RCU hierarchy,
but the RCU_FANOUT help message still said "cube root". This commit
fixes this to "fourth root" and also emphasizes that production
systems are well-served by the default. (Stress-testing RCU itself
uses small RCU_FANOUT values in order to test large-system code paths
on small(er) systems.)
Located-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for
several releases, it is time to restrict the use of TREE_RCU to SMP
non-preemptible systems. This reduces testing/validation effort. This
commit is a first step towards driving the selection of RCU implementation
directly off of the SMP and PREEMPT configuration parameters.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:
arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().
do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.
Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
param: don't deref arg in __same_type() checks
param: update drivers/acpi/debug.c to new scheme
param: use module_param in drivers/message/fusion/mptbase.c
ide: use module_param_named rather than module_param_call
param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
param: lock myri10ge_fw_name against sysfs changes.
param: simple locking for sysfs-writable charp parameters
param: remove unnecessary writable charp
param: add kerneldoc to moduleparam.h
param: locking for kernel parameters
param: make param sections const.
param: use free hook for charp (fix leak of charp parameters)
param: add a free hook to kernel_param_ops.
param: silence .init.text references from param ops
Add param ops struct for hvc_iucv driver.
nfs: update for module_param_named API change
AppArmor: update for module_param_named API change
param: use ops in struct kernel_param, rather than get and set fns directly
param: move the EXPORT_SYMBOL to after the definitions.
...
It's 11 months since we changed swap_map[] to indicates SWAP_HAS_CACHE.
Since that, memcg's swap accounting has been very stable and it seems
it can be maintained.
So, I'd like to remove EXPERIMENTAL from the config.
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since this section can be read-only (they're in .rodata), they should
always have been const. Minor flow-through various functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
fanotify: use both marks when possible
fsnotify: pass both the vfsmount mark and inode mark
fsnotify: walk the inode and vfsmount lists simultaneously
fsnotify: rework ignored mark flushing
fsnotify: remove global fsnotify groups lists
fsnotify: remove group->mask
fsnotify: remove the global masks
fsnotify: cleanup should_send_event
fanotify: use the mark in handler functions
audit: use the mark in handler functions
dnotify: use the mark in handler functions
inotify: use the mark in handler functions
fsnotify: send fsnotify_mark to groups in event handling functions
fsnotify: Exchange list heads instead of moving elements
fsnotify: srcu to protect read side of inode and vfsmount locks
fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
fsnotify: use _rcu functions for mark list traversal
fsnotify: place marks on object in order of group memory address
vfs/fsnotify: fsnotify_close can delay the final work in fput
fsnotify: store struct file not struct path
...
Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
Andrew Morton suggested that the do_one_initcall and do_one_initcall_debug
functions can be marked __init_or_module such that they can be discarded
for the CONFIG_MODULES=N case.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using:
gcc (GCC) 4.5.0 20100610 (prerelease)
The following warning appears:
init/main.c: In function `do_one_initcall':
init/main.c:730:10: warning: `calltime.tv64' may be used uninitialized in this function
This warning is actually correct, as the global initcall_debug could
arguably be changed by the initcall.
Correct this warning by extracting a new function, do_one_initcall_debug,
that performs the initcall for the debug case.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
do_coredump: Do not take BKL
init: Remove the BKL from startup code
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...
Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: allow limited allocation before slab is online
percpu: make @dyn_size always mean min dyn_size in first chunk init functions
Mark init_workqueues() as early_initcall() and thus it will be initialized
before smp bringup. init_workqueues() registers for the hotcpu notifier
and thus it should cope with the processors that are brought online after
the workqueues are initialized.
x86 smp bringup code uses workqueues and uses a workaround for the
cold boot process (as the workqueues are initialized post smp_init()).
Marking init_workqueues() as early_initcall() will pave the way for
cleaning up this code.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select
FSNOTIFY. This splits the spagetti like mixing of audit_watch and
audit_filter code so they can be configured seperately.
Signed-off-by: Eric Paris <eparis@redhat.com>
CONFIG_AUDIT builds audit_watches which depend on fsnotify. Make
CONFIG_AUDIT select fsnotify.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>
I have shown by code review that no driver takes
the BKL at init time any more, so whatever the
init code was locking against is no longer there
and it is now safe to remove the BKL there.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt <rostedt@goodmis>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Ilya reported that on a very slow machine he could reliably
reproduce a race between forking init and kthreadd. We first
fork init so that it obtains pid-1, however since the scheduler
is already fully running at this point it can preempt and run
the init thread before we spawn and set kthreadd_task.
The init thread can then attempt spawning kthreads without
kthreadd being present which results in an OOPS.
Reported-by: Ilya Loginov <isloginov@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1277736661.3561.110.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch updates percpu allocator such that it can serve limited
amount of allocation before slab comes online. This is primarily to
allow slab to depend on working percpu allocator.
Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
much memory space and allocation map slots are reserved. If this
reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation
will fail till slab comes online.
The following changes are made to implement early alloc.
* pcpu_mem_alloc() now checks slab_is_available()
* Chunks are allocated using pcpu_mem_alloc()
* Init paths make sure ai->dyn_size is at least as large as
PERCPU_DYNAMIC_EARLY_SIZE.
* Initial alloc maps are allocated in __initdata and copied to
kmalloc'd areas once slab is online.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Patch is against latest Linus master branch and is expected to be
safe bug fix.
You get:
ACPI: HARDWARE addr space,NOT supported yet
for each ACPI defined CPU which status is active, but exceeds
maxcpus= count.
As these "not booted" CPUs do not run an idle routine
and echo X >/proc/acpi/processor/*/throttling did not work
I couldn't find a way to really access not onlined/booted
machines. Still this should get fixed and
/proc/acpi/processor/X dirs of cores exceeding maxcpus
should not show up.
I wonder whether this could get cleaned up by truncating possible cpu mask
and nr_cpu_ids to setup_max_cpus early some day
(and not exporting setup_max_cpus anymore then).
But this needs touching of a lot other places...
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: travis@sgi.com
CC: linux-acpi@vger.kernel.org
CC: lenb@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
We have been resisting new ftrace plugins and removing existing
ones, and kmemtrace has been superseded by kmem trace events
and perf-kmem, so we remove it.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
[ remove kmemtrace from the makefile, handle slob too ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The boot tracer is useless. It simply logs the initcalls
but in fact these initcalls are also logged through printk
while using the initcall_debug kernel parameter.
Nobody seem to be using it so far. Then just remove it.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com>
[ remove the hooks in main.c, and the headers ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
kbuild: Revert part of e8d400a to resolve a conflict
kbuild: Fix checking of scm-identifier variable
gconfig: add support to show hidden options that have prompts
menuconfig: add support to show hidden options which have prompts
gconfig: remove show_debug option
gconfig: remove dbg_print_ptype() and dbg_print_stype()
kconfig: fix zconfdump()
kconfig: some small fixes
add random binaries to .gitignore
kbuild: Include gen_initramfs_list.sh and the file list in the .d file
kconfig: recalc symbol value before showing search results
.gitignore: ignore *.lzo files
headerdep: perlcritic warning
scripts/Makefile.lib: Align the output of LZO
kbuild: Generate modules.builtin in make modules_install
Revert "kbuild: specify absolute paths for cscope"
kbuild: Do not unnecessarily regenerate modules.builtin
headers_install: use local file handles
headers_check: fix perl warnings
export_report: fix perl warnings
...
For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:
1) Detach zone->pageset from the shared boot_pageset
at end of __build_all_zonelists().
2) Use mutex to protect zone->pageset when it's still
shared in onlined_pages()
Otherwises, multiple zones of different nodes would share same boot strapping
boot_pageset for same CPU, which will finally cause below kernel panic:
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:1239!
invalid opcode: 0000 [#1] SMP
...
Call Trace:
[<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
[<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
[<ffffffff81128407>] __page_cache_alloc+0x67/0x70
[<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
[<ffffffff81132751>] ra_submit+0x21/0x30
[<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
[<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
[<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
[<ffffffff81266cfa>] nfs_file_read+0xca/0x130
[<ffffffff8117b20a>] do_sync_read+0xfa/0x140
[<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
[<ffffffff8117c151>] sys_read+0x51/0x80
[<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
RSP <ffff88000d1e78a8>
---[ end trace 4bda28328b9990db ]
[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel debugger can operate well before mm_init(), but the x86
hardware breakpoint code which uses the perf api requires that the
kernel allocators are initialized.
This means the kernel debug core needs to provide an optional arch
specific call back to allow the initialization functions to run after
the kernel has been further initialized.
The kdb shell already had a similar restriction with an early
initialization and late initialization. The kdb_init() was moved into
the debug core's version of the late init which is called
dbg_late_init();
CC: kgdb-bugreport@lists.sourceforge.net
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
This patch contains the hooks and instrumentation into kernel which
live outside the kernel/debug directory, which the kdb core
will call to run commands like lsmod, dmesg, bt etc...
CC: linux-arch@vger.kernel.org
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Martin Hicks <mort@sgi.com>
This patch fixes few usability and configurability issues.
o All the cgroup based controller options are configurable from
"Genral Setup/Control Group Support/" menu. blkio is the only exception.
Hence make this option visible in above menu and make it configurable from
there to bring it inline with rest of the cgroup based controllers.
o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
This option currently does two things.
- Enable printing of cgroup paths in blktrace
- Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
files in cgroup.
If we are using group scheduling, blktrace data is of not really much use
if cgroup information is not present. To get this data, currently one has to
also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
all the additional debug stat files which is not desired.
Hence, this patch moves printing of cgroup paths under
CONFIG_CFQ_GROUP_IOSCHED.
This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
can be enabled through config menu.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The unpack routine fails to handle the decompress_method() returning
unrecognised decompressor (compress_name == NULL). This results in the
routine looping eventually oopsing on an out of bounds memory access.
Note this bug is usually hidden, only triggering on trailing junk after
one or more correct compressed blocks. The case of the compressed archive
being complete junk is (by accident?) caught by the if (state != Reset)
check because state is initialised to Start, but not updated due to the
decompressor not having been called. Obviously if the junk is trailing a
correctly decompressed buffer, state == Reset from the previous call to
the decompressor.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"")
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Howells <dhowells@redhat.com>
LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
cpuset_mem_spread_node() returns an offline node, and causes an oops.
This patch fixes it by initializing task->mems_allowed to
node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing
memory hotplug.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reported-by: Nick Piggin <npiggin@suse.de>
Tested-by: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces eventfd-based API for notifications in cgroups
and implements memory notifications on top of it.
It uses statistics in memory controler to track memory usage.
Output of time(1) on building kernel on tmpfs:
Root cgroup before changes:
make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total
Non-root cgroup before changes:
make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total
Root cgroup after changes (0 thresholds):
make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total
Non-root cgroup after changes (0 thresholds):
make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total
Root cgroup after changes (1 thresholds, never crossed):
make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total
Non-root cgroup after changes (1 thresholds, never crossed):
make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total
This patch:
Introduce the write-only file "cgroup.event_control" in every cgroup.
To register new notification handler you need:
- create an eventfd;
- open a control file to be monitored. Callbacks register_event() and
unregister_event() must be defined for the control file;
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
Interpretation of args is defined by control file implementation;
eventfd will be woken up by control file implementation or when the
cgroup is removed.
To unregister notification handler just close eventfd.
If you need notification functionality for a control file you have to
implement callbacks register_event() and unregister_event() in the
struct cftype.
[kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dan Malek <dan@embeddedalley.com>
Cc: Vladislav Buzov <vbuzov@embeddedalley.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Alexander Shishkin <virtuoso@slind.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only in tree external users of the symbol setup_max_cpus are in
arch/x86/. The files ./kernel/alternative.c, ./kernel/visws_quirks.c, and
./mm/kmemcheck/kmemcheck.c are all guarded by CONFIG_SMP being defined.
For this case the symbol is an unsigned int and declared as an extern in
include/linux/smp.h.
When CONFIG_SMP is not defined the symbol setup_max_cpus is
a constant value that is only used in init/main.c. Make the symbol
static for this case.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The symbol 'count' is a local global variable in this file. The function
clean_rootfs() should use a different symbol name to prevent "symbol
shadows an earlier one" noise.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- new Documentation/init.txt file describing various forms of failure
trying to load the init binary after kernel bootup
- extend the init/main.c init failure message to direct to
Documentation/init.txt
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are quite a few GFP_KERNEL memory allocations made during
suspend/hibernation and resume that may cause the system to hang, because
the I/O operations they depend on cannot be completed due to the
underlying devices being suspended.
Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
gfp_allowed_mask before suspend/hibernation and restoring the original
values of these bits in gfp_allowed_mask durig the subsequent resume.
[akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
init: Open /dev/console from rootfs
mqueue: fix typo "failues" -> "failures"
mqueue: only set error codes if they are really necessary
mqueue: simplify do_open() error handling
mqueue: apply mathematics distributivity on mq_bytes calculation
mqueue: remove unneeded info->messages initialization
mqueue: fix mq_open() file descriptor leak on user-space processes
fix race in d_splice_alias()
set S_DEAD on unlink() and non-directory rename() victims
vfs: add NOFOLLOW flag to umount(2)
get rid of ->mnt_parent in tomoyo/realpath
hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Mirror MS_KERNMOUNT in ->mnt_flags
get rid of useless vfsmount_lock use in put_mnt_ns()
Take vfsmount_lock to fs/internal.h
get rid of insanity with namespace roots in tomoyo
take check for new events in namespace (guts of mounts_poll()) to namespace.c
Don't mess with generic_permission() under ->d_lock in hpfs
sanitize const/signedness for udf
nilfs: sanitize const/signedness in dealing with ->d_name.name
...
Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
To avoid potential problems with an empty /dev open /dev/console
from rootfs instead of waiting to mount our root filesystem and
mounting it there. This effectively guarantees that there will
be a device node, and it won't be on a filesystem that we will
ever unmount, so there are no issues with leaving /dev/console
open and pinning the filesystem.
This is actually more effective than automatically mounting
devtmpfs on /dev because it removes removes the occasionally
problematic assumption that /dev/console exists from the boot
code.
With this patch I was able to throw busybox on my /boot partition
(which has no /dev directory) and boot into userspace without
problems.
The only possible negative consequence I can think of is that
someone out there deliberately used did not use a character device
that is major 5 minor 2 for /dev/console. Does anyone know of a
situation in which that could make sense?
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
x86: Fix out of order of gsi
x86: apic: Fix mismerge, add arch_probe_nr_irqs() again
x86, irq: Keep chip_data in create_irq_nr and destroy_irq
xen: Remove unnecessary arch specific xen irq functions.
smp: Use nr_cpus= to set nr_cpu_ids early
x86, irq: Remove arch_probe_nr_irqs
sparseirq: Use radix_tree instead of ptrs array
sparseirq: Change irq_desc_ptrs to static
init: Move radix_tree_init() early
irq: Remove unnecessary bootmem code
x86: Add iMac9,1 to pci_reboot_dmi_table
x86: Convert i8259_lock to raw_spinlock
x86: Convert nmi_lock to raw_spinlock
x86: Convert ioapic_lock and vector_lock to raw_spinlock
x86: Avoid race condition in pci_enable_msix()
x86: Fix SCI on IOAPIC != 0
x86, ia32_aout: do not kill argument mapping
x86, irq: Move __setup_vector_irq() before the first irq enable in cpu online path
x86, irq: Update the vector domain for legacy irqs handled by io-apic
x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
...
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
sched: Fix SCHED_MC regression caused by change in sched cpu_power
sched: Don't use possibly stale sched_class
kthread, sched: Remove reference to kthread_create_on_cpu
sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu_counter: Make __percpu_counter_add an inline function on UP
sched: Remove member rt_se from struct rt_rq
sched: Change usage of rt_rq->rt_se to rt_rq->tg->rt_se[cpu]
sched: Remove unused update_shares_locked()
sched: Use for_each_bit
sched: Queue a deboosted task to the head of the RT prio queue
sched: Implement head queueing for sched_rt
sched: Extend enqueue_task to allow head queueing
sched: Remove USER_SCHED
sched: Fix the place where group powers are updated
sched: Assume *balance is valid
sched: Remove load_balance_newidle()
sched: Unify load_balance{,_newidle}()
sched: Add a lock break for PREEMPT=y
sched: Remove from fwd decls
sched: Remove rq_iterator from move_one_task
...
Fix up trivial conflicts in kernel/sched.c
* 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
oprofile/x86: fix msr access to reserved counters
oprofile/x86: use kzalloc() instead of kmalloc()
oprofile/x86: fix perfctr nmi reservation for mulitplexing
oprofile/x86: add comment to counter-in-use warning
oprofile/x86: warn user if a counter is already active
oprofile/x86: implement randomization for IBS periodic op counter
oprofile/x86: implement lsfr pseudo-random number generator for IBS
oprofile/x86: implement IBS cpuid feature detection
oprofile/x86: remove node check in AMD IBS initialization
oprofile/x86: remove OPROFILE_IBS config option
oprofile: remove EXPERIMENTAL from the config option description
oprofile: remove tracing build dependency
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
rcu: Fix accelerated GPs for last non-dynticked CPU
rcu: Make non-RCU_PROVE_LOCKING rcu_read_lock_sched_held() understand boot
rcu: Fix accelerated grace periods for last non-dynticked CPU
rcu: Export rcu_scheduler_active
rcu: Make rcu_read_lock_sched_held() take boot time into account
rcu: Make lockdep_rcu_dereference() message less alarmist
sched, cgroups: Fix module export
rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information
rcu: Fix rcutorture mod_timer argument to delay one jiffy
rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection
rcu: Convert to raw_spinlocks
rcu: Stop overflowing signed integers
rcu: Use canonical URL for Mathieu's dissertation
rcu: Accelerate grace period if last non-dynticked CPU
rcu: Fix citation of Mathieu's dissertation
rcu: Documentation update for CONFIG_PROVE_RCU
security: Apply lockdep-based checking to rcu_dereference() uses
idr: Apply lockdep-based diagnostics to rcu_dereference() uses
radix-tree: Disable RCU lockdep checking in radix tree
vfs: Abstract rcu_dereference_check for files-fdtable use
...
Currently, rcu_needs_cpu() simply checks whether the current CPU
has an outstanding RCU callback, which means that the last CPU
to go into dyntick-idle mode might wait a few ticks for the
relevant grace periods to complete. However, if all the other
CPUs are in dyntick-idle mode, and if this CPU is in a quiescent
state (which it is for RCU-bh and RCU-sched any time that we are
considering going into dyntick-idle mode), then the grace period
is instantly complete.
This patch therefore repeatedly invokes the RCU grace-period
machinery in order to force any needed grace periods to complete
quickly. It does so a limited number of times in order to
prevent starvation by an RCU callback function that might pass
itself to call_rcu().
However, if any CPU other than the current one is not in
dyntick-idle mode, fall back to simply checking (with fix to bug
noted by Lai Jiangshan). Also, take advantage of last
grace-period forcing, the opportunity to do so noted by Steve
Rostedt. And apply simplified #ifdef condition suggested by
Frederic Weisbecker.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86, before prefill_possible_map(), nr_cpu_ids will be NR_CPUS aka
CONFIG_NR_CPUS.
Add nr_cpus= to set nr_cpu_ids. so we can simulate cpus <=8 are installed on
normal config.
-v2: accordging to Christoph, acpi_numa_init should use nr_cpu_ids in stead of
NR_CPUS.
-v3: add doc in kernel-parameters.txt according to Andrew.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-34-git-send-email-yinghai@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Prepare for using radix trees in early_irq_init().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
ima wants to create an inode information struct (iint) when inodes are
allocated. This means that at least the part of ima which does this
allocation (the allocation is filled with information later) should
before any inodes are created. To accomplish this we split the ima
initialization routine placing the kmem cache allocator inside a
security_initcall() function. Since this makes use of radix trees we also
need to make sure that is initialized before security_initcall().
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This adds CROSS_COMPILE as a kconfig string so you can store it in
.config. Then you can use plain "make" in the configured kernel build
directory to do the right cross compilation without setting the
command-line or environment variable every time.
With this, you can set up different build directories for different kernel
configurations, whether native or cross-builds, and then use the simple:
make -C /build/dir M=module-source-dir
idiom to build modules for any given target kernel, indicating which one
by nothing but the build directory chosen.
I tried a version that defaults the string with env="CROSS_COMPILE" so
that in a "make oldconfig" with CROSS_COMPILE in the environment you can
just hit return to store the way you're building it. But the kconfig
prompt for strings doesn't give you any way to say you want an empty
string instead of the default, so I punted that.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Anibal Monsalve Salazar <anibal@debian.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Remove the USER_SCHED feature. It has been scheduled to be removed in
2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2
Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263990378.24844.3.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch series adds generic support for creating and extracting
LZO-compressed kernel images, as well as support for using such images on
the x86 and ARM architectures, and support for creating and using
LZO-compressed initrd and initramfs images.
Russell King said:
: Testing on a Cortex A9 model:
: - lzo decompressor is 65% of the time gzip takes to decompress a kernel
: - lzo kernel is 9% larger than a gzip kernel
:
: which I'm happy to say confirms your figures when comparing the two.
:
: However, when comparing your new gzip code to the old gzip code:
: - new is 99% of the size of the old code
: - new takes 42% of the time to decompress than the old code
:
: What this means is that for a proper comparison, the results get even better:
: - lzo is 7.5% larger than the old gzip'd kernel image
: - lzo takes 28% of the time that the old gzip code took
:
: So the expense seems definitely worth the effort. The only reason I
: can think of ever using gzip would be if you needed the additional
: compression (eg, because you have limited flash to store the image.)
:
: I would argue that the default for ARM should therefore be LZO.
This patch:
The lzo compressor is worse than gzip at compression, but faster at
extraction. Here are some figures for an ARM board I'm working on:
Uncompressed size: 3.24Mo
gzip 1.61Mo 0.72s
lzo 1.75Mo 0.48s
So for a compression ratio that is still relatively close to gzip, it's
much faster to extract, at least in that case.
This part contains:
- Makefile routine to support lzo compression
- Fixes to the existing lzo compressor so that it can be used in
compressed kernels
- wrapper around the existing lzo1x_decompress, as it only extracts one
block at a time, while we need to extract a whole file here
- config dialog for kernel compression
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Tested-by: Wu Zhangjin <wuzhangjin@gmail.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <rmk@arm.linux.org.uk>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces an interface to process data objects
in parallel. The parallelized objects return after serialization
in the same order as they were before the parallelization.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Quoted from Ingo:
| This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE -
| it's an unnecessary Kconfig complication. If both PERF_EVENTS and
| EVENT_TRACING is enabled we should expose generic tracepoints.
|
| Nor is it limited to event 'profiling', so it has become a misnomer as
| well.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
Makefile: Unexport LC_ALL instead of clearing it
x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
x86: Fix checking of SRAT when node 0 ram is not from 0
x86, cpuid: Add "volatile" to asm in native_cpuid()
x86, msr: msrs_alloc/free for CONFIG_SMP=n
x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
x86: Add IA32_TSC_AUX MSR and use it
x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
initramfs: add missing decompressor error check
bzip2: Add missing checks for malloc returning NULL
bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
sched: Fix broken assertion
sched: Assert task state bits at build time
sched: Update task_state_arraypwith new states
sched: Add missing state chars to TASK_STATE_TO_CHAR_STR
sched: Move TASK_STATE_TO_CHAR_STR near the TASK_state bits
sched: Teach might_sleep() about preemptible RCU
sched: Make warning less noisy
sched: Simplify set_task_cpu()
sched: Remove the cfs_rq dependency from set_task_cpu()
sched: Add pre and post wakeup hooks
sched: Move kthread_bind() back to kthread.c
sched: Fix select_task_rq() vs hotplug issues
sched: Fix sched_exec() balancing
sched: Ensure set_task_cpu() is never called on blocked tasks
sched: Use TASK_WAKING for fork wakups
sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE
sched: Fix task_hot() test order
sched: Fix set_cpu_active() in cpu_down()
sched: Mark boot-cpu active before smp_init()
sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
...
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits)
net: fix for utsrelease.h moving to generated
gen_init_cpio: fixed fwrite warning
kbuild: fix make clean after mismerge
kbuild: generate modules.builtin
genksyms: properly consider EXPORT_UNUSED_SYMBOL{,_GPL}()
score: add asm/asm-offsets.h wrapper
unifdef: update to upstream revision 1.190
kbuild: specify absolute paths for cscope
kbuild: create include/generated in silentoldconfig
scripts/package: deb-pkg: use fakeroot if available
scripts/package: add KBUILD_PKG_ROOTCMD variable
scripts/package: tar-pkg: use tar --owner=root
Kbuild: clean up marker
net: add net_tstamp.h to headers_install
kbuild: move utsrelease.h to include/generated
kbuild: move autoconf.h to include/generated
drop explicit include of autoconf.h
kbuild: move compile.h to include/generated
kbuild: drop include/asm
kbuild: do not check for include/asm-$ARCH
...
Fixed non-conflicting clean merge of modpost.c as per comments from
Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h
that needed to be changed to generated/autoconf.h)
A UP machine has 1 active cpu, not having the boot-cpu in the
active map when starting the scheduler confuses things.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.423469527@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The decompressors return error by calling a supplied error function, and/or
by returning an error return value. The initramfs code, however, fails to
check the exit code returned by the decompressor, and only checks the error
status set by calling the error function.
This patch adds a return code check and calls the error function.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
LKML-Reference: <4b26b1ef.0+ZWxT6886olqcSc%phillip@lougher.demon.co.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The symbol 'call' is a static symbol used for initcall_debug. This same
symbol name is used locally by a couple functions and produces the
following sparse warnings:
warning: symbol 'call' shadows an earlier one
Fix this noise by renaming the local symbols.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NOMMU code currently clears all anonymous mmapped memory. While this
is what we want in the default case, all memory allocation from userspace
under NOMMU has to go through this interface, including malloc() which is
allowed to return uninitialized memory. This can easily be a significant
performance penalty. So for constrained embedded systems were security is
irrelevant, allow people to avoid clearing memory unnecessarily.
This also alters the ELF-FDPIC binfmt such that it obtains uninitialised
memory for the brk and stack region.
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers, init: Limit the number of per cpu calibration bootup messages
posix-cpu-timers: optimize and document timer_create callback
clockevents: Add missing include to pacify sparse
x86: vmiclock: Fix printk format
x86: Fix printk format due to variable type change
sparc: fix printk for change of variable type
clocksource/events: Fix fallout of generic code changes
nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
nohz: Track last do_timer() cpu
nohz: Prevent clocksource wrapping during idle
nohz: Type cast printk argument
mips: Use generic mult/shift factor calculation for clocks
clocksource: Provide a generic mult/shift factor calculation
clockevents: Use u32 for mult and shift factors
nohz: Introduce arch_needs_cpu
nohz: Reuse ktime in sub-functions of tick_check_idle.
time: Remove xtime_cache
time: Implement logarithmic time accumulation
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
rcu: Make RCU's CPU-stall detector be default
rcu: Add expedited grace-period support for preemptible RCU
rcu: Enable fourth level of TREE_RCU hierarchy
rcu: Rename "quiet" functions
rcu: Re-arrange code to reduce #ifdef pain
rcu: Eliminate unneeded function wrapping
rcu: Fix grace-period-stall bug on large systems with CPU hotplug
rcu: Eliminate __rcu_pending() false positives
rcu: Further cleanups of use of lastcomp
rcu: Simplify association of forced quiescent states with grace periods
rcu: Accelerate callback processing on CPUs not detecting GP end
rcu: Mark init-time-only rcu_bootup_announce() as __init
rcu: Simplify association of quiescent states with grace periods
rcu: Rename dynticks_completed to completed_fqs
rcu: Enable synchronize_sched_expedited() fastpath
rcu: Remove inline from forward-referenced functions
rcu: Fix note_new_gpnum() uses of ->gpnum
rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
...
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix missing conditions to build mutex_spin_on_owner()
mutex: Better control mutex adaptive spinning config
locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
locking: Remove unused prototype
locking: Reduce ifdefs in kernel/spinlock.c
locking: Make inlining decision Kconfig based
Jon confirms that recent modprobe will look in /proc/cmdline, so these
cmdline options can still be used.
See http://bugzilla.kernel.org/show_bug.cgi?id=14164
Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SYSFS_DEPRECATED_V2 says "remove" older, deprecated features, but it
actually enables them, so correct this confusing, backwards text.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Limit the number of per cpu calibration messages by only
printing out results for the first cpu to boot.
Also, don't print "CPUx is down" as this is expected, and we
don't need 4096 reminders... ;-)
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
commit 892a7c67 (locking: Allow arch-inlined spinlocks) implements the
selection of which lock functions are inlined based on defines in
arch/.../spinlock.h: #define __always_inline__LOCK_FUNCTION
Despite of the name __always_inline__* the lock functions can be built
out of line depending on config options. Also if the arch does not set
some inline defines the generic code might set them; again depending on
config options.
This makes it unnecessary hard to figure out when and which lock
functions are inlined. Aside of that it makes it way harder and
messier for -rt to manipulate the lock functions.
Convert the inlining decision to CONFIG switches. Each lock function
is inlined depending on CONFIG_INLINE_*. The configs implement the
existing dependencies. The architecture code can select ARCH_INLINE_*
to signal that it wants the corresponding lock function inlined.
ARCH_INLINE_* is necessary as Kconfig ignores "depends on"
restrictions when a config element is selected.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091109151428.504477141@linutronix.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix permission checks
perf_events: Fix some typo in the perf events config description
To simply maintenance and to be able to remove all of the binary
sysctl support from various subsystems I have rewritten the binary
sysctl code as a compatibility wrapper around proc/sys.
The code is built around a hard coded table based on the table
in sysctl_check.c that lists all of our current binary sysctls
and provides enough information to convert from the sysctl
binary input into into ascii and back again. New in this
patch is the realization that the only dynamic entries
that need to be handled have ifname as the asscii string
and ifindex as their ctl_name.
When a sys_sysctl is called the code now looks in the
translation table converting the binary name to the
path under /proc where the value is to be found. Opens
that file, and calls into a format conversion wrapper
that calls fop->read and then fop->write as appropriate.
Since in practice the practically no one uses or tests
sys_sysctl rewritting the code to be beautiful is a little
silly. The redeeming merit of this work is it allows us to
rip out all of the binary sysctl syscall support from
everywhere else in the tree. Allowing us to remove
a lot of dead (after this patch) and barely maintained code.
In addition it becomes much easier to optimize the sysctl
implementation for being the backing store of /proc/sys,
without having to worry about sys_sysctl.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We dont need to depend on PPC64 explicitly as all powerpc platforms
(32-bit and 64-bit) define PPC now.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation. In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86. This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):
CONFIG_TREE_RCU:
text data bss dec filename
183 4 0 187 kernel/rcupdate.o
2783 520 36 3339 kernel/rcutree.o
3526 Total (vs 4565 for v7)
CONFIG_TREE_PREEMPT_RCU:
text data bss dec filename
263 4 0 267 kernel/rcupdate.o
4594 776 52 5422 kernel/rcutree.o
5689 Total (6155 for v7)
CONFIG_TINY_RCU:
text data bss dec filename
96 4 0 100 kernel/rcupdate.o
734 24 0 758 kernel/rcutiny.o
858 Total (vs 848 for v7)
The above is for x86. Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.
Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
o Apply Lai Jiangshan's review comments (aside from
might_sleep() in synchronize_sched(), which is covered by SMP builds).
o Fix up expedited primitives.
Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
o Forward ported to put it into the 2.6.33 stream.
o Added lockdep support.
o Make lightweight rcu_barrier.
Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
o Ported to latest pre-2.6.32 merge window kernel.
- Renamed rcu_qsctr_inc() to rcu_sched_qs().
- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
- Provided trivial rcu_cpu_notify().
- Provided trivial exit_rcu().
- Provided trivial rcu_needs_cpu().
- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
o Removed the dependence on EMBEDDED, with a view to making
TINY_RCU default for !SMP at some time in the future.
o Added (trivial) support for expedited grace periods.
Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
o Squeeze the size down a bit further by removing the
->completed field from struct rcu_ctrlblk.
o This permits synchronize_rcu() to become the empty function.
Previous concerns about rcutorture were unfounded, as
rcutorture correctly handles a constant value from
rcu_batches_completed() and rcu_batches_completed_bh().
Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
o Changed rcu_batches_completed(), rcu_batches_completed_bh()
rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
rcu_nmi_exit(), to be static inlines, as suggested by David
Howells. Doing this saves about 100 bytes from rcutiny.o.
(The numbers between v3 and this v4 of the patch are not directly
comparable, since they are against different versions of Linux.)
Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
o Fix whitespace issues.
o Change short-circuit "||" operator to instead be "+" in order
to fix performance bug noted by "kraai" on LWN.
(http://lwn.net/Articles/324348/)
Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
o This version depends on EMBEDDED as well as !SMP, as suggested
by Ingo.
o Updated rcu_needs_cpu() to unconditionally return zero,
permitting the CPU to enter dynticks-idle mode at any time.
This works because callbacks can be invoked upon entry to
dynticks-idle mode.
o Paul is now OK with this being included, based on a poll at
the Kernel Miniconf at linux.conf.au, where about ten people said
that they cared about saving 900 bytes on single-CPU systems.
o Applies to both mainline and tip/core/rcu.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: fix requeue_pi key imbalance
futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
rcu: Place root rcu_node structure in separate lockdep class
rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
rcu: Move rcu_barrier() to rcutree
futex: Move exit_pi_state() call to release_mm()
futex: Nullify robust lists after cleanup
futex: Fix locking imbalance
panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
rcu: Clean up code based on review feedback from Josh Triplett, part 4
rcu: Clean up code based on review feedback from Josh Triplett, part 3
rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
rcu: Clean up code to address Ingo's checkpatch feedback
rcu: Clean up code based on review feedback from Josh Triplett, part 2
rcu: Clean up code based on review feedback from Josh Triplett
Some architectures such as Sparc, ARM and MIPS (basically
everything with flush_dcache_page()) need to deal with dcache
aliases by carefully placing pages in both kernel and user maps.
These architectures typically have to use vmalloc_user() for this.
However, on other architectures, vmalloc() is not needed and has
the downsides of being more restricted and slower than regular
allocations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1254830228.21044.272.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...
* remove asm/atomic.h inclusion from linux/utsname.h --
not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it
NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
Use macros for .data.page_aligned section.
Use macros for .bss.page_aligned section.
Use new __init_task_data macro in arch init_task.c files.
kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
kbuild: add static to prototypes
kbuild: fail build if recordmcount.pl fails
kbuild: set -fconserve-stack option for gcc 4.5
kbuild: echo the record_mcount command
gconfig: disable "typeahead find" search in treeviews
kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
checkincludes.pl: add option to remove duplicates in place
markup_oops: use modinfo to avoid confusion with underscored module names
checkincludes.pl: provide usage helper
checkincludes.pl: close file as soon as we're done with it
ctags: usability fix
kernel hacking: move STRIP_ASM_SYMS from General
gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
kbuild: Check if linker supports the -X option
kbuild: introduce ld-option
...
Fix trivial conflict in scripts/basic/fixdep.c
These issues identified during an old-fashioned face-to-face code
review extending over many hours.
o Add comments for tricky parts of code, and correct comments
that have passed their sell-by date.
o Get rid of the vestiges of rcu_init_sched(), which is no
longer needed now that PREEMPT_RCU is gone.
o Move the #include of rcutree_plugin.h to the end of
rcutree.c, which means that, rather than having a random
collection of forward declarations, the new set of forward
declarations document the set of plugins. The new home for
this #include also allows __rcu_init_preempt() to move into
rcutree_plugin.h.
o Fix rcu_preempt_check_callbacks() to be static.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12537246443924-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra <peterz@infradead.org>
Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages. The amount of what actually is usable as storage
should instead be used as a basis here.
Some of the calculations (i.e. those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Tidy up after the big rename
perf: Do the big rename: Performance Counters -> Performance Events
perf_counter: Rename 'event' to event_id/hw_event
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Fix whitespace inconsistencies
rcu: Fix thinko, actually initialize full tree
rcu: Apply results of code inspection of kernel/rcutree_plugin.h
rcu: Add WARN_ON_ONCE() consistency checks covering state transitions
rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU
rcu: Simplify rcu_read_unlock_special() quiescent-state accounting
rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods
rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down
rcutorture: Occasionally delay readers enough to make RCU force_quiescent_state
rcu: Initialize multi-level RCU grace periods holding locks
rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable
- provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
- provide courtesy copy of old perf_counter.h, for user-space projects
- small indentation fixups
- fix up MAINTAINERS
- fix small x86 printout fallout
- fix up small PowerPC comment fallout (use 'counter' as in register)
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sam suggested moving STRIP_ASM_SYMS into the Kernel hacking menu
from the General Setup menu. It makes more sense there.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To quote Valdis:
This leaves somebody who has a laptop wondering which
choice is best for a system with only one or two cores that
has CONFIG_PREEMPT defined. One choice says it scales down
nicely, the other explicitly has a 'depends on PREEMPT'
attached to it...
So add "scales down nicely" to TREE_PREEMPT_RCU to match that of
TREE_RCU.
Suggested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12528585112362-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
debugfs: Modify default debugfs directory for debugging pktcdvd.
debugfs: Modified default dir of debugfs for debugging UHCI.
debugfs: Change debugfs directory of IWMC3200
debugfs: Change debuhgfs directory of trace-events-sample.h
debugfs: Fix mount directory of debugfs by default in events.txt
hpilo: add poll f_op
hpilo: add interrupt handler
hpilo: staging for interrupt handling
driver core: platform_device_add_data(): use kmemdup()
Driver core: Add support for compatibility classes
uio: add generic driver for PCI 2.3 devices
driver-core: move dma-coherent.c from kernel to driver/base
mem_class: fix bug
mem_class: use minor as index instead of searching the array
driver model: constify attribute groups
UIO: remove 'default n' from Kconfig
Driver core: Add accessor for device platform data
Driver core: move dev_get/set_drvdata to drivers/base/dd.c
Driver core: add new device to bus's list before probing
Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
very early at kernel initialization, before any driver-core device
is registered. Every device with a major/minor will provide a
device node in devtmpfs.
Devtmpfs can be changed and altered by userspace at any time,
and in any way needed - just like today's udev-mounted tmpfs.
Unmodified udev versions will run just fine on top of it, and will
recognize an already existing kernel-created device node and use it.
The default node permissions are root:root 0600. Proper permissions
and user/group ownership, meaningful symlinks, all other policy still
needs to be applied by userspace.
If a node is created by devtmps, devtmpfs will remove the device node
when the device goes away. If the device node was created by
userspace, or the devtmpfs created node was replaced by userspace, it
will no longer be removed by devtmpfs.
If it is requested to auto-mount it, it makes init=/bin/sh work
without any further userspace support. /dev will be fully populated
and dynamic, and always reflect the current device state of the kernel.
With the commonly used dynamic device numbers, it solves the problem
where static devices nodes may point to the wrong devices.
It is intended to make the initial bootup logic simpler and more robust,
by de-coupling the creation of the inital environment, to reliably run
userspace processes, from a complex userspace bootstrap logic to provide
a working /dev.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Tested-By: Harald Hoyer <harald@redhat.com>
Tested-By: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
sched: Fix sched::sched_stat_wait tracepoint field
sched: Disable NEW_FAIR_SLEEPERS for now
sched: Keep kthreads at default priority
sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
sched: Turn off child_runs_first
sched: Ensure that a child can't gain time over it's parent after fork()
sched: enable SD_WAKE_IDLE
sched: Deal with low-load in wake_affine()
sched: Remove short cut from select_task_rq_fair()
sched: Turn on SD_BALANCE_NEWIDLE
sched: Clean up topology.h
sched: Fix dynamic power-balancing crash
sched: Remove reciprocal for cpu_power
sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
sched: Try to deal with low capacity
sched: Scale down cpu_power due to RT tasks
sched: Implement dynamic cpu_power
sched: Add smt_gain
sched: Update the cpu_power sum during load-balance
sched: Add SD_PREFER_SIBLING
...
Ingo was getting warnings from rcu_scheduler_starting()
indicating that context switches had occurred before RCU ended
its special early-boot handling of grace periods.
This is a dangerous condition, as it indicates that RCU might
have prematurely ended grace periods. This exploratory fix
moves rcu_scheduler_starting() earlier in boot.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
drivers/sfi/sfi_core.c contains the generic SFI implementation.
It has a private header, sfi_core.h, for its own use and the
private use of future files in drivers/sfi/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some architectures initialize clocks and timers in late_time_init and
x86 wants to do the same to avoid FIXMAP hackery for calibrating the
TSC. That would result in undefined sched_clock readout and wreckaged
printk timestamps again. We probably have those already on archs which
do all their time/clock setup in late_time_init.
There is no harm to move that after late_time_init except that a few
more boot timestamps are stale. The scheduler is not active at that
point so no real wreckage is expected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: linux-arch@vger.kernel.org
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix too large stack usage in do_one_initcall()
tracing: handle broken names in ftrace filter
ftrace: Unify effect of writing to trace_options and option/*
Create a kernel/rcutree_plugin.h file that contains definitions
for preemptable RCU (or, under the #else branch of the #ifdef,
empty definitions for the classic non-preemptable semantics).
These definitions fit into plugins defined in kernel/rcutree.c
for this purpose.
This variant of preemptable RCU uses a new algorithm whose
read-side expense is roughly that of classic hierarchical RCU
under CONFIG_PREEMPT. This new algorithm's update-side expense
is similar to that of classic hierarchical RCU, and, in absence
of read-side preemption or blocking, is exactly that of classic
hierarchical RCU. Perhaps more important, this new algorithm
has a much simpler implementation, saving well over 1,000 lines
of code compared to mainline's implementation of preemptable
RCU, which will hopefully be retired in favor of this new
algorithm.
The simplifications are obtained by maintaining per-task
nesting state for running tasks, and using a simple
lock-protected algorithm to handle accounting when tasks block
within RCU read-side critical sections, making use of lessons
learned while creating numerous user-level RCU implementations
over the past 18 months.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746134003-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
nr_cpu_ids is dependent only on cpu_possible_map and
setup_per_cpu_areas() already depends on cpu_possible_map and will use
nr_cpu_ids. Initialize nr_cpu_ids before setting up percpu areas.
Signed-off-by: Tejun Heo <tj@kernel.org>
If user has already enabled profiling support in the kernel
(for oprofile, old-style profiling of ftrace) then offer up
perfcounters with a y default in interactive kconfig sessions.
Still keep it off by default otherwise.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a missed rename in EVENT_PROFILE support so that it gets
built and allows tracepoint tracing from the 'perf' tool.
Fix a typo in the (never before built & enabled) portion in
perf_counter.c as well, and update that code to the
attr.config changes as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Gamari <bgamari.foss@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
Remove Classic RCU, given that the combination of Tree RCU and
the proposed Bloatwatch RCU do everything that Classic RCU can
with fewer bugs.
Tree RCU has been default in x86 builds for almost six months,
and seems to be quite reliable, so there does not seem to be
much justification for keeping the Classic RCU code and config
complexity around anymore.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: akpm@linux-foundation.org
Cc: niv@us.ibm.com
Cc: dvhltc@us.ibm.com
Cc: dipankar@in.ibm.com
Cc: dhowells@redhat.com
Cc: lethal@linux-sh.org
Cc: kernel@wantstofly.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
dynamic percpu allocator. The first chunk is allocated using
embedding helper and 8k is reserved for modules. This ensures that
the new allocator behaves almost identically to the original allocator
as long as static percpu variables are concerned, so it shouldn't
introduce much breakage.
s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing
range limit the addressing model imposes. Unfortunately, this breaks
if the address is specified using a variable, so for now, the two
archs aren't converted.
The following architectures are affected by this change.
* sh
* arm
* cris
* mips
* sparc(32)
* blackfin
* avr32
* parisc (broken, under investigation)
* m32r
* powerpc(32)
As this change makes the dynamic allocator the default one,
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert -
CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted
archs. These archs implement their own setup_per_cpu_areas() and the
conversion is not trivial.
* powerpc(64)
* sparc(64)
* ia64
* alpha
* s390
Boot and batch alloc/free tests on x86_32 with debug code (x86_32
doesn't use default first chunk initialization). Compile tested on
sparc(32), powerpc(32), arm and alpha.
Kyle McMartin reported that this change breaks parisc. The problem is
still under investigation and he is okay with pushing this patch
forward and fixing parisc later.
[ Impact: use dynamic allocator for most archs w/o custom percpu setup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
SLAB uses get/put_online_cpus() which use a mutex which is itself only
initialized when cpu_hotplug_init() is called. Currently we hang suring
boot in SLAB due to doing that too late.
Reported by James Bottomley and Sachin Sant (and possibly others).
Debugged by Benjamin Herrenschmidt.
This just removes the dynamic initialization of the data structures, and
replaces it with a static one, avoiding this dependency entirely, and
removing one unnecessary special initcall.
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page allocator also needs the masking of gfp flags during boot,
so this moves it out of slab/slub and uses it with the page allocator
as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Call constructors (gcc-generated initcall-like functions) during kernel
start and module load. Constructors are e.g. used for gcov data
initialization.
Disable constructor support for usermode Linux to prevent conflicts with
host glibc.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Li Wei <W.Li@Sun.COM>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some architectures need to initialize SLAB caches to be able
to allocate page tables. They do that from pgtable_cache_init()
so the later should be called earlier now, best is before
vmalloc_init().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
Fix allocating page cache/slab object on the unallowed node when memory
spread is set by updating tasks' mems_allowed after its cpuset's mems is
changed.
In order to update tasks' mems_allowed in time, we must modify the code of
memory policy. Because the memory policy is applied in the process's
context originally. After applying this patch, one task directly
manipulates anothers mems_allowed, and we use alloc_lock in the
task_struct to protect mems_allowed and memory policy of the task.
But in the fast path, we didn't use lock to protect them, because adding a
lock may lead to performance regression. But if we don't add a lock,the
task might see no nodes when changing cpuset's mems_allowed to some
non-overlapping set. In order to avoid it, we set all new allowed nodes,
then clear newly disallowed ones.
[lee.schermerhorn@hp.com:
The rework of mpol_new() to extract the adjusting of the node mask to
apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
allocation. Fix this by adding the check for MPOL_PREFERRED and empty
node mask to mpol_new_mpolicy().
Remove the now unneeded 'nodes = NULL' from mpol_new().
Note that mpol_new_mempolicy() is always called with a non-NULL
'nodes' parameter now that it has been removed from mpol_new().
Therefore, we don't need to test nodes for NULL before testing it for
'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to
verify this assumption.]
[lee.schermerhorn@hp.com:
I don't think the function name 'mpol_new_mempolicy' is descriptive
enough to differentiate it from mpol_new().
This function applies cpuset set context, usually constraining nodes
to those allowed by the cpuset. However, when the 'RELATIVE_NODES flag
is set, it also translates the nodes. So I settled on
'mpol_set_nodemask()', because the comment block for mpol_new() mentions
that we need to call this function to "set nodes".
Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All recent distros depend on the non-deprecated sysfs layout, so
change the default value of the option to reflect that.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This false positive is due to the fact that do_mount_root() fakes a
mount option (which is normally read from userspace), and the kernel
unconditionally reads a whole page for the mount option.
Hide the false positive by using the new __getname_gfp() with the
__GFP_NOTRACK_FALSE_POSITIVE flag.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits)
.gitignore: ignore *.lzma files
kbuild: add generic --set-str option to scripts/config
kbuild: simplify argument loop in scripts/config
kbuild: handle non-existing options in scripts/config
kallsyms: generalize text region handling
kallsyms: support kernel symbols in Blackfin on-chip memory
documentation: make version fix
kbuild: fix a compile warning
gitignore: Add GNU GLOBAL files to top .gitignore
kbuild: fix delay in setlocalversion on readonly source
README: fix misleading pointer to the defconf directory
vmlinux.lds.h update
kernel-doc: cleanup perl script
Improve vmlinux.lds.h support for arch specific linker scripts
kbuild: fix headers_exports with boolean expression
kbuild/headers_check: refine extern check
kbuild: fix "Argument list too long" error for "make headers_check",
ignore *.patch files
Remove bashisms from scripts
menu: fix embedded menu presentation
...
General description: kmemcheck is a patch to the linux kernel that
detects use of uninitialized memory. It does this by trapping every
read and write to memory that was allocated dynamically (e.g. using
kmalloc()). If a memory address is read that has not previously been
written to, a message is printed to the kernel log.
Thanks to Andi Kleen for the set_memory_4k() solution.
Andrew Morton suggested documenting the shadow member of struct page.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
[export kmemcheck_mark_initialized]
[build fix for setup_max_cpus]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Start documenting HAVE_PERF_COUNTERS requirements
perf_counter: Add forward/backward attribute ABI compatibility
perf record: Explicity program a default counter
perf_counter: Remove PERF_TYPE_RAW special casing
perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too
powerpc, perf_counter: Fix performance counter event types
perf_counter/x86: Add a quirk for Atom processors
perf_counter tools: Remove one L1-data alias
Help out arch porters who want to support perf counters by listing some
basic requirements.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1244827063-24046-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As explained by Benjamin Herrenschmidt:
Oh and btw, your patch alone doesn't fix powerpc, because it's missing
a whole bunch of GFP_KERNEL's in the arch code... You would have to
grep the entire kernel for things that check slab_is_available() and
even then you'll be missing some.
For example, slab_is_available() didn't always exist, and so in the
early days on powerpc, we used a mem_init_done global that is set form
mem_init() (not perfect but works in practice). And we still have code
using that to do the test.
Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
in early boot code to avoid enabling interrupts.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Now, SLAB is configured in very early stage and it can be used in
init routine now.
But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
initialization breaks the allocation, now.
(Works well in SPARSEMEM case...it supports MEMORY_HOTPLUG and
size of page_cgroup is in reasonable size (< 1 << MAX_ORDER.)
This patch revive FLATMEM+memory cgroup by using alloc_bootmem.
In future,
We stop to support FLATMEM (if no users) or rewrite codes for flatmem
completely.But this will adds more messy codes and overheads.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
fsnotify: allow groups to set freeing_mark to null
inotify/dnotify: should_send_event shouldn't match on FS_EVENT_ON_CHILD
dnotify: do not bother to lock entry->lock when reading mask
dnotify: do not use ?true:false when assigning to a bool
fsnotify: move events should indicate the event was on a child
inotify: reimplement inotify using fsnotify
fsnotify: handle filesystem unmounts with fsnotify marks
fsnotify: fsnotify marks on inodes pin them in core
fsnotify: allow groups to add private data to events
fsnotify: add correlations between events
fsnotify: include pathnames with entries when possible
fsnotify: generic notification queue and waitq
dnotify: reimplement dnotify using fsnotify
fsnotify: parent event notification
fsnotify: add marks to inodes so groups can interpret how to handle those inodes
fsnotify: unified filesystem notification backend
* 'for-linus' of git://linux-arm.org/linux-2.6:
kmemleak: Add the corresponding MAINTAINERS entry
kmemleak: Simple testing module for kmemleak
kmemleak: Enable the building of the memory leak detector
kmemleak: Remove some of the kmemleak false positives
kmemleak: Add modules support
kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
kmemleak: Add the vmalloc memory allocation/freeing hooks
kmemleak: Add the slub memory allocation/freeing hooks
kmemleak: Add the slob memory allocation/freeing hooks
kmemleak: Add the slab memory allocation/freeing hooks
kmemleak: Add documentation on the memory leak detector
kmemleak: Add the base support
Manual conflict resolution (with the slab/earlyboot changes) in:
drivers/char/vt.c
init/main.c
mm/slab.c
* 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits)
perf_counter: Turn off by default
perf_counter: Add counter->id to the throttle event
perf_counter: Better align code
perf_counter: Rename L2 to LL cache
perf_counter: Standardize event names
perf_counter: Rename enums
perf_counter tools: Clean up u64 usage
perf_counter: Rename perf_counter_limit sysctl
perf_counter: More paranoia settings
perf_counter: powerpc: Implement generalized cache events for POWER processors
perf_counters: powerpc: Add support for POWER7 processors
perf_counter: Accurate period data
perf_counter: Introduce struct for sample data
perf_counter tools: Normalize data using per sample period data
perf_counter: Annotate exit ctx recursion
perf_counter tools: Propagate signals properly
perf_counter tools: Small frequency related fixes
perf_counter: More aggressive frequency adjustment
perf_counter/x86: Fix the model number of Intel Core2 processors
perf_counter, x86: Correct some event and umask values for Intel processors
...
Reimplement inotify_user using fsnotify. This should be feature for feature
exactly the same as the original inotify_user. This does not make any changes
to the in kernel inotify feature used by audit. Those patches (and the eventual
removal of in kernel inotify) will come after the new inotify_user proves to be
working correctly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
As suggested by Christoph Lameter, introduce mm_init() now that we initialize
all the kernel memory allocations together.
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
the bootmem allocator when initializing vmalloc data structures.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
This patch makes kmalloc() available earlier in the boot sequence so we can get
rid of some bootmem allocations. The bulk of the changes are due to
kmem_cache_init() being called with interrupts disabled which requires some
changes to allocator boostrap code.
Note: 32-bit x86 does WP protect test in mem_init() so we must setup traps
before we call mem_init() during boot as reported by Ingo Molnar:
We have a hard crash in the WP-protect code:
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
[ 0.000000] EDI 00000188 ESI 00000ac7 EBP c17eaf9c ESP c17eaf8c
[ 0.000000] EBX 000014e0 EDX 0000000e ECX 01856067 EAX 00000001
[ 0.000000] err 00000003 EIP c10135b1 CS 00000060 flg 00010002
[ 0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
[ 0.000000] 00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
[ 0.000000] c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
[ 0.000000] Call Trace:
[ 0.000000] [<c15357c2>] ? printk+0x14/0x16
[ 0.000000] [<c10135b1>] ? do_test_wp_bit+0x19/0x23
[ 0.000000] [<c17fd410>] ? test_wp_bit+0x26/0x64
[ 0.000000] [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
[ 0.000000] [<c17f1668>] ? start_kernel+0x164/0x2f7
[ 0.000000] [<c17f1322>] ? unknown_bootoption+0x0/0x19c
[ 0.000000] [<c17f106a>] ? __init_begin+0x6a/0x6f
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
the Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/kmemleak. Enabling this feature introduces an
overhead to memory allocations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Perfcounters were enabled by default to help testing - but now that we
are submitting it upstream, make it default-disabled.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
Revert "x86, bts: reenable ptrace branch trace support"
tracing: do not translate event helper macros in print format
ftrace/documentation: fix typo in function grapher name
tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
tracing: add protection around module events unload
tracing: add trace_seq_vprint interface
tracing: fix the block trace points print size
tracing/events: convert block trace points to TRACE_EVENT()
ring-buffer: fix ret in rb_add_time_stamp
ring-buffer: pass in lockdep class key for reader_lock
tracing: add annotation to what type of stack trace is recorded
tracing: fix multiple use of __print_flags and __print_symbolic
tracing/events: fix output format of user stack
tracing/events: fix output format of kernel stack
tracing/trace_stack: fix the number of entries in the header
ring-buffer: discard timestamps that are at the start of the buffer
ring-buffer: try to discard unneeded timestamps
ring-buffer: fix bug in ring_buffer_discard_commit
ftrace: do not profile functions when disabled
tracing: make trace pipe recognize latency format flag
...
* 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: rcu_sched_grace_period(): kill the bogus flush_signals()
rculist: use list_entry_rcu in places where it's appropriate
rculist.h: introduce list_entry_rcu() and list_first_entry_rcu()
rcu: Update RCU tracing documentation for __rcu_pending
rcu: Add __rcu_pending tracing to hierarchical RCU
RCU: make treercu be default
The STRIP_ASM_SYMS kconfig symbol mucks up the embedded menu because
STRIP_ASM_SYMS is in the middle of the embedded menu items but it does not
depend on EMBEDDED. Move it to beyond the end of the embedded menu so
that the menu is presented correctly.
Or if STRIP_ASM_SYMS should depend on EMBEDDED, that can also be fixed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
based - to pick up the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no format specifiers left in the linux_banner, and gcc-4.3
complains seeing the printk.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge reason: tracing/core was on a .30-rc1 base and was missing out on
on a handful of tracing fixes present in .30-rc5-almost.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the removal of duplicate unpack_to_rootfs() (commit
df52092f3c) the messages displayed do not
actually correspond to what the kernel is doing. In addition, depending
if ramdisks are supported or not, the messages are not at all the same.
So keep the messages more in sync with what is really doing the kernel,
and only display a second message in case of failure. This also ensure
that the printk message cannot be split by other printk's.
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
V3 of the early platform driver implementation.
Platform drivers are great for embedded platforms because we can separate
driver configuration from the actual driver. So base addresses,
interrupts and other configuration can be kept with the processor or board
code, and the platform driver can be reused by many different platforms.
For early devices we have nothing today. For instance, to configure early
timers and early serial ports we cannot use platform devices. This
because the setup order during boot. Timers are needed before the
platform driver core code is available. The same goes for early printk
support. Early in this case means before initcalls.
These early drivers today have their configuration either hard coded or
they receive it using some special configuration method. This is working
quite well, but if we want to support both regular kernel modules and
early devices then we need to have two ways of configuring the same
driver. A single way would be better.
The early platform driver patch is basically a set of functions that allow
drivers to register themselves and architecture code to locate them and
probe. Registration happens through early_param(). The time for the
probe is decided by the architecture code.
See Documentation/driver-model/platform.txt for more details.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: David Brownell <david-b@pacbell.net>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change cb6ff20807 ("NOMMU: Support XIP on
initramfs") seems to have broken booting from initramfs with /sbin/init
being a hardlink.
It seems like the logic required for XIP on nommu, i.e. ftruncate to
reported cpio header file size (body_len) is broken for hardlinks, which
have a reported size of 0, and the truncate thus nukes the contents of the
file (in my case busybox), making boot impossible and ending with runaway
loop modprobe binfmt-0000 - and of course 0000 is not a valid binary
format.
My fix is to only call ftruncate if size is non-zero which fixes things
for me, but I'm not certain whether this will break XIP for those files on
nommu systems, although I would guess not.
Signed-off-by: Randy Robertson <rmrobert@vmware.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
init/initramfs.c:520: warning: 'clean_rootfs' defined but not used
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: refactor code for future changes
Current kmemtrace.h is used both as header file of kmemtrace and kmem's
tracepoints definition.
Tracepoints' definition file may be used by other code, and should only have
definition of tracepoint.
We can separate include/trace/kmemtrace.h into 2 files:
include/linux/kmemtrace.h: header file for kmemtrace
include/trace/kmem.h: definition of kmem tracepoints
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make it possible for the linker to discard local symbols from vmlinux as
they cause vmlinux to balloon when CONFIG_KALLSYMS=y and they cause
dump_stack() and get_wchan() to produce useless information under some
circumstances.
With this we add a config option (CONFIG_STRIP_ASM_SYMS) that will cause
the build to supply -X to the linker to tell it to strip temporary local
symbols.
This doesn't seem to cause gdb any problems.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Conflicts:
arch/powerpc/include/asm/systbl.h
arch/powerpc/include/asm/unistd.h
include/linux/init_task.h
Merge reason: the conflicts are non-trivial: PowerPC placement
of sys_perf_counter_open has to be mixed with the
new preadv/pwrite syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue
sysctl stuff in its own file.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
posix message queue namespaces and the SYSV ipc namespaces.
The sysctl code will be fixed separately in patch 3. After just this
patch, making a change to posix mqueue tunables always changes the values
in the initial ipc namespace.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make CONFIG_SLOW_WORK an automatic rather than manual config option so that
people configuring their kernels don't have to make the choice. It can be
selected automatically by those things that require it (such as FS-Cache).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: new perfcounters feature
Enable usage of tracepoints as perf counter events.
tracepoint event ids can be found in /debug/tracing/event/*/*/id
and (for now) are represented as -65536+id in the type field.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.744044174@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
trivial: Update my email address
trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
trivial: Fix misspelling of "Celsius".
trivial: remove unused variable 'path' in alloc_file()
trivial: fix a pdlfush -> pdflush typo in comment
trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
trivial: wusb: Storage class should be before const qualifier
trivial: drivers/char/bsr.c: Storage class should be before const qualifier
trivial: h8300: Storage class should be before const qualifier
trivial: fix where cgroup documentation is not correctly referred to
trivial: Give the right path in Documentation example
trivial: MTD: remove EOL from MODULE_DESCRIPTION
trivial: Fix typo in bio_split()'s documentation
trivial: PWM: fix of #endif comment
trivial: fix typos/grammar errors in Kconfig texts
trivial: Fix misspelling of firmware
trivial: cgroups: documentation typo and spelling corrections
trivial: Update contact info for Jochen Hein
trivial: fix typo "resgister" -> "register"
...
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
NFS: Add mount options to enable local caching on NFS
NFS: Display local caching state
NFS: Store pages from an NFS inode into a local cache
NFS: Read pages from FS-Cache into an NFS inode
NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
NFS: Add read context retention for FS-Cache to call back with
NFS: FS-Cache page management
NFS: Add some new I/O counters for FS-Cache doing things for NFS
NFS: Invalidate FsCache page flags when cache removed
NFS: Use local disk inode cache
NFS: Define and create inode-level cache objects
NFS: Define and create superblock-level objects
NFS: Define and create server-level objects
NFS: Register NFS for caching and retrieve the top-level index
NFS: Permit local filesystem caching to be enabled for NFS
NFS: Add FS-Cache option bit and debug bit
NFS: Add comment banners to some NFS functions
FS-Cache: Make kAFS use FS-Cache
CacheFiles: A cache that backs onto a mounted filesystem
CacheFiles: Export things for CacheFiles
...
Impact: switch default config from CLASSIC_RCU to TREE_RCU
Given that I have not gotten any complaints or bug reports on treercu
recently, this patch makes it be the default. There are a number of
other defconfig files that explicitly call out CLASSIC_RCU, but which
have comment headers saying not to edit them. Probably holdovers from
one of the flavors of "make config", but...
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: akpm@linux-foundation.org
Cc: dipankar@in.ibm.com
Cc: niv@us.ibm.com
Cc: manfred@colorfullife.com
Cc: peterz@infradead.org
LKML-Reference: <20090403040625.GA9473@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://neil.brown.name/md: (53 commits)
md/raid5 revise rules for when to update metadata during reshape
md/raid5: minor code cleanups in make_request.
md: remove CONFIG_MD_RAID_RESHAPE config option.
md/raid5: be more careful about write ordering when reshaping.
md: don't display meaningless values in sysfs files resync_start and sync_speed
md/raid5: allow layout and chunksize to be changed on active array.
md/raid5: reshape using largest of old and new chunk size
md/raid5: prepare for allowing reshape to change layout
md/raid5: prepare for allowing reshape to change chunksize.
md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
Documentation/md.txt update
md: allow number of drives in raid5 to be reduced
md/raid5: change reshape-progress measurement to cope with reshaping backwards.
md: add explicit method to signal the end of a reshape.
md/raid5: enhance raid5_size to work correctly with negative delta_disks
md/raid5: drop qd_idx from r6_state
md/raid6: move raid6 data processing to raid6_pq.ko
md: raid5 run(): Fix max_degraded for raid level 4.
md: 'array_size' sysfs attribute
md: centralize ->array_sectors modifications
...
Create a dynamically sized pool of threads for doing very slow work items, such
as invoking mkdir() or rmdir() - things that may take a long time and may
sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable
for workqueues.
The number of threads is always at least a settable minimum, but more are
started when there's more work to do, up to a limit. Because of the nature of
the load, it's not suitable for a 1-thread-per-CPU type pool. A system with
one CPU may well want several threads.
This is used by FS-Cache to do slow caching operations in the background, such
as looking up, creating or deleting cache objects.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()
Allow cpusets to be configured/built on non-SMP systems
Currently it's impossible to build cpusets under UML on x86-64, since
cpusets depends on SMP and x86-64 UML doesn't support SMP.
There's code in cpusets that doesn't depend on SMP. This patch surrounds
the minimum amount of cpusets code with #ifdef CONFIG_SMP in order to
allow cpusets to build/run on UP systems (for testing purposes under UML).
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's pointed out that swap_cgroup's message at swapon() is nonsense.
Because
* It can be calculated very easily if all necessary information is
written in Kconfig.
* It's not necessary to annoying people at every swapon().
In other view, now, memory usage per swp_entry is reduced to 2bytes from
8bytes(64bit) and I think it's reasonably small.
Reported-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
initramfs uses printk without a linefeed, then does some work, then uses
printk to finish the message off. However if some other code does a
printk in between, then the messages get mixed together. Better for each
message to be an independent line...
Example of problem that this fixes:
checking if image is initramfs...<7>Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
it is
Signed-off-by: Simon Kitching <skitching@apache.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't pull it in sched.h; very few files actually need it and those
can include directly. sched.h itself only needs forward declaration
of struct fs_struct;
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This makes the includes more explicit, and is preparation for moving
md_k.h to drivers/md/md.h
Remove include/raid/md.h as its only remaining use was to #include
other files.
Signed-off-by: NeilBrown <neilb@suse.de>
.. as they are part of the user-space interface.
Also move MdpMinorShift into there so we can remove duplication.
Lastly move mdp_major in. It is less obviously part of the user-space
interface, but do_mounts_md.c uses it, and it is acting a bit like
user-space.
Signed-off-by: NeilBrown <neilb@suse.de>
cgroup documentation was moved to Documentation/cgroups/. There are some
places that still refer to Documentation/controllers/,
Documentation/cgroups.txt and Documentation/cpusets.txt. Fix those.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
cpu_active_map is deprecated in favor of cpu_active_mask, which is
const for safety: we use accessors now (set_cpu_active) is we really
want to make a change.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: cleanup
(Thanks to Al Viro for reminding me of this, via Ingo)
CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:
#define CPU_MASK_ALL (cpumask_t) { { ... } }
Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:
#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)
Which formalizes this practice. One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).
So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
with the modern "cpu_all_mask" (a real const struct cpumask *).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async-for-30:
fastboot: remove duplicate unpack_to_rootfs()
ide/net: flip the order of SATA and network init
async: remove the temporary (2.6.29) "async is off by default" code
Fix up conflicts in init/initramfs.c manually
we check if initrd is initramfs first and then do the real unpack. The check
isn't required, we can directly do unpack. If the initrd isn't an
initramfs, we can remove the garbage. In my laptop, this saves 0.1s boot
time.
This patch penalizes non-initramfs initrd case, but nowadays, initramfs is
the most widely used method for initrds.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED menu.
Fix by moving it to just after all of the EMBEDDED menu symbols. Also,
ANON_INODES has a similar problem, so move it to just above the EMBEDDED
menu items since it is used in the EMBEDDED menu.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 155b25bcc2, which was
totally wrong - the "embedded" options still exists (very much so) even
on non-embedded platforms.
It's just that we don't bother with actually asking about them when
we're not embedded, we just take their default values (which is usually
'y' - the options add features that may not be worth it in a constrained
environment).
Noticed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED
menu. Fix by moving it to just after all of the EMBEDDED menu symbols.
Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so
that this EMBEDDED block is clearer.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton. And cleans up the variable-name
and function-name language. ;-)
The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state. This in turn causes
RCU to prematurely end grace periods during this time.
This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition. RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().
Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1. This allows boot-time code that
calls synchronize_rcu() to proceed normally. Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence. Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.
In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.
Changes since v4:
o Changes the name of the introduced function and variable to
be less emotional. ;-)
Changes since v3:
o WARN_ON(nr_context_switches() > 0) to verify that RCU
switches out of boot-time mode before the first context
switch, as suggested by Nick Piggin.
Changes since v2:
o Created rcu_blocking_is_gp() internal-to-RCU API that
determines whether a call to synchronize_rcu() is itself
a grace period.
o The definition of rcu_blocking_is_gp() for rcuclassic and
rcutree checks to see if but a single CPU is online.
o The definition of rcu_blocking_is_gp() for rcupreempt
checks to see both if but a single CPU is online and if
the system is still in early boot.
This allows rcupreempt to again work correctly if running
on a single CPU after booting is complete.
o Added check to rcupreempt's synchronize_sched() for there
being but one online CPU.
Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.
Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
there's a few places that currently loop over driver_probe_done(), and
I'm about to add another one. This patch abstracts it into a helper
to reduce duplication.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes it happens that KConfig dependencies are not handled
like in the following scenario:
- config A
bool
- config B
bool
depends on A
- config C
bool
select B
If one selects C, then it will select B without checking its
dependency to A, if A hasn't been selected elsewhere, it will
result in a build failure.
This is what happens on the following build error:
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52f64): undefined reference to `tracepoint_probe_register_noupdate'
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52f74): undefined reference to `tracepoint_probe_unregister_noupdate'
kernel/built-in.o: In function `marker_update_probe_range':
(.text+0x52fb9): undefined reference to `tracepoint_probe_unregister_noupdate'
kernel/built-in.o: In function `marker_update_probes':
marker.c:(.text+0x530ba): undefined reference to `tracepoint_probe_update_all'
CONFIG_KVM_TRACE will select CONFIG_MARKER, but the latter
depends on CONFIG_TRACEPOINTS which will not be selected.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
disable_ioapic_setup() in init/main.c is ugly as the function is
x86-specific. The #ifdef inline prototype there is ugly too.
Replace it with a generic arch_disable_smp_support() function - which
has a weak alias for non-x86 architectures and for non-ioapic x86 builds.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (92 commits)
gianfar: Revive VLAN support
vlan: Export symbols as non GPL symbols.
bnx2x: tx_has_work should not wait for FW
netxen: reduce memory footprint
netxen: fix vlan tso/checksum offload
net: Fix linux/if_frad.h's suitability for userspace.
net: Move config NET_NS to from net/Kconfig to init/Kconfig
isdn: Fix missing ifdef in isdn_ppp
networking: document "nc" in addition to "netcat" in netconsole.txt
e1000e: workaround hw errata
af_key: initialize xfrm encap_oa
virtio_net: Fix MAX_PACKET_LEN to support 802.1Q VLANs
lcs: fix compilation for !CONFIG_IP_MULTICAST
rtl8187: Add termination packet to prevent stall
iwlwifi: fix rs_get_rate WARN_ON()
p54usb: fix packet loss with first generation devices
sctp: Fix another socket race during accept/peeloff
sctp: Properly timestamp outgoing data chunks for rtx purposes
sctp: Correctly start rtx timer on new packet transmissions.
sctp: Fix crc32c calculations on big-endian arhes.
...
Make NET_NS available underneath the generic Namespaces config option
since all of the other namespace options are there.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo Molnar wrote:
> here's a new build failure with tip/sched/rt:
>
> LD .tmp_vmlinux1
> kernel/built-in.o: In function `set_curr_task_rt':
> sched.c:(.text+0x3675): undefined reference to `plist_del'
> kernel/built-in.o: In function `pick_next_task_rt':
> sched.c:(.text+0x37ce): undefined reference to `plist_del'
> kernel/built-in.o: In function `enqueue_pushable_task':
> sched.c:(.text+0x381c): undefined reference to `plist_del'
Eliminate the plist library kconfig and make it available
unconditionally.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- move CONFIG_PROC_PID_CPUSET into cgroup menu
- move MM_OWNER to the bottom for better menu indent
- fix typos
- use tabs not spaces
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move RCU Kconfig options from top-level menu to an "RCU Subsystem"
menu under the "General Setup" menu.
Signed-off-by: Mike Travis <travis@sgi.com>
Tested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit ad7a953c52.
And commit: ("allow stripping of generated symbols under CONFIG_KALLSYMS_ALL")
9bb482476c
These stripping patches has caused a set of issues:
1) People have reported compatibility issues with binutils due to
lack of support for `--strip-unneeded-symbols' with objcopy 2.15.92.0.2
Reported by: Wenji
2) ccache and distcc no longer works as expeced
Reported by: Ted, Roland, + others
3) The installed modules increased a lot in size
Reported by: Ted, Davej + others
Reported-by: Wenji Huang <wenji.huang@oracle.com>
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Impact: More consistent behaviour, avoid policy in the kernel
Upgrade/downgrade initrd/initramfs decompression failure from
inconsistently a panic or a KERN_ALERT message to a KERN_EMERG event.
It is, however, possible do design a system which can recover from
this (using the kernel builtin code and/or the internal initramfs),
which means this is policy, not a technical necessity.
A good way to handle this would be to have a panic-level=X option, to
force a panic on a printk above a certain level. That is a separate
patch, however.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Instead of failing to identify a compressed image with a decompressor
that we don't have compiled in, identify it and fail with a
comprehensible panic message.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: build fix
flush_buffer() is used unconditionally:
init/initramfs.c:456: error: 'flush_buffer' undeclared (first use in this function)
init/initramfs.c:456: error: (Each undeclared identifier is reported only once
init/initramfs.c:456: error: for each function it appears in.)
So remove the decompressor #ifdefs from around it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
NOMMU: Support XIP on initramfs
NOMMU: Teach kobjsize() about VMA regions.
FLAT: Don't attempt to expand the userspace stack to fill the space allocated
FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
NOMMU: Improve procfs output using per-MM VMAs
NOMMU: Make mmap allocation page trimming behaviour configurable.
NOMMU: Make VMAs per MM as for MMU-mode linux
NOMMU: Delete askedalloc and realalloc variables
NOMMU: Rename ARM's struct vm_region
NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
Centralize the compression format detection to a common routine in the
lib directory, and use it for both initramfs and initrd.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Config and control variable for mem+swap controller.
This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP
(memory resource controller swap extension.)
For accounting swap, it's obvious that we have to use additional memory to
remember "who uses swap". This adds more overhead. So, it's better to
offer "choice" to users. This patch adds 2 choices.
This patch adds 2 parameters to enable swap extension or not.
- CONFIG
- boot option
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Making CGROUP related configs be a sub-menu.
This patch make CGROUP related configs be a sub-menu and makes 1st level
configs of "General Setup" shorter.
including following additional changes
- add help comment about CGROUPS and GROUP_SCHED.
- moved MM_OWNER config to the bottom.
(for good indent in menuconfig)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Support XIP on files unpacked from the initramfs image on NOMMU systems. This
simply requires the length of the file to be preset so that the ramfs fs can
attempt to garner sufficient contiguous storage to store the file (NOMMU mmap
can only map contiguous RAM).
All the other bits to do XIP on initramfs files are present:
(1) ramfs's truncate attempts to allocate a contiguous run of pages when a
file is truncated upwards from nothing.
(2) ramfs sets BDI on its files to indicate direct mapping is possible, and
that its files can be mapped for read, write and exec.
(3) NOMMU mmap() will use the above bits to determine that it can do XIP.
Possibly this needs better controls, because it will _always_ try and do
XIP.
One disadvantage of this very simplistic approach is that sufficient space
will be allocated to store the whole file, and not just the bit that would be
XIP'd. To deal with this, though, the initramfs unpacker would have to be
able to parse the file contents.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
async: don't do the initcall stuff post boot
bootchart: improve output based on Dave Jones' feedback
async: make the final inode deletion an asynchronous event
fastboot: Make libata initialization even more async
fastboot: make the libata port scan asynchronous
fastboot: make scsi probes asynchronous
async: Asynchronous function calls to speed up kernel boot
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Right now, most of the kernel boot is strictly synchronous, such that
various hardware delays are done sequentially.
In order to make the kernel boot faster, this patch introduces
infrastructure to allow doing some of the initialization steps
asynchronously, which will hide significant portions of the hardware delays
in practice.
In order to not change device order and other similar observables, this
patch does NOT do full parallel initialization.
Rather, it operates more in the way an out of order CPU does; the work may
be done out of order and asynchronous, but the observable effects
(instruction retiring for the CPU) are still done in the original sequence.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Impact: Resolves build failures in some configurations
Makes it possible to disable CONFIG_RD_GZIP . In that case, the
built-in initramfs will be compressed by whatever compressor is
available (bzip2 or lzma) or left uncompressed if none is available.
It also removes a couple of warnings which occur when no ramdisk
compression at all is chosen.
It also restores the select ZLIB_INFLATE in drivers/block/Kconfig
which somehow came missing. This is needed to activate compilation of
the stuff in zlib_deflate.
Signed-off-by: Alain Knaff <alain@knaff.lu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The function autodetect_raid is only used by __init functions, and it refers
to __initdata, so it needs __init markings. Fixes this error:
The function autodetect_raid() references
the variable __initdata raid_noautodetect.
This is often because autodetect_raid lacks a __initdata
annotation or the annotation of raid_noautodetect is wrong.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the past, I used the root=... command line parameter to specify the
root filesystem to the kernel. Now it seems that specifying it is not
necessary. The kernel detects the root filesystem even if the kernel
command line is empty. My root fs is on a raid1 device by the way, and I
am not using initrd for the boot process.
If the kernel detects the root filesystem somehow, I think it should print
out the result of this detection, otherwise I will not know which device
has the root filesystem. Or is there an easy way to get this information
on a running system? I had a quick look at the /proc and /sys
filesystems, but haven't found anything useful there.
Signed-off-by: Marton Balint <cus@fazekas.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
checkpatch warns about 'static void noinline'. It wants `static noinline
void'.
Both are permissible, but the kernel consistently uses `static inline' and
`static noinline', and consistency is good. Hence let's keep the
checkpatch warning and fix up this code site.
[akpm@linux-foundation.org: rewrote changelog]
Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tiny-shmem shares most of its 130 lines of code with shmem and tends to
break when particular bits of shmem get modified. Unifying saves code and
makes keeping these two in sync much easier.
before:
14367 392 24 14783 39bf mm/shmem.o
396 72 8 476 1dc mm/tiny-shmem.o
after:
14367 392 24 14783 39bf mm/shmem.o
412 72 8 492 1ec mm/shmem.o tiny
Signed-off-by: Matt Mackall <mpm@selenic.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This should make the help text of SYSFS_DEPRECATED more clear, that this
is _not_ about (what some people think it is) suppressing a few symlinks
and variables, but a different sysfs _layout_ with new features.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: Bug fix (we should not show this menu on irrelevant architectures)
Make the config machinery to drive the gzip/bzip2/lzma selection
dependent on the architecture advertising HAVE_KERNEL_* so that we
don't display this for architectures where it doesn't matter.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Code simplification
Instead of open-coding testing for initramfs compression formats, use
a table.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: New code for initramfs decompression, new features
This is the second part of the bzip2/lzma patch
The bzip patch is based on an idea by Christian Ludwig, includes support for
compressing the kernel with bzip2 or lzma rather than gzip. Both
compressors give smaller sizes than gzip. Lzma's decompresses faster
than bzip2.
It also supports ramdisks and initramfs' compressed using these two
compressors.
The functionality has been successfully used for a couple of years by
the udpcast project
This version applies to "tip" kernel 2.6.28
This part contains:
- support for new compressions (bzip2 and lzma) in initramfs and
old-style ramdisk
- config dialog for kernel compression (but new kernel compressions
not yet supported)
Signed-off-by: Alain Knaff <alain@knaff.lu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that nothing depends on it any more, remove CONFIG_KMOD.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
x86: setup_per_cpu_areas() cleanup
cpumask: fix compile error when CONFIG_NR_CPUS is not defined
cpumask: use alloc_cpumask_var_node where appropriate
cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
x86: use cpumask_var_t in acpi/boot.c
x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
sched: put back some stack hog changes that were undone in kernel/sched.c
x86: enable cpus display of kernel_max and offlined cpus
ia64: cpumask fix for is_affinity_mask_valid()
cpumask: convert RCU implementations, fix
xtensa: define __fls
mn10300: define __fls
m32r: define __fls
h8300: define __fls
frv: define __fls
cris: define __fls
cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
cpumask: zero extra bits in alloc_cpumask_var_node
cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
cpumask: convert mm/
...
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
Impact: cleanup
We now have a cleaner check for gcc 4.1.0/4.1.1 trouble in
include/linux/compiler-gcc4.h, so remove the 4.1.0 quirk from
init/main.c.
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: use new API
cpu_*_map are going away in favour of cpu_*_mask, but const pointers.
So we have accessors where we really do want to frob them. Archs
will also need the (trivial) conversion before we can finally remove
cpu_*_map.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
* 'irq-fixes-for-linus-4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sparseirq: move __weak symbols into separate compilation unit
sparseirq: work around __weak alias bug
sparseirq: fix hang with !SPARSE_IRQ
sparseirq: set lock_class for legacy irq when sparse_irq is selected
sparseirq: work around compiler optimizing away __weak functions
sparseirq: fix desc->lock init
sparseirq: do not printk when migrating IRQ descriptors
sparseirq: remove duplicated arch_early_irq_init()
irq: simplify for_each_irq_desc() usage
proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c
irq: for_each_irq_desc() move to irqnr.h
hrtimer: remove #include <linux/irq.h>
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, sparseirq: clean up Kconfig entry
x86: turn CONFIG_SPARSE_IRQ off by default
sparseirq: fix numa_migrate_irq_desc dependency and comments
sparseirq: add kernel-doc notation for new member in irq_desc, -v2
locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEP
sparseirq, xen: make sure irq_desc is allocated for interrupts
sparseirq: fix !SMP building, #2
x86, sparseirq: move irq_desc according to smp_affinity, v7
proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ
sparse irqs: add irqnr.h to the user headers list
sparse irqs: handle !GENIRQ platforms
sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
sparseirq: fix Alpha build failure
sparseirq: fix typo in !CONFIG_IO_APIC case
x86, MSI: pass irq_cfg and irq_desc
x86: MSI start irq numbering from nr_irqs_gsi
x86: use NR_IRQS_LEGACY
sparse irq_desc[] array: core kernel and x86 changes
genirq: record IRQ_LEVEL in irq_desc[]
irq.h: remove padding from irq_desc on 64bits
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
stacktrace: provide save_stack_trace_tsk() weak alias
rcu: provide RCU options on non-preempt architectures too
printk: fix discarding message when recursion_bug
futex: clean up futex_(un)lock_pi fault handling
"Tree RCU": scalable classic RCU implementation
futex: rename field in futex_q to clarify single waiter semantics
x86/swiotlb: add default swiotlb_arch_range_needs_mapping
x86/swiotlb: add default phys<->bus conversion
x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
x86: add swiotlb allocation functions
swiotlb: consolidate swiotlb info message printing
swiotlb: support bouncing of HighMem pages
swiotlb: factor out copy to/from device
swiotlb: add arch hook to force mapping
swiotlb: allow architectures to override phys<->bus<->phys conversions
swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
rcu: fix rcutorture behavior during reboot
resources: skip sanity check of busy resources
swiotlb: move some definitions to header
swiotlb: allow architectures to override swiotlb pool allocation
...
Fix up trivial conflicts in
arch/x86/kernel/Makefile
arch/x86/mm/init_32.c
include/linux/hardirq.h
as per Ingo's suggestions.
Impact: new tracer plugin
This patch adapts kmemtrace raw events tracing to the unified tracing API.
To enable and use this tracer, just do the following:
echo kmemtrace > /debugfs/tracing/current_tracer
cat /debugfs/tracing/trace
You will have the following output:
# tracer: kmemtrace
#
#
# ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER
# FREE | | | | | | | |
# |
type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
That was to stay backward compatible with the format output produced in
inux/tracepoint.h.
This is the default ouput, but note that I tried something else.
If you change an option:
echo kmem_minimalistic > /debugfs/trace_options
and then cat /debugfs/trace, you will have the following output:
# tracer: kmemtrace
#
#
# ALLOC TYPE REQ GIVEN FLAGS POINTER NODE CALLER
# FREE | | | | | | | |
# |
- C 0xffff88007c088780 file_free_rcu
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc780 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc870 -1 d_alloc
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
+ K 240 240 000000d0 0xffff8800790dc960 -1 d_alloc
+ K 1304 1312 000000d0 0xffff8800791d7340 -1 reiserfs_alloc_inode
- C 0xffff88007cad6000 putname
+ K 4096 4096 000000d0 0xffff88007cad6000 -1 getname
- C 0xffff88007cad6000 putname
+ K 992 1000 000000d0 0xffff880079045b58 -1 alloc_inode
+ K 768 1024 000080d0 0xffff88007c096400 -1 alloc_pipe_info
+ K 240 240 000000d0 0xffff8800790dca50 -1 d_alloc
+ K 272 320 000080d0 0xffff88007c088780 -1 get_empty_filp
+ K 272 320 000080d0 0xffff88007c088000 -1 get_empty_filp
Yeah I shall confess kmem_minimalistic should be: kmem_alternative.
Whatever, I find it more readable but this a personal opinion of course.
We can drop it if you want.
On the ALLOC/FREE column, + means an allocation and - a free.
On the type column, you have K = kmalloc, C = cache, P = page
I would like the flags to be GFP_* strings but that would not be easy to not
break the column with strings....
About the node...it seems to always be -1. I don't know why but that shouldn't
be difficult to find.
I moved linux/tracepoint.h to trace/tracepoint.h as well. I think that would
be more easy to find the tracer headers if they are all in their common
directory.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kmemtrace provides tracing for slab allocator functions, such as kmalloc,
kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected data is then fed
to the userspace application in order to analyse allocation hotspots,
internal fragmentation and so on, making it possible to see how well an
allocator performs, as well as debug and profile kernel code.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
GCC has a bug with __weak alias functions: if the functions are in
the same compilation unit as their call site, GCC can decide to
inline them - and thus rob the linker of the opportunity to override
the weak alias with the real thing.
So move all the IRQ handling related __weak symbols to kernel/irq/chip.c.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
kbuild: strip generated symbols from *.ko
kbuild: simplify use of genksyms
kernel-doc: check for extra kernel-doc notations
kbuild: add headerdep used to detect inclusion cycles in header files
kbuild: fix string equality testing in tags.sh
kbuild: fix make tags/cscope
kbuild: fix make incompatibility
kbuild: remove TAR_IGNORE
setlocalversion: add git-svn support
setlocalversion: print correct subversion revision
scripts: improve the decodecode script
scripts/package: allow custom options to rpm
genksyms: allow to ignore symbol checksum changes
genksyms: track symbol checksum changes
tags and cscope support really belongs in a shell script
kconfig: fix options to check-lxdialog.sh
kbuild: gen_init_cpio expands shell variables in file names
remove bashisms from scripts/extract-ikconfig
kbuild: teach mkmakfile to be silent
...
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
sched, trace: update trace_sched_wakeup()
tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
Revert "x86: disable X86_PTRACE_BTS"
ring-buffer: prevent false positive warning
ring-buffer: fix dangling commit race
ftrace: enable format arguments checking
x86, bts: memory accounting
x86, bts: add fork and exit handling
ftrace: introduce tracing_reset_online_cpus() helper
tracing: fix warnings in kernel/trace/trace_sched_switch.c
tracing: fix warning in kernel/trace/trace.c
tracing/ring-buffer: remove unused ring_buffer size
trace: fix task state printout
ftrace: add not to regex on filtering functions
trace: better use of stack_trace_enabled for boot up code
trace: add a way to enable or disable the stack tracer
x86: entry_64 - introduce FTRACE_ frame macro v2
tracing/ftrace: add the printk-msg-only option
tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
x86, bts: correctly report invalid bts records
...
Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
Impact: fix panic on null pointer with sparseirq
Some GCC versions seem to inline the weak global function,
when that function is empty.
Work it around, by making the functions return a (dummy) integer.
Signed-off-by: Yinghai <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix
Some old architectures still do not use kernel/Kconfig.preempt, so the
moving of the RCU options there broke their build:
In file included from /home/mingo/tip/include/linux/sem.h:81,
from /home/mingo/tip/include/linux/sched.h:69,
from /home/mingo/tip/arch/alpha/kernel/asm-offsets.c:9:
/home/mingo/tip/include/linux/rcupdate.h:62:2: error: #error "Unknown RCU implementation specified to kernel configuration"
Move these options back to init/Kconfig, which every architecture
includes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Building upon parts of the module stripping patch, this patch
introduces similar stripping for vmlinux when CONFIG_KALLSYMS_ALL=y.
Using CONFIG_KALLSYMS_STRIP_GENERATED reduces the overhead of
CONFIG_KALLSYMS_ALL from 245k/310k to 65k/80k for the (i386/x86-64)
kernels I tested with.
The patch also does away with the need to special case the kallsyms-
internal symbols by making them available even in the first linking
stage.
While it is a generated file, the patch includes the changes to
scripts/genksyms/keywords.c_shipped, as I'm unsure what the procedure
here is.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.
This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.
Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.
Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.
I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.
A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:
http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.
o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.
Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.
o Apply Ingo's printk-standardization patch.
o Substitute local variables for repeated accesses to global
variables.
o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).
o Apply checkpatch fixes.
Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)
o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.
o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.
o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.
o Add more data to tracing.
o Remove unused "rcu_barrier" field from rcu_data structure.
o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.
o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...
Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
o Fix a number of checkpatch.pl complaints.
o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.
o Fix several bugs in !CONFIG_SMP builds.
o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.
o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.
Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).
o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.
o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.
o A number of optimizations and usability improvements:
o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.
o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.
o Rearrange struct fields to improve struct layout.
o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.
o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.
o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.
o Document tracing files and formats for both rcupreempt
and rcutree.
Updates from v4 for those missing v5 given its bad subject line:
o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)
o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.
o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.
o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)
o Adjusted rcuclassic and rcupreempt to handle dynticks changes.
Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.
This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)
In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.
Some shortcomings:
o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.
Patches will be provided as required.
o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.
Patches will be provided as required.
o The memory footprint of this version is several KB larger
than rcuclassic.
A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.
Credits:
o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)
o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.
o Thomas Gleixner for much-needed help with some timer issues
(see patches below).
o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
Implement the core kernel bits of Performance Counters subsystem.
The Linux Performance Counter subsystem provides an abstraction of
performance counter hardware capabilities. It provides per task and per
CPU counters, and it provides event capabilities on top of those.
Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.
The special file descriptor is opened via the perf_counter_open()
system call:
int
perf_counter_open(u32 hw_event_type,
u32 hw_event_period,
u32 record_type,
pid_t pid,
int cpu);
The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.
Multiple counters can be kept open at a time, and the counters
can be poll()ed.
See more details in Documentation/perf-counters.txt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new feature
Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.
To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.
When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).
This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
Impact: fix initcall debug output on non-scalar ktime platforms (32-bit embedded)
The initcall_debug code access the tv64 member of ktime. This won't work
correctly for large deltas on platforms that don't use the scalar ktime
implementation.
Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
cpuset: fix regression when failed to generate sched domains
sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
sched: fix kernel warning on /proc/sched_debug access
sched: correct sched-rt-group.txt pathname in init/Kconfig
Impact: new API
Add a new API trace_mark_tp(), which declares a marker within a
tracepoint probe. When the marker is activated, the tracepoint is
automatically enabled.
No branch test is used at the marker site, because it would be a
duplicate of the branch already present in the tracepoint.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Inaugurate copy-on-write credentials management. This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.
A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().
With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:
struct cred *new = prepare_creds();
int ret = blah(new);
if (ret < 0) {
abort_creds(new);
return ret;
}
return commit_creds(new);
There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.
To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const. The purpose of this is compile-time
discouragement of altering credentials through those pointers. Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:
(1) Its reference count may incremented and decremented.
(2) The keyrings to which it points may be modified, but not replaced.
The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).
This patch and the preceding patches have been tested with the LTP SELinux
testsuite.
This patch makes several logical sets of alteration:
(1) execve().
This now prepares and commits credentials in various places in the
security code rather than altering the current creds directly.
(2) Temporary credential overrides.
do_coredump() and sys_faccessat() now prepare their own credentials and
temporarily override the ones currently on the acting thread, whilst
preventing interference from other threads by holding cred_replace_mutex
on the thread being dumped.
This will be replaced in a future patch by something that hands down the
credentials directly to the functions being called, rather than altering
the task's objective credentials.
(3) LSM interface.
A number of functions have been changed, added or removed:
(*) security_capset_check(), ->capset_check()
(*) security_capset_set(), ->capset_set()
Removed in favour of security_capset().
(*) security_capset(), ->capset()
New. This is passed a pointer to the new creds, a pointer to the old
creds and the proposed capability sets. It should fill in the new
creds or return an error. All pointers, barring the pointer to the
new creds, are now const.
(*) security_bprm_apply_creds(), ->bprm_apply_creds()
Changed; now returns a value, which will cause the process to be
killed if it's an error.
(*) security_task_alloc(), ->task_alloc_security()
Removed in favour of security_prepare_creds().
(*) security_cred_free(), ->cred_free()
New. Free security data attached to cred->security.
(*) security_prepare_creds(), ->cred_prepare()
New. Duplicate any security data attached to cred->security.
(*) security_commit_creds(), ->cred_commit()
New. Apply any security effects for the upcoming installation of new
security by commit_creds().
(*) security_task_post_setuid(), ->task_post_setuid()
Removed in favour of security_task_fix_setuid().
(*) security_task_fix_setuid(), ->task_fix_setuid()
Fix up the proposed new credentials for setuid(). This is used by
cap_set_fix_setuid() to implicitly adjust capabilities in line with
setuid() changes. Changes are made to the new credentials, rather
than the task itself as in security_task_post_setuid().
(*) security_task_reparent_to_init(), ->task_reparent_to_init()
Removed. Instead the task being reparented to init is referred
directly to init's credentials.
NOTE! This results in the loss of some state: SELinux's osid no
longer records the sid of the thread that forked it.
(*) security_key_alloc(), ->key_alloc()
(*) security_key_permission(), ->key_permission()
Changed. These now take cred pointers rather than task pointers to
refer to the security context.
(4) sys_capset().
This has been simplified and uses less locking. The LSM functions it
calls have been merged.
(5) reparent_to_kthreadd().
This gives the current thread the same credentials as init by simply using
commit_thread() to point that way.
(6) __sigqueue_alloc() and switch_uid()
__sigqueue_alloc() can't stop the target task from changing its creds
beneath it, so this function gets a reference to the currently applicable
user_struct which it then passes into the sigqueue struct it returns if
successful.
switch_uid() is now called from commit_creds(), and possibly should be
folded into that. commit_creds() should take care of protecting
__sigqueue_alloc().
(7) [sg]et[ug]id() and co and [sg]et_current_groups.
The set functions now all use prepare_creds(), commit_creds() and
abort_creds() to build and check a new set of credentials before applying
it.
security_task_set[ug]id() is called inside the prepared section. This
guarantees that nothing else will affect the creds until we've finished.
The calling of set_dumpable() has been moved into commit_creds().
Much of the functionality of set_user() has been moved into
commit_creds().
The get functions all simply access the data directly.
(8) security_task_prctl() and cap_task_prctl().
security_task_prctl() has been modified to return -ENOSYS if it doesn't
want to handle a function, or otherwise return the return value directly
rather than through an argument.
Additionally, cap_task_prctl() now prepares a new set of credentials, even
if it doesn't end up using it.
(9) Keyrings.
A number of changes have been made to the keyrings code:
(a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
all been dropped and built in to the credentials functions directly.
They may want separating out again later.
(b) key_alloc() and search_process_keyrings() now take a cred pointer
rather than a task pointer to specify the security context.
(c) copy_creds() gives a new thread within the same thread group a new
thread keyring if its parent had one, otherwise it discards the thread
keyring.
(d) The authorisation key now points directly to the credentials to extend
the search into rather pointing to the task that carries them.
(e) Installing thread, process or session keyrings causes a new set of
credentials to be created, even though it's not strictly necessary for
process or session keyrings (they're shared).
(10) Usermode helper.
The usermode helper code now carries a cred struct pointer in its
subprocess_info struct instead of a new session keyring pointer. This set
of credentials is derived from init_cred and installed on the new process
after it has been cloned.
call_usermodehelper_setup() allocates the new credentials and
call_usermodehelper_freeinfo() discards them if they haven't been used. A
special cred function (prepare_usermodeinfo_creds()) is provided
specifically for call_usermodehelper_setup() to call.
call_usermodehelper_setkeys() adjusts the credentials to sport the
supplied keyring as the new session keyring.
(11) SELinux.
SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:
(a) selinux_setprocattr() no longer does its check for whether the
current ptracer can access processes with the new SID inside the lock
that covers getting the ptracer's SID. Whilst this lock ensures that
the check is done with the ptracer pinned, the result is only valid
until the lock is released, so there's no point doing it inside the
lock.
(12) is_single_threaded().
This function has been extracted from selinux_setprocattr() and put into
a file of its own in the lib/ directory as join_session_keyring() now
wants to use it too.
The code in SELinux just checked to see whether a task shared mm_structs
with other tasks (CLONE_VM), but that isn't good enough. We really want
to know if they're part of the same thread group (CLONE_THREAD).
(13) nfsd.
The NFS server daemon now has to use the COW credentials to set the
credentials it is going to use. It really needs to pass the credentials
down to the functions it calls, but it can't do that until other patches
in this series have been applied.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
In 2007, a0acd82080 changed the default
slab allocator to SLUB, but the SLAB help text still says SLAB is the
default. This change fixes that.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
init/Kconfig directs the user to Documentation/sched-rt-group.txt, but
the file is actually in Documentation/scheduler/sched-rt-group.txt.
This patch corrects the pathname mentioned in init/Kconfig.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Cleanups on the boot tracer and ftrace
This patch bring some cleanups about the boot tracer headers. The
functions and structures of this tracer have nothing related to ftrace
and should have so their own header file.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: modify boot tracer
We used to disable the initcall tracing at a specified time (IE: end
of builtin initcalls). But we don't need it anymore. It will be
stopped when initcalls are finished.
However we want two things:
_Start this tracing only after pre-smp initcalls are finished.
_Since we are planning to trace sched_switches at the same time, we
want to enable them only during the initcall execution.
For this purpose, this patch introduce two functions to enable/disable
the sched_switch tracing during boot.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Removed duplicated #include <linux/delay.h> in init/do_mounts_md.c.
The same compile error ("error: implicit declaration of function
'msleep'") got fixed twice:
- f8b77d3939 ("init/do_mounts_md.c:
msleep compile fix")
- 73b4a24f5f ("init/do_mounts_md.c must
#include <linux/delay.h>")
by people adding the <linux/delay.h> include in two slightly different
places. Andrew's quilt scripts happily ignore the fuzz, and will
re-apply the patch even though they had conflicts.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_cgroup is now allocated at boot and memmap doesn't includes pointer
for page_cgroup. Fix the menu help text.
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit a802dd0eb5 by moving
the call to init_workqueues() back where it belongs - after SMP has been
initialized.
It also moves stop_machine_init() - which needs workqueues - to a later
phase using a core_initcall() instead of early_initcall(). That should
satisfy all ordering requirements, and was apparently the reason why
init_workqueues() was moved to be too early.
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
stop_machine: fix error code handling on multiple cpus
stop_machine: use workqueues instead of kernel threads
workqueue: introduce create_rt_workqueue
Call init_workqueues before pre smp initcalls.
Make panic= and panic_on_oops into core_params
Make initcall_debug a core_param
core_param() for genuinely core kernel parameters
param: Fix duplicate module prefixes
module: check kernel param length at compile time, not runtime
Remove stop_machine during module load v2
module: simplify load_module.
page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)
This patch moves page_cgroup_init() to init/main.c.
Time table is following:
==
parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
....
cgroup_init_early() # "early" init of cgroup.
....
setup_arch() # memmap is allocated.
...
page_cgroup_init();
mem_init(); # we cannot call alloc_bootmem after this.
....
cgroup_init() # mem_cgroup is initialized.
==
Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.
(*) maybe this is not very clean but
- cgroup_init_early() is too early
- in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
use of vmalloc area in x86-32 is important and we should avoid very large
vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
directly to init/main.c
[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 3d13731024 ("PCI: allow quirks to be
compiled out") introduced CONFIG_PCI_QUIRKS, which now shows up in each
and every .config. Fix this by making it depend on PCI.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This allows to create workqueues from within the context of
a pre smp initcall (aka early_initcall).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the one I really wanted: now it effects module loading, it
makes sense to be able to flip it after boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits)
PCI: fix pci_ioremap_bar() on s390
PCI: fix AER capability check
PCI: use pci_find_ext_capability everywhere
PCI: remove #ifdef DEBUG around dev_dbg call
PCI hotplug: fix get_##name return value problem
PCI: document the pcie_aspm kernel parameter
PCI: introduce an pci_ioremap(pdev, barnr) function
powerpc/PCI: Add legacy PCI access via sysfs
PCI: Add ability to mmap legacy_io on some platforms
PCI: probing debug message uniformization
PCI: support PCIe ARI capability
PCI: centralize the capabilities code in probe.c
PCI: centralize the capabilities code in pci-sysfs.c
PCI: fix 64-vbit prefetchable memory resource BARs
PCI: replace cfg space size (256/4096) by macros.
PCI: use resource_size() everywhere.
PCI: use same arg names in PCI_VDEVICE comment
PCI hotplug: rpaphp: make debug var unique
PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c
PCI: fix hotplug get_##name return value problem
...
This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all
the PCI quirks, which are not necessarily used on embedded systems when
PCI is working properly. As this is a size-reduction option, it depends
on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel
code:
text data bss dec hex filename
1287806 123596 212992 1624394 18c94a vmlinux.old
1275854 123596 212992 1612442 189a9a vmlinux
-11952 0 0 -11952 -2EB0 +/-
This patch has originally been written by Zwane Mwaikambo
<zwane@arm.linux.org.uk> and is part of the Linux Tiny project.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch implements a new freezer subsystem in the control groups
framework. It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.
The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks
in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup. Reading will return the current state.
* Examples of usage :
# mkdir /containers/freezer
# mount -t cgroup -ofreezer freezer /containers
# mkdir /containers/0
# echo $some_pid > /containers/0/tasks
to get status of the freezer subsystem :
# cat /containers/0/freezer.state
RUNNING
to freeze all tasks in the container :
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FREEZING
# cat /containers/0/freezer.state
FROZEN
to unfreeze all tasks in the container :
# echo RUNNING > /containers/0/freezer.state
# cat /containers/0/freezer.state
RUNNING
This is the basic mechanism which should do the right thing for user space
task in a simple scenario.
It's important to note that freezing can be incomplete. In that case we
return EBUSY. This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time. After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read. The state will remain
"FREEZING" until one of these things happens:
1) Userspace cancels the freezing operation by writing "RUNNING" to
the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
the freezer.state file (writing "FREEZING" is not legal
and returns EIO)
3) The tasks that blocked the cgroup from entering the "FROZEN"
state disappear from the cgroup's set of tasks.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
provide a fast, scalable percpu frontend for small vmaps (requires a
slightly different API, though).
The biggest problem with vmap is actually vunmap. Presently this requires
a global kernel TLB flush, which on most architectures is a broadcast IPI
to all CPUs to flush the cache. This is all done under a global lock. As
the number of CPUs increases, so will the number of vunmaps a scaled
workload will want to perform, and so will the cost of a global TLB flush.
This gives terrible quadratic scalability characteristics.
Another problem is that the entire vmap subsystem works under a single
lock. It is a rwlock, but it is actually taken for write in all the fast
paths, and the read locking would likely never be run concurrently anyway,
so it's just pointless.
This is a rewrite of vmap subsystem to solve those problems. The existing
vmalloc API is implemented on top of the rewritten subsystem.
The TLB flushing problem is solved by using lazy TLB unmapping. vmap
addresses do not have to be flushed immediately when they are vunmapped,
because the kernel will not reuse them again (would be a use-after-free)
until they are reallocated. So the addresses aren't allocated again until
a subsequent TLB flush. A single TLB flush then can flush multiple
vunmaps from each CPU.
XEN and PAT and such do not like deferred TLB flushing because they can't
always handle multiple aliasing virtual addresses to a physical address.
They now call vm_unmap_aliases() in order to flush any deferred mappings.
That call is very expensive (well, actually not a lot more expensive than
a single vunmap under the old scheme), however it should be OK if not
called too often.
The virtual memory extent information is stored in an rbtree rather than a
linked list to improve the algorithmic scalability.
There is a per-CPU allocator for small vmaps, which amortizes or avoids
global locking.
To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
must be used in place of vmap and vunmap. Vmalloc does not use these
interfaces at the moment, so it will not be quite so scalable (although it
will use lazy TLB flushing).
As a quick test of performance, I ran a test that loops in the kernel,
linearly mapping then touching then unmapping 4 pages. Different numbers
of tests were run in parallel on an 4 core, 2 socket opteron. Results are
in nanoseconds per map+touch+unmap.
threads vanilla vmap rewrite
1 14700 2900
2 33600 3000
4 49500 2800
8 70631 2900
So with a 8 cores, the rewritten version is already 25x faster.
In a slightly more realistic test (although with an older and less
scalable version of the patch), I ripped the not-very-good vunmap batching
code out of XFS, and implemented the large buffer mapping with vm_map_ram
and vm_unmap_ram... along with a couple of other tricks, I was able to
speed up a large directory workload by 20x on a 64 CPU system. I believe
vmap/vunmap is actually sped up a lot more than 20x on such a system, but
I'm running into other locks now. vmap is pretty well blown off the
profiles.
Before:
1352059 total 0.1401
798784 _write_lock 8320.6667 <- vmlist_lock
529313 default_idle 1181.5022
15242 smp_call_function 15.8771 <- vmap tlb flushing
2472 __get_vm_area_node 1.9312 <- vmap
1762 remove_vm_area 4.5885 <- vunmap
316 map_vm_area 0.2297 <- vmap
312 kfree 0.1950
300 _spin_lock 3.1250
252 sn_send_IPI_phys 0.4375 <- tlb flushing
238 vmap 0.8264 <- vmap
216 find_lock_page 0.5192
196 find_next_bit 0.3603
136 sn2_send_IPI 0.2024
130 pio_phys_write_mmr 2.0312
118 unmap_kernel_range 0.1229
After:
78406 total 0.0081
40053 default_idle 89.4040
33576 ia64_spinlock_contention 349.7500
1650 _spin_lock 17.1875
319 __reg_op 0.5538
281 _atomic_dec_and_lock 1.0977
153 mutex_unlock 1.5938
123 iget_locked 0.1671
117 xfs_dir_lookup 0.1662
117 dput 0.1406
114 xfs_iget_core 0.0268
92 xfs_da_hashname 0.1917
75 d_alloc 0.0670
68 vmap_page_range 0.0462 <- vmap
58 kmem_cache_alloc 0.0604
57 memset 0.0540
52 rb_next 0.1625
50 __copy_user 0.0208
49 bitmap_find_free_region 0.2188 <- vmap
46 ia64_sn_udelay 0.1106
45 find_inode_fast 0.1406
42 memcmp 0.2188
42 finish_task_switch 0.1094
42 __d_lookup 0.0410
40 radix_tree_lookup_slot 0.1250
37 _spin_unlock_irqrestore 0.3854
36 xfs_bmapi 0.0050
36 kmem_cache_free 0.0256
35 xfs_vn_getattr 0.0322
34 radix_tree_lookup 0.1062
33 __link_path_walk 0.0035
31 xfs_da_do_buf 0.0091
30 _xfs_buf_find 0.0204
28 find_get_page 0.0875
27 xfs_iread 0.0241
27 __strncpy_from_user 0.2812
26 _xfs_buf_initialize 0.0406
24 _xfs_buf_lookup_pages 0.0179
24 vunmap_page_range 0.0250 <- vunmap
23 find_lock_page 0.0799
22 vm_map_ram 0.0087 <- vmap
20 kfree 0.0125
19 put_page 0.0330
18 __kmalloc 0.0176
17 xfs_da_node_lookup_int 0.0086
17 _read_lock 0.0885
17 page_waitqueue 0.0664
vmap has gone from being the top 5 on the profiles and flushing the crap
out of all TLBs, to using less than 1% of kernel time.
[akpm@linux-foundation.org: cleanups, section fix]
[akpm@linux-foundation.org: fix build on alpha]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following compile error caused by commit
589f800bb1 ("fastboot: make the raid
autodetect code wait for all devices to init"):
CC init/do_mounts_md.o
init/do_mounts_md.c: In function 'autodetect_raid':
init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
make[2]: *** [init/do_mounts_md.o] Error 1
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchs adds the CONFIG_AIO option which allows to remove support
for asynchronous I/O operations, that are not necessarly used by
applications, particularly on embedded devices. As this is a
size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
save ~7 kilobytes of kernel code/data:
text data bss dec hex filename
1115067 119180 217088 1451335 162547 vmlinux
1108025 119048 217088 1444161 160941 vmlinux.new
-7042 -132 0 -7174 -1C06 +/-
This patch has been originally written by Matt Mackall
<mpm@selenic.com>, and is part of the Linux Tiny project.
[randy.dunlap@oracle.com: build fix]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When unpacking the cpio into the initramfs, mtimes are not preserved by
default. This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
mtimes stored in the cpio image to be used when constructing the
initramfs.
For embedded applications that run exclusively out of the initramfs, this
is invaluable:
When building embedded application initramfs images, its nice to know when
the files were actually created during the build process - that makes it
easier to see what files were modified when so we can compare the files
that are being used on the image with the files used during the build
process. This might help (for example) to determine if the target system
has all the updated files you expect to see w/o having to check MD5s etc.
In our environment, the whole system runs off the initramfs partition, and
seeing the modified times of the shared libraries (for example) helps us
find bugs that may have been introduced by the build system incorrectly
propogating outdated shared libraries into the image.
Similarly, many of the initializion/configuration files in /etc might be
dynamically built by the build system, and knowing when they were modified
helps us sanity check whether the target system has the "latest" files
etc.
Finally, we might use last modified times to determine whether a hot fix
should be applied or not to the running ramfs.
Signed-off-by: Nye Liu <nyet@nyet.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:
if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
printk(KERN_NOTICE
"RAMDISK: Compressed image found at block %d\n",
start_block);
nblocks = 0;
^^^^^^^^^^^
goto done;
}
...
done:
sys_lseek(fd, start_block * BLOCK_SIZE, 0);
kfree(buf);
return nblocks;
^^^^^^^^^^^^^^
Hence correct the typo in the comment, which has existed since the
addition of compressed ramdisk support in 1.3.48.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-fastboot:
raid, fastboot: hide RAID autodetect option if MD is compiled as a module
raid: make RAID autodetect default a KConfig option
warning: fix init do_mounts_md c
fastboot: make the RAID autostart code print a message just before waiting
fastboot: make the raid autodetect code wait for all devices to init
fastboot: Fix bootgraph.pl initcall name regexp
fastboot: fix issues and improve output of bootgraph.pl
Add a script to visualize the kernel boot process / time
Change the time resolution for initcall_debug to microseconds, from
milliseconds. This is handy to determine which initcalls you want to work
on for faster booting.
One one of my test machines, over 90% of the initcalls are less than a
millisecond and (without this patch) these are all reported as 0 msecs.
Working on the 900 us ones is more important than the 4 us ones.
With 'quiet' on the kernel command line, this adds no significant overhead
to kernel boot time.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At this time, only built-in initcalls interest us.
We can't really produce a relevant graph if we include
the modules initcall too.
I had good results after this patch (see svg in attachment).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After some initcall traces, some initcall names may be inconsistent.
That's because these functions will disappear from the .init section
and also their name from the symbols table.
So we have to copy the name of the function in a buffer large enough
during the trace appending. It is not costly for the ring_buffer because
the number of initcall entries is commonly not really large.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the boot tracer printing to make it parsable for
the scripts/bootgraph.pl script.
We have now to output two lines for each initcall, according to the
printk in do_one_initcall() in init/main.c
We need now the call's time and the return's time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Launch the boot tracing inside the initcall_debug area. Old printk
have not been removed to keep the old way of initcall tracing for
backward compatibility.
[ mingo@elte.hu: resolved conflicts ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with the fastboot asynchronous
initcall level, it's very valuable to see which initcall gets run where
and when.
This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.
This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.
This patch does not handle modules. That comes with a later patch.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do not expose users to CONFIG_TRACEPOINTS - tracers can select it
just fine.
update ftrace to select CONFIG_TRACEPOINTS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
proc: remove kernel.maps_protect
proc: remove now unneeded ADDBUF macro
[PATCH] proc: show personality via /proc/pid/personality
[PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
proc: make grab_header() static
proc: remove unused get_dma_list()
proc: remove dummy vmcore_open()
proc: proc_sys_root tweak
proc: fix return value of proc_reg_open() in "too late" case
Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
RAID autodetect has the side effect of requiring synchronisation
of all device drivers, which can make the boot several seconds longer
(I've measured 7 on one of my laptops).... even for systems that don't
have RAID setup for the root filesystem (the only FS where this matters).
This patch makes the default for autodetect a config option; either way
the user can always override via the kernel command line.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: NeilBrown <neilb@suse.de>
fix warning:
init/do_mounts_md.c: In function ‘md_run_setup’:
init/do_mounts_md.c:282: warning: ISO C90 forbids mixed declarations and code
also, use the opportunity to put the RAID autodetection code
into a separate function - this also solves a checkpatch style warning.
No code changed:
md5:
aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.before.asm
aa36a35faef371b05f1974ad583bdbbd do_mounts_md.o.after.asm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As requested/suggested by Neil Brown: make the raid code print that it's
about to wait for probing to be done as well as give a suggestion on how
to disable the probing if the user doesn't use raid.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com
The raid autodetect code really needs to have all devices probed before
it can detect raid arrays; not doing so would give rather messy situations
where arrays would get detected as degraded while they shouldn't be etc.
This is in preparation of removing the "wait for everything to init"
code that makes everyone pay, not just raid users.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
When optimizing the kernel boot time, it's very valuable to visualize
what is going on at which time. In addition, with some of the initializing
going asynchronous soon, it's valuable to track/print which worker thread
is executing the initialization.
This patch adds a script to turn a dmesg into a SVG graph (that can be
shown with tools such as InkScape, Gimp or Firefox) and a small change
to the initcall code to print the PID of the thread calling the initcall
(so that the script can work out the parallelism).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless. Root devices should
be explicitly specified using textual names. Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also, add warning
to the help text.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
.. small detail, but the silly e1000e initcall warning debugging caused
me to look at this code. Rather than gouge my eyes out with a spoon, I
just fixed it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that sysctl_check.o was the largest object file in
a allnoconfig build in kernel/*.
36243 0 0 36243 8d93 kernel/sysctl_check.o
This is because it was default y and && EMBEDDED. But I don't
really see a need for a non kernel developer to have their
sysctls checked all the time.
So move the Kconfig into the kernel debugging section and
also drop the default y and the EMBEDDED check.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel has this really nice facility where if you put "initcall_debug"
on the kernel commandline, it'll print which function it's going to
execute just before calling an initcall, and then after the call completes
it will
1) print if it had an error code
2) checks for a few simple bugs (like leaving irqs off)
and
3) print how long the init call took in milliseconds.
While trying to optimize the boot speed of my laptop, I have been loving
number 3 to figure out what to optimize... ... and then I wished that
the same thing was done for module loading.
This patch makes the module loader use this exact same functionality; it's
a logical extension in my view (since modules are just sort of late
binding initcalls anyway) and so far I've found it quite useful in finding
where things are too slow in my boot.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Given that the init/Kconfig file uses a "menuconfig" directive for
modules already, might as well wrap all the submenu entries in an "if"
to toss all those dependencies.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
kbuild: scripts/ver_linux: don't set PATH
Kconfig/init: change help text to match default value
kbuild: genksyms: Include extern information in dumps
kbuild: genksyms parser: fix the __attribute__ rule
kbuild: scripts/genksyms/lex.l: add %option noinput
kconfig: scripts/kconfig/zconf.l: add %option noinput
kbuild: fix O=... build of um
Change the "If unsure" message to match the default value.
Signed-off-by: John Kacur <jkacur at gmail dot com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
commit fb6624ebd9 (initrd: Fix virtual/physical
mix-up in overwrite test) introduced the compiler warning below on mips,
as its virt_to_page() doesn't cast the passed address to unsigned long
internally, unlike on most other architectures:
init/main.c: In function `start_kernel':
init/main.c:633: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast
init/main.c:636: warning: passing argument 1 of `virt_to_phys' makes pointer from integer without a cast
For now, kill the warning by explicitly casting initrd_start to `void *', as
that's the type it should really be.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rabin Vincent noticed that there's a stray <6> in BogoMIPS printk:
> Remove the extra KERN_INFO which causes this:
> Calibrating delay loop... <6>179.40 BogoMIPS (lpj=897024)
> - printk(KERN_INFO "%lu.%02lu BogoMIPS (lpj=%lu)\n",
> - loops_per_jiffy/(500000/HZ),
> - (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
> + printk("%lu.%02lu BogoMIPS (lpj=%lu)\n",
> + loops_per_jiffy/(500000/HZ),
> + (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
> }
How about just using KERN_CONT and leaving the whitespace
for a patch that does the entire file?
Reported-by: Rabin Vincent <rabin@rab.in>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (25 commits)
setlocalversion: do not describe if there is nothing to describe
kconfig: fix typos: "Suport" -> "Support"
kconfig: make defconfig is no longer chatty
kconfig: make oldconfig is now less chatty
kconfig: speed up all*config + randconfig
kconfig: set all new symbols automatically
kconfig: add diffconfig utility
kbuild: remove Module.markers during mrproper
kbuild: sparse needs CF not CHECKFLAGS
kernel-doc: handle/strip __init
vmlinux.lds: move __attribute__((__cold__)) functions back into final .text section
init: fix URL of "The GNU Accounting Utilities"
kbuild: add arch/$ARCH/include to search path
kbuild: asm symlink support for arch/$ARCH/include
kbuild: support arch/$ARCH/include for tags, cscope
kbuild: prepare headers_* for arch/$ARCH/include
kbuild: install all headers when arch is changed
kbuild: make clean removes *.o.* as well
kbuild: optimize headers_* targets
kbuild: only one call for include/ in make headers_*
...
This patch makes the needlessly global root_device_name static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A previous patch added the early_initcall(), to allow a cleaner hooking of
pre-SMP initcalls. Now we remove the older interface, converting all
existing users to the new one.
[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: build fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
[kosaki.motohiro@jp.fujitsu.com: warning fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added early initcall (pre-SMP) support, using an identical interface to
that of regular initcalls. Functions called from do_pre_smp_initcalls()
could be converted to use this cleaner interface.
This is required by CPU hotplug, because early users have to register
notifiers before going SMP. One such CPU hotplug user is the relay
interface with buffer-only channels, which needs to register such a
notifier, to be usable in early code. This in turn is used by kmemtrace.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following patch corrects URL of "The GNU Accounting Utilities" in init/Kconfig.
Noticed by: Bart Van Assche" <bart.vanassche@gmail.com>
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
int Version_* is only used with ksymoops, which is only needed (according
to README and Documentation/Changes) if CONFIG_KALLSYMS is NOT defined.
Therefore this patch defines version_string only if CONFIG_KALLSYMS is not
defined.
Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inflate requires some dynamic memory allocation very early in the boot
process and this is provided with a set of four functions:
malloc/free/gzip_mark/gzip_release.
The old inflate code used a mark/release strategy rather than implement
free. This new version instead keeps a count on the number of outstanding
allocations and when it hits zero, it resets the malloc arena.
This allows removing all the mark and release implementations and unifying
all the malloc/free implementations.
The architecture-dependent code must define two addresses:
- free_mem_ptr, the address of the beginning of the area in which
allocations should be made
- free_mem_end_ptr, the address of the end of the area in which
allocations should be made. If set to 0, then no check is made on
the number of allocations, it just grows as much as needed
The architecture-dependent code can also provide an arch_decomp_wdog()
function call. This function will be called several times during the
decompression process, and allow to notify the watchdog that the system is
still running. If an architecture provides such a call, then it must
define ARCH_HAS_DECOMP_WDOG so that the generic inflate code calls
arch_decomp_wdog().
Work initially done by Matt Mackall, updated to a recent version of the
kernel and improved by me.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There seems to be little point in explicitly setting, then testing the macro
BUILD_CRAMDISK within the context of a single source file.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every file should include the headers containing the externs for its
global code (in this case for rd_doload).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"
Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
... as preparation for removing it completely, make it an
invisible bool defaulting to yes.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module.c and module.h conatains code for finding
exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
(because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).
This patch adds required #ifdefs.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On recent kernels, I get the following error when using an initrd:
| initrd overwritten (0x00b78000 < 0x07668000) - disabling it.
My Amiga 4000 has 12 MiB of RAM at physical address 0x07400000 (virtual
0x00000000).
The initrd is located at the end of RAM: 0x00b78000 - 0x00c00000 (virtual).
The overwrite test compares the (virtual) initrd location to the (physical)
first available memory location, which fails.
This patch converts initrd_start to a page frame number, so it can safely be
compared with min_low_pfn.
Before the introduction of discontiguous memory support on m68k
(12d810c1b8), min_low_pfn was just left
untouched by the m68k-specific code (zero, I guess), and everything worked
fine.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>