linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
Linus Torvalds	1eb51c33b2	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix out of scope variable access in sched_slice() sched: Hide runqueues from direct refer at source code level sched: Remove unneeded __ref tag sched, x86: Fix cpufreq + sched_clock() TSC scaling	2009-06-20 10:57:40 -07:00
Linus Torvalds	b0b7065b64	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually	2009-06-20 10:56:46 -07:00
Linus Torvalds	38df92b8ce	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: NOHZ: Properly feed cpufreq ondemand governor	2009-06-20 10:51:44 -07:00
Ingo Molnar	d4c4038343	Merge branch 'tip/tracing/urgent-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent	2009-06-20 18:26:48 +02:00
Ingo Molnar	3daeb4da9a	Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent	2009-06-20 17:25:49 +02:00
Peter Zijlstra	92bf309a9c	perf_counter: Push perf_sample_data through the swcounter code Push the perf_sample_data further outwards to the swcounter interface, to abstract it away some more. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-20 12:30:30 +02:00
Frederic Weisbecker	9ea1a153a4	tracing/urgent: warn in case of ftrace_start_up inbalance Prevent from further ftrace_start_up inbalances so that we avoid future nop patching omissions with dynamic ftrace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2009-06-20 06:52:21 +02:00
Frederic Weisbecker	c85a17e226	tracing/urgent: fix unbalanced ftrace_start_up Perfcounter reports the following stats for a wide system profiling: # # (2364 samples) # # Overhead Symbol # ........ ...... # 15.40% [k] mwait_idle_with_hints 8.29% [k] read_hpet 5.75% [k] ftrace_caller 3.60% [k] ftrace_call [...] This snapshot has been taken while neither the function tracer nor the function graph tracer was running. With dynamic ftrace, such results show a wrong ftrace behaviour because all calls to ftrace_caller or ftrace_graph_caller (the patched calls to mcount) are supposed to be patched into nop if none of those tracers are running. The problem occurs after the first run of the function tracer. Once we launch it a second time, the callsites will never be nopped back, unless you set custom filters. For example it happens during the self tests at boot time. The function tracer selftest runs, and then the dynamic tracing is tested too. After that, the callsites are left un-nopped. This is because the reset callback of the function tracer tries to unregister two ftrace callbacks in once: the common function tracer and the function tracer with stack backtrace, regardless of which one is currently in use. It then creates an unbalance on ftrace_start_up value which is expected to be zero when the last ftrace callback is unregistered. When it reaches zero, the FTRACE_DISABLE_CALLS is set on the next ftrace command, triggering the patching into nop. But since it becomes unbalanced, ie becomes lower than zero, if the kernel functions are patched again (as in every further function tracer runs), they won't ever be nopped back. Note that ftrace_call and ftrace_graph_call are still patched back to ftrace_stub in the off case, but not the callers of ftrace_call and ftrace_graph_caller. It means that the tracing is well deactivated but we waste a useless call into every kernel function. This patch just unregisters the right ftrace_ops for the function tracer on its reset callback and ignores the other one which is not registered, fixing the unbalance. The problem also happens is .30 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org	2009-06-20 06:28:46 +02:00
Oleg Nesterov	befca96779	ptrace: wait_task_zombie: do not account traced sub-threads The bug is ancient. If we trace the sub-thread of our natural child and this sub-thread exits, we update parent->signal->cxxx fields. But we should not do this until the whole thread-group exits, otherwise we account this thread (and all other live threads) twice. Add the task_detached() check. No need to check thread_group_empty(), wait_consider_task()->delay_group_leader() already did this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Roland McGrath <roland@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-19 16:46:06 -07:00
Peter Zijlstra	b49a9e7e72	perf_counter: Close race in perf_lock_task_context() perf_lock_task_context() is buggy because it can return a dead context. the RCU read lock in perf_lock_task_context() only guarantees the memory won't get freed, it doesn't guarantee the object is valid (in our case refcount > 0). Therefore we can return a locked object that can get freed the moment we release the rcu read lock. perf_pin_task_context() then increases the refcount and does an unlock on freed memory. That increased refcount will cause a double free, in case it started out with 0. Ammend this by including the get_ctx() functionality in perf_lock_task_context() (all users already did this later anyway), and return a NULL context when the found one is already dead. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 17:57:36 +02:00
Peter Zijlstra	e5289d4a18	perf_counter: Simplify and fix task migration counting The task migrations counter was causing rare and hard to decypher memory corruptions under load. After a day of debugging and bisection we found that the problem was introduced with: 3f731ca: perf_counter: Fix cpu migration counter Turning them off fixes the crashes. Incidentally, the whole perf_counter_task_migration() logic can be done simpler as well, by injecting a proper sw-counter event. This cleanup also fixed the crashes. The precise failure mode is not completely clear yet, but we are clearly not unhappy about having a fix ;-) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 13:43:12 +02:00
Steven Rostedt	71e308a239	function-graph: add stack frame test In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:40:18 -04:00
Steven Rostedt	eb4a03780d	function-graph: disable when both x86_32 and optimize for size are configured On x86_32, when optimize for size is set, gcc may align the frame pointer and make a copy of the the return address inside the stack frame. The return address that is located in the stack frame may not be the one used to return to the calling function. This will break the function graph tracer. The function graph tracer replaces the return address with a jump to a hook function that can trace the exit of the function. If it only replaces a copy, then the hook will not be called when the function returns. Worse yet, when the parent function returns, the function graph tracer will return back to the location of the child function which will easily crash the kernel with weird results. To see the problem, when i386 is compiled with -Os we get: c106be03: 57 push %edi c106be04: 8d 7c 24 08 lea 0x8(%esp),%edi c106be08: 83 e4 e0 and $0xffffffe0,%esp c106be0b: ff 77 fc pushl 0xfffffffc(%edi) c106be0e: 55 push %ebp c106be0f: 89 e5 mov %esp,%ebp c106be11: 57 push %edi c106be12: 56 push %esi c106be13: 53 push %ebx c106be14: 81 ec 8c 00 00 00 sub $0x8c,%esp c106be1a: e8 f5 57 fb ff call c1021614 <mcount> When it is compiled with -O2 instead we get: c10896f0: 55 push %ebp c10896f1: 89 e5 mov %esp,%ebp c10896f3: 83 ec 28 sub $0x28,%esp c10896f6: 89 5d f4 mov %ebx,0xfffffff4(%ebp) c10896f9: 89 75 f8 mov %esi,0xfffffff8(%ebp) c10896fc: 89 7d fc mov %edi,0xfffffffc(%ebp) c10896ff: e8 d0 08 fa ff call c1029fd4 <mcount> The compile with -Os will align the stack pointer then set up the frame pointer (%ebp), and it copies the return address back into the stack frame. The change to the return address in mcount is done to the copy and not the real place holder of the return address. Then compile with -O2 sets up the frame pointer first, this makes the change to the return address by mcount affect where the function will jump on exit. Reported-by: Jake Edge <jake@lwn.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-18 18:39:30 -04:00
Peter Oberparleiter	7bf99fb673	gcov: enable GCOV_PROFILE_ALL for x86_64 Enable gcov profiling of the entire kernel on x86_64. Required changes include disabling profiling for: * arch/kernel/acpi/realmode and arch/kernel/boot/compressed: not linked to main kernel * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet: profiling causes segfaults during boot (incompatible context) Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:58 -07:00
Peter Oberparleiter	2521f2c228	gcov: add gcov profiling infrastructure Enable the use of GCC's coverage testing tool gcov [1] with the Linux kernel. gcov may be useful for: * debugging (has this code been reached at all?) * test improvement (how do I change my test to cover these lines?) * minimizing kernel configurations (do I need this option if the associated code is never run?) The profiling patch incorporates the following changes: * change kbuild to include profiling flags * provide functions needed by profiling code * present profiling data as files in debugfs Note that on some architectures, enabling gcc's profiling option "-fprofile-arcs" for the entire kernel may trigger compile/link/ run-time problems, some of which are caused by toolchain bugs and others which require adjustment of architecture code. For this reason profiling the entire kernel is initially restricted to those architectures for which it is known to work without changes. This restriction can be lifted once an architecture has been tested and found compatible with gcc's profiling. Profiling of single files or directories is still available on all platforms (see config help text). [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Peter Oberparleiter	b99b87f70c	kernel: constructor support Call constructors (gcc-generated initcall-like functions) during kernel start and module load. Constructors are e.g. used for gcov data initialization. Disable constructor support for usermode Linux to prevent conflicts with host glibc. Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Alexey Dobriyan	90af90d7d3	nsproxy: extract create_nsproxy() clone_nsproxy() does useless copying of old nsproxy -- every pointer will be rewritten to new ns or to old ns. Remove copying, rename clone_nsproxy(), create_nsproxy() will be used by C/R code to create fresh nsproxy on restart. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:56 -07:00
Alexey Dobriyan	4c2a7e72d5	utsns: extract creeate_uts_ns() create_uts_ns() will be used by C/R to create fresh uts_ns. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Alexey Dobriyan	dca4a97960	pidns: rewrite copy_pid_ns() copy_pid_ns() is a perfect example of a case where unwinding leads to more code and makes it less clear. Watch the diffstat. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Alexey Dobriyan	ed469a63c3	pidns: make create_pid_namespace() accept parent pidns create_pid_namespace() creates everything, but caller has to assign parent pidns by hand, which is unnatural. At the moment of call new ->level has to be taken from somewhere and parent pidns is already available. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Christoph Hellwig	17f98dcf60	pids: clean up find_task_by_pid variants find_task_by_pid_type_ns is only used to implement find_task_by_vpid and find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument. So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use find_task_by_pid_ns to implement find_task_by_vpid. While we're at it also remove the exports for find_task_by_pid_ns and find_task_by_vpid - we don't have any modular callers left as the only modular caller of he old pre pid namespace find_task_by_pid (gfs2) was switched to pid_task which operates on a struct pid pointer instead of a pid_t. Given the confusion about pid_t values vs namespace that's generally the better option anyway and I think we're better of restricting modules to do it that way. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:55 -07:00
Sukanto Ghosh	7338f29984	sysctl.c: remove unused variable Remoce the unused variable 'val' from __do_proc_dointvec() The integer has been declared and used as 'val = -val' and there is no reference to it anywhere. Signed-off-by: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Sukanto Ghosh <sukanto.cse.iitb@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00
Oleg Nesterov	371cbb387e	kthreads: simplify migration_thread() exit path Now that kthread_stop() can be used even if the task has already exited, we can kill the "wait_to_die:" loop in migration_thread(). But we must pin rq->migration_thread after creation. Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for ->migration_thread exit. Perhaps we can simplify this code a bit more. migration_call() can set ->should_stop and forget about this thread. But we need a new helper in kthred.c for that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00
Oleg Nesterov	63706172f3	kthreads: rework kthread_stop() Based on Eric's patch which in turn was based on my patch. kthread_stop() has the nasty problems: - it runs unpredictably long with the global semaphore held. - it deadlocks if kthread itself does kthread_stop() before it obeys the kthread_should_stop() request. - it is not useable if kthread exits on its own, see for example the ugly "wait_to_die:" hack in migration_thread() - it is not possible to just tell kthread it should stop, we must always wait for its exit. With this patch kthread() allocates all neccesary data (struct kthread) on its own stack, globals kthread_stop_xxx are deleted. ->vfork_done is used as a pointer into "struct kthread", this means kthread_stop() can easily wait for kthread's exit. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:54 -07:00
Oleg Nesterov	cdd140bdd6	kthreads: simplify the startup synchronization We use two completions two create the kernel thread, this is a bit ugly. kthread() wakes up create_kthread() via ->started, then create_kthread() wakes up the caller kthread_create() via ->done. But kthread() does not need to wait for kthread(), it can just return. Instead kthread() itself can wake up the caller of kthread_create(). Kill kthread_create_info->started, ->done is enough. This improves the scalability a bit and sijmplifies the code. The only problem if kernel_thread() fails, in that case create_kthread() must do complete(&create->done). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Richard Kennedy	e1eb1ebcca	mm: exit.c reorder wait_opts to remove padding on 64 bit builds Reorder struct wait_opts to remove 8 bytes of alignment padding on 64 bit builds. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Oleg Nesterov	f95d39d10f	do_wait: fix the theoretical race with stop/trace/cont do_wait: current->state = TASK_INTERRUPTIBLE; read_lock(&tasklist_lock); ... search for the task to reap ... In theory, the ->state changing can leak into the critical section. Since the child can change its status under read_lock(tasklist) in parallel (finish_stop/ptrace_stop), we can miss the wakeup if __wake_up_parent() sees us in TASK_RUNNING state. Add the barrier. Also, use __set_current_state() to set TASK_RUNNING. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Oleg Nesterov	a3f6dfb729	do_wait: kill the old BUG_ON, use while_each_thread() do_wait() does BUG_ON(tsk->signal != current->signal), this looks like a raher obsolete check. At least, I don't think do_wait() is the best place to verify that all threads have the same ->signal. Remove it. Also, change the code to use while_each_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Oleg Nesterov	64a16caf5e	do_wait: simplify retval/tsk_result/notask_error mess Now that we don't pass &retval down to other helpers we can simplify the code more. - kill tsk_result, just use retval - add the "notask" label right after the main loop, and s/got end/goto notask/ after the fastpath pid check. This way we don't need to initialize retval before this check and the code becomes a bit more clean, if this pid has no attached tasks we should just skip the list search. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:53 -07:00
Oleg Nesterov	9e8ae01d1c	introduce "struct wait_opts" to simplify do_wait() patches Introduce "struct wait_opts" which holds the parameters for misc helpers in do_wait() pathes. This adds 13 lines to kernel/exit.c, but saves 256 bytes from .o and imho makes the code much more readable. This patch temporary uglifies rusage/siginfo code a little bit, will be addressed by further cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	47918025ef	shift "ptrace implies WUNTRACED" from ptrace_do_wait() to wait_task_stopped() No functional changes, preparation for the next patch. ptrace_do_wait() adds WUNTRACED to options for wait_task_stopped() which should always accept the stopped tracee, even if do_wait() was called without WUNTRACED. Change wait_task_stopped() to check "ptrace \|\| WUNTRACED" instead. This makes the code more explicit, and "int options" argument becomes const in do_wait() pathes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	72a1de39f8	copy_process(): remove the unneeded clear_tsk_thread_flag(TIF_SIGPENDING) The forked child can have TIF_SIGPENDING if it was copied from parent's ti->flags. But this is harmless and actually almost never happens, because copy_process() can't succeed if signal_pending() == T. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	77d1ef7956	wait_task_zombie: do not use thread_group_cputime() There is no reason for thread_group_cputime() in wait_task_zombie(), there must be no other threads. This call was previously needed to collect the per-cpu data which we do not have any longer. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Roland McGrath <roland@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	e49612544c	ptrace: don't take tasklist to get/set ->last_siginfo Change ptrace_getsiginfo/ptrace_setsiginfo to use lock_task_sighand() without tasklist_lock. Perhaps it makes sense to make a single helper with "bool rw" argument. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	d92656633b	ptrace: do_notify_parent_cldstop: fix the wrong ->nsproxy usage If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the notification to group_leader->real_parent and we report group_leader's pid. But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns, the tracer and parent can live in different namespaces. Change the code to use "parent" instead of tsk->parent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:52 -07:00
Oleg Nesterov	d1e98f429a	ptrace: wait_task_zombie: s/->parent/->real_parent/ Change wait_task_zombie() to use ->real_parent instead of ->parent. We could even use current afaics, but ->real_parent is more clean. We know that the child is not ptrace_reparented() and thus they are equal. But we should avoid using task_struct->parent, we are going to remove it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	8053bdd5ce	ptrace_get_task_struct: s/tasklist/rcu/, make it static - Use rcu_read_lock() instead of tasklist_lock to find/get the task in ptrace_get_task_struct(). - Make it static, it has no callers outside of ptrace.c. - The comment doesn't match the reality, this helper does not do any checks. Beacuse it is really trivial and static I removed the whole comment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	4b105cbbaf	ptrace: do not use task_lock() for attach Remove the "Nasty, nasty" lock dance in ptrace_attach()/ptrace_traceme() - from now task_lock() has nothing to do with ptrace at all. With the recent changes nobody uses task_lock() to serialize with ptrace, but in fact it was never needed and it was never used consistently. However ptrace_attach() calls __ptrace_may_access() and needs task_lock() to pin task->mm for get_dumpable(). But we can call __ptrace_may_access() before we take tasklist_lock, ->cred_exec_mutex protects us against do_execve() path which can change creds and MMF_DUMP* flags. (ugly, but we can't use ptrace_may_access() because it hides the error code, so we have to take task_lock() and use __ptrace_may_access()). NOTE: this change assumes that LSM hooks, security_ptrace_may_access() and security_ptrace_traceme(), can be called without task_lock() held. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	f2f0b00ad6	ptrace: cleanup check/set of PT_PTRACED during attach ptrace_attach() and ptrace_traceme() are the last functions which look as if the untraced task can have task->ptrace != 0, this must not be possible. Change the code to just check ->ptrace != 0 and s/\|=/=/ to set PT_PTRACED. Also, a couple of trivial whitespace cleanups in ptrace_attach(). And move ptrace_traceme() up near ptrace_attach() to keep them close to each other. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	b79b7ba93d	ptrace: ptrace_attach: check PF_KTHREAD + exit_state instead of ->mm - Add PF_KTHREAD check to prevent attaching to the kernel thread with a borrowed ->mm. With or without this change we can race with daemonize() which can set PF_KTHREAD or clear ->mm after ptrace_attach() does the check, but this doesn't matter because reparent_to_kthreadd() does ptrace_unlink(). - Kill "!task->mm" check. We don't really care about ->mm != NULL, and the task can call exit_mm() right after we drop task_lock(). What we need is to make sure we can't attach after exit_notify(), check task->exit_state != 0 instead. Also, move the "already traced" check down for cosmetic reasons. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	5cb1144689	ptrace: do not use task->ptrace directly in core kernel No functional changes. - Nobody except ptrace.c & co should use ptrace flags directly, we have task_ptrace() for that. - No need to specially check PT_PTRACED, we must not have other PT_ bits set without PT_PTRACED. And no need to know this flag exists. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:51 -07:00
Oleg Nesterov	dea33cfd99	ptrace: mm_need_new_owner: use ->real_parent to search in the siblings "Search in the siblings" should use ->real_parent, not ->parent. If the task is traced then ->parent == tracer, while the task's parent is always ->real_parent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:49 -07:00
Oleg Nesterov	87245135d5	allow_signal: kill the bogus ->mm check, add a note about CLONE_SIGHAND allow_signal() checks ->mm == NULL. Not sure why. Perhaps to make sure current is the kernel thread. But this helper must not be used unless we are the kernel thread, kill this check. Also, document the fact that the CLONE_SIGHAND kthread must not use allow_signal(), unless the caller really wants to change the parent's ->sighand->action as well. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:48 -07:00
Daisuke Nishimura	c5b947b288	memcg: add interface to reset limits We don't have an interface to reset mem.limit or memsw.limit now. This patch allows to reset mem.limit or memsw.limit when they are being set to -1. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:48 -07:00
Li Zefan	f9ab5b5b0f	cgroups: forbid noprefix if mounting more than just cpuset subsystem The 'noprefix' option was introduced for backwards-compatibility of cpuset, but actually it can be used when mounting other subsystems. This results in possibility of name collision, and now the collision can really happen, because we have 'stat' file in both memory and cpuacct subsystem: # mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory subsys can be seen. We don't want users to use nopreifx, and also want to avoid name collision, so we change to allow noprefix only if mounting just the cpuset subsystem. [akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:46 -07:00
Keika Kobayashi	aa0ce5bbc2	softirq: introduce statistics for softirq Statistics for softirq doesn't exist. It will be helpful like statistics for interrupts. This patch introduces counting the number of softirq, which will be exported in /proc/softirqs. When softirq handler consumes much CPU time, /proc/stat is like the following. $ while :; do cat /proc/stat \| head -n1 ; sleep 10 ; done cpu 88 0 408 739665 583 28 2 0 0 cpu 450 0 1090 740970 594 28 1294 0 0 ^^^^ softirq In such a situation, /proc/softirqs shows us which softirq handler is invoked. We can see the increase rate of softirqs. <before> $ cat /proc/softirqs CPU0 CPU1 CPU2 CPU3 HI 0 0 0 0 TIMER 462850 462805 462782 462718 NET_TX 0 0 0 365 NET_RX 2472 2 2 40 BLOCK 0 0 381 1164 TASKLET 0 0 0 224 SCHED 462654 462689 462698 462427 RCU 3046 2423 3367 3173 <after> $ cat /proc/softirqs CPU0 CPU1 CPU2 CPU3 HI 0 0 0 0 TIMER 463361 465077 465056 464991 NET_TX 53 0 1 365 NET_RX 3757 2 2 40 BLOCK 0 0 398 1170 TASKLET 0 0 0 224 SCHED 463074 464318 464612 463330 RCU 3505 2948 3947 3673 When CPU TIME of softirq is high, the rates of increase is the following. TIMER : 220/sec : CPU1-3 NET_TX : 5/sec : CPU0 NET_RX : 120/sec : CPU0 SCHED : 40-200/sec : all CPU RCU : 45-58/sec : all CPU The rates of increase in an idle mode is the following. TIMER : 250/sec SCHED : 250/sec RCU : 2/sec It seems many softirqs for receiving packets and rcu are invoked. This gives us help for checking system. Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp> Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:40 -07:00
Peter Zijlstra	43a21ea81a	perf_counter: Add event overlow handling Alternative method of mmap() data output handling that provides better overflow management and a more reliable data stream. Unlike the previous method, that didn't have any user->kernel feedback and relied on userspace keeping up, this method relies on userspace writing its last read position into the control page. It will ensure new output doesn't overwrite not-yet read events, new events for which there is no space left are lost and the overflow counter is incremented, providing exact event loss numbers. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-18 14:46:11 +02:00
Steven Rostedt	4b221f0313	ring-buffer: have benchmark test print to trace buffer Currently the output of the ring buffer benchmark/test prints to the console. This test runs for ten seconds every ten seconds and ouputs the result after every iteration. This needlessly fills up the logs. This patch makes the ring buffer benchmark/test print to the ftrace buffer using trace_printk. To view the test results, you must examine the debug/tracing/trace file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 17:01:09 -04:00
Steven Rostedt	8d707e8eb4	ring-buffer: do not grab locks in nmi If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it will need to read from the ring buffer. But the read side of the ring buffer still takes locks. This patch adds a check on the read side that if it is in an NMI, then it will disable the ring buffer and not take any locks. Reads can still happen on a disabled ring buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:27 -04:00
Steven Rostedt	d47882078f	ring-buffer: add locks around rb_per_cpu_empty The checking of whether the buffer is empty or not needs to be serialized among the readers. Add the reader spin lock around it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:23 -04:00
Steven Rostedt	5f78abeebb	ring-buffer: check for less than two in size allocation The ring buffer must have at least two pages allocated for the reader page swap to work. The page count check will miss the case of a zero size passed in. Even though a zero size ring buffer would probably fail an allocation, making the min size check for less than two instead of equal to one makes the code a bit more robust. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:20 -04:00
Steven Rostedt	0dcd4d6c3e	ring-buffer: remove useless compile check for buffer_page size The original version of the ring buffer had a hack to map the page struct that held the pages of the buffer to also be the structure that the ring buffer would keep the pages in a link list. This overlap of the page struct was very dangerous and that hack was removed a while ago. But there was a check to make sure the buffer_page never became bigger than the page struct, and would fail the compile if it did. The check was only meaningful when we had the hack. Now that we have separate allocated descriptors for the buffer pages, we can remove this check. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:07 -04:00
Linus Torvalds	b7c142dbf1	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: start using hrtimers hrtimer: export ktime_add_safe UBIFS: do not forget to register BDI device UBIFS: allow sync option in rootflags UBIFS: remove dead code UBIFS: use anonymous device UBIFS: return proper error code if the compr is not present UBIFS: return error if link and unlink race UBIFS: reset no_space flag after inode deletion	2009-06-17 09:46:33 -07:00
Christian Engelmayer	3104bf03a9	sched: Fix out of scope variable access in sched_slice() Access to local variable lw is aliased by usage of pointer load. Access to pointer load in calc_delta_mine() happens when lw is already out of scope. [ Reported by static code analysis. ] Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> LKML-Reference: <20090616103512.0c846e51@frequentis.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 18:37:54 +02:00
Hitoshi Mitake	348ec61e62	sched: Hide runqueues from direct refer at source code level There are some points which refer the per-cpu value "runqueues" directly. sched.c provides nice abstraction, such as cpu_rq() and this_rq(), so we should use these macros when looking runqueues. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> LKML-Reference: <20090617.222055.374768827975756908.mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 18:29:42 +02:00
Li Zefan	fd5e1b5dba	sched: Remove unneeded __ref tag Those two functions no longer call alloc_bootmmem_cpumask_var(), so no need to tag them with __init_refok. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <4A35DD5B.9050106@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 16:08:04 +02:00
Ingo Molnar	a3d06cc6aa	Merge branch 'linus' into perfcounters/core Conflicts: arch/x86/include/asm/kmap_types.h include/linux/mm.h include/asm-generic/kmap_types.h Merge reason: We crossed changes with kmap_types.h cleanups in mainline. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 13:06:17 +02:00
Linus Torvalds	517d08699b	Merge branch 'akpm' * akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: bfin: fix __dev{init,exit} markings fbdev: bfin: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c	2009-06-16 19:50:13 -07:00
Chris Peterson	009789f040	slow-work: use round_jiffies() for thread pool's cull and OOM timers Round the slow work queue's cull and OOM timeouts to whole second boundary with round_jiffies(). The slow work queue uses a pair of timers to cull idle threads and, after OOM, to delay new thread creation. This patch also extracts the mod_timer() logic for the cull timer into a separate helper function. By rounding non-time-critical timers such as these to whole seconds, they will be batched up to fire at the same time rather than being spread out. This allows the CPU wake up less, which saves power. Signed-off-by: Chris Peterson <cpeterso@cpeterso.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:49 -07:00
Alexey Dobriyan	30639b6af8	groups: move code to kernel/groups.c Move supplementary groups implementation to kernel/groups.c . kernel/sys.c already accumulated quite a few random stuff. Do strictly copy/paste + add required headers to compile. Compile-tested on many configs and archs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:48 -07:00
Robert P. J. Day	b33112d1cc	kernel/kfifo.c: replace conditional test with is_power_of_2() Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:47 -07:00
KOSAKI Motohiro	6837765963	mm: remove CONFIG_UNEVICTABLE_LRU config option Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this configurability is unnecessary. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Minchan Kim <minchan.kim@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:42 -07:00
Rafael J. Wysocki	7f33d49a2e	mm, PM/Freezer: Disable OOM killer when tasks are frozen Currently, the following scenario appears to be possible in theory: * Tasks are frozen for hibernation or suspend. * Free pages are almost exhausted. * Certain piece of code in the suspend code path attempts to allocate some memory using GFP_KERNEL and allocation order less than or equal to PAGE_ALLOC_COSTLY_ORDER. * __alloc_pages_internal() cannot find a free page so it invokes the OOM killer. * The OOM killer attempts to kill a task, but the task is frozen, so it doesn't die immediately. * __alloc_pages_internal() jumps to 'restart', unsuccessfully tries to find a free page and invokes the OOM killer. * No progress can be made. Although it is now hard to trigger during hibernation due to the memory shrinking carried out by the hibernation code, it is theoretically possible to trigger during suspend after the memory shrinking has been removed from that code path. Moreover, since memory allocations are going to be used for the hibernation memory shrinking, it will be even more likely to happen during hibernation. To prevent it from happening, introduce the oom_killer_disabled switch that will cause __alloc_pages_internal() to fail in the situations in which the OOM killer would have been called and make the freezer set this switch after tasks have been successfully frozen. [akpm@linux-foundation.org: be nicer to the namespace] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Fengguang Wu <fengguang.wu@gmail.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:40 -07:00
Mel Gorman	6484eb3e2a	page allocator: do not check NUMA node ID when the caller knows the node is valid Callers of alloc_pages_node() can optionally specify -1 as a node to mean "allocate from the current node". However, a number of the callers in fast paths know for a fact their node is valid. To avoid a comparison and branch, this patch adds alloc_pages_exact_node() that only checks the nid with VM_BUG_ON(). Callers that know their node is valid are then converted. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:32 -07:00
Miao Xie	58568d2a82	cpuset,mm: update tasks' mems_allowed in time Fix allocating page cache/slab object on the unallowed node when memory spread is set by updating tasks' mems_allowed after its cpuset's mems is changed. In order to update tasks' mems_allowed in time, we must modify the code of memory policy. Because the memory policy is applied in the process's context originally. After applying this patch, one task directly manipulates anothers mems_allowed, and we use alloc_lock in the task_struct to protect mems_allowed and memory policy of the task. But in the fast path, we didn't use lock to protect them, because adding a lock may lead to performance regression. But if we don't add a lock,the task might see no nodes when changing cpuset's mems_allowed to some non-overlapping set. In order to avoid it, we set all new allowed nodes, then clear newly disallowed ones. [lee.schermerhorn@hp.com: The rework of mpol_new() to extract the adjusting of the node mask to apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind() with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local allocation. Fix this by adding the check for MPOL_PREFERRED and empty node mask to mpol_new_mpolicy(). Remove the now unneeded 'nodes = NULL' from mpol_new(). Note that mpol_new_mempolicy() is always called with a non-NULL 'nodes' parameter now that it has been removed from mpol_new(). Therefore, we don't need to test nodes for NULL before testing it for 'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to verify this assumption.] [lee.schermerhorn@hp.com: I don't think the function name 'mpol_new_mempolicy' is descriptive enough to differentiate it from mpol_new(). This function applies cpuset set context, usually constraining nodes to those allowed by the cpuset. However, when the 'RELATIVE_NODES flag is set, it also translates the nodes. So I settled on 'mpol_set_nodemask()', because the comment block for mpol_new() mentions that we need to call this function to "set nodes". Some additional minor line length, whitespace and typo cleanup.] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:31 -07:00
Miao Xie	950592f7b9	cpusets: update tasks' page/slab spread flags in time Fix the bug that the kernel didn't spread page cache/slab object evenly over all the allowed nodes when spread flags were set by updating tasks' page/slab spread flags in time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:31 -07:00
Miao Xie	f3b39d47eb	cpusets: restructure the function cpuset_update_task_memory_state() The kernel still allocates the page caches on old node after modifying its cpuset's mems when 'memory_spread_page' was set, or it didn't spread the page cache evenly over all the nodes that faulting task is allowed to usr after memory_spread_page was set. it is caused by the old mem_allowed and flags of the task, the current kernel doesn't updates them unless some function invokes cpuset_update_task_memory_state(), it is too late sometimes.We must update the mem_allowed and the flags of the tasks in time. Slab has the same problem. The following patches fix this bug by updating tasks' mem_allowed and spread flag after its cpuset's mems or spread flag is changed. This patch: Extract a function from cpuset_update_task_memory_state(). It will be used later for update tasks' page/slab spread flags after its cpuset's flag is set Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:31 -07:00
Steven Rostedt	c6a9d7b55e	ring-buffer: remove useless warn on check A check if "write > BUF_PAGE_SIZE" is done right after a if (write > BUF_PAGE_SIZE) return ...; Thus the check is actually testing the compiler and not the kernel. This is useless, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 21:19:26 -04:00
Steven Rostedt	22f470f8da	ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index The index of the event is found by masking PAGE_MASK to it and subtracting the header size. Currently the header size is calculate by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro BUF_PAGE_HDR_SIZE to define it. If we want to change BUF_PAGE_SIZE to something less than filling the rest of the page (this is done for debugging), then we break the algorithm to find the index. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 21:19:23 -04:00
Li Zefan	00e95830a4	tracing/filters: fix race between filter setting and module unload Module unload is protected by event_mutex, while setting filter is protected by filter_mutex. This leads to the race: echo 'bar == 0 \|\| bar == 10' \ \| > sample/filter \| \| insmod sample.ko add_pred("bar == 0") \| -> n_preds == 1 \| add_pred("bar == 100") \| -> n_preds == 2 \| \| rmmod sample.ko \| insmod sample.ko add_pred("&&") \| -> n_preds == 1 (should be 3) \| Now event->filter->preds is corrupted. An then when filter_match_preds() is called, the WARN_ON() in it will be triggered. To avoid the race, we remove filter_mutex, and replace it with event_mutex. [ Impact: prevent corruption of filters by module removing and loading ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A375A4D.6000205@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:25:37 -04:00
Li Zefan	57be88878e	tracing/filters: free filter_string in destroy_preds() filter->filter_string is not freed when unloading a module: # insmod trace-events-sample.ko # echo "bar < 100" > /mnt/tracing/events/sample/foo_bar/filter # rmmod trace-events-sample.ko [ Impact: fix memory leak when unloading module ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A375A30.9060802@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:25:35 -04:00
Steven Rostedt	fa7439531d	ring-buffer: use commit counters for commit pointer accounting The ring buffer is made up of three sets of pointers. The head page pointer, which points to the next page for the reader to get. The commit pointer and commit index, which points to the page and index of the last committed write respectively. The tail pointer and tail index, which points to the page and the index of the last reserved data respectively (non committed). The commit pointer is only moved forward by the outer most writer. If a nested writer comes in, it will not move the pointer forward. The current implementation has a flaw. It assumes that the outer most writer successfully reserved data. There's a small race window where the outer most writer could find the tail pointer, but a nested writer could come in (via interrupt) and move the tail forward, and even the commit forward. The outer writer would not realized the commit moved forward and the accounting will break. This patch changes the design to use counters in the per cpu buffers to keep track of commits. The counters are incremented at the start of the commit, and decremented at the end. If the end commit counter is 1, then it moves the commit pointers. A loop is made to check for races between checking and moving the commit pointers. Only the outer commit should move the pointers anyway. The test of knowing if a reserve is equal to the last commit update is still needed to know for time keeping. The time code is much less racey than the commit updates. This change not only solves the mentioned race, but also makes the code simpler. [ Impact: fix commit race and simplify code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:25:33 -04:00
Steven Rostedt	263294f3e1	ring-buffer: remove unused variable Fix the compiler error: kernel/trace/ring_buffer.c: In function 'rb_move_tail': kernel/trace/ring_buffer.c:1236: warning: unused variable 'event' Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:24:39 -04:00
Linus Torvalds	b3fec0fe35	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits) signal: fix __send_signal() false positive kmemcheck warning fs: fix do_mount_root() false positive kmemcheck warning fs: introduce __getname_gfp() trace: annotate bitfields in struct ring_buffer_event net: annotate struct sock bitfield c2port: annotate bitfield for kmemcheck net: annotate inet_timewait_sock bitfields ieee1394/csr1212: fix false positive kmemcheck report ieee1394: annotate bitfield net: annotate bitfields in struct inet_sock net: use kmemcheck bitfields API for skbuff kmemcheck: introduce bitfield API kmemcheck: add opcode self-testing at boot x86: unify pte_hidden x86: make _PAGE_HIDDEN conditional kmemcheck: make kconfig accessible for other architectures kmemcheck: enable in the x86 Kconfig kmemcheck: add hooks for the page allocator kmemcheck: add hooks for page- and sg-dma-mappings kmemcheck: don't track page tables ...	2009-06-16 13:09:51 -07:00
Linus Torvalds	6fd03301d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits) debugfs: use specified mode to possibly mark files read/write only debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. xen: remove driver_data direct access of struct device from more drivers usb: gadget: at91_udc: remove driver_data direct access of struct device uml: remove driver_data direct access of struct device block/ps3: remove driver_data direct access of struct device s390: remove driver_data direct access of struct device parport: remove driver_data direct access of struct device parisc: remove driver_data direct access of struct device of_serial: remove driver_data direct access of struct device mips: remove driver_data direct access of struct device ipmi: remove driver_data direct access of struct device infiniband: ehca: remove driver_data direct access of struct device ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device hvcs: remove driver_data direct access of struct device xen block: remove driver_data direct access of struct device thermal: remove driver_data direct access of struct device scsi: remove driver_data direct access of struct device pcmcia: remove driver_data direct access of struct device PCIE: remove driver_data direct access of struct device ... Manually fix up trivial conflicts due to different direct driver_data direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}	2009-06-16 12:57:37 -07:00
Linus Torvalds	b231125af7	printk: add KERN_DEFAULT loglevel to print_modules() Several WARN_ON() messages omit the '\n' at the end of the string, which is a simple (and understandable) error. The next line printed after that warning line is usually the current module list, and that printk does not have a log-level marker - resulting in one long mixed-up line. Adding this loglevel marker will now avoid this unreadable mess. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 11:07:14 -07:00
Linus Torvalds	e28d713704	printk: Add KERN_DEFAULT printk log-level This adds a KERN_DEFAULT loglevel marker, for when you cannot decide which loglevel you want, and just want to keep an existing printk with the default loglevel. The difference between having KERN_DEFAULT and having no log-level marker at all is two-fold: - having the log-level marker will now force a new-line if the previous printout had not added one (perhaps because it forgot, but perhaps because it expected a continuation) - having a log-level marker is required if you are printing out a message that otherwise itself could perhaps otherwise be mistaken for a log-level. Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 11:02:28 -07:00
Linus Torvalds	5fd29d6ccb	printk: clean up handling of log-levels and newlines It used to be that we would only look at the log-level in a printk() after explicit newlines, which can cause annoying problems when the previous printk() did not end with a '\n'. In that case, the log-level marker would be just printed out in the middle of the line, and be seen as just noise rather than change the logging level. This changes things to always look at the log-level in the first bytes of the printout. If a log level marker is found, it is always used as the log-level. Additionally, if no newline existed, one is added (unless the log-level is the explicit KERN_CONT marker, to explicitly show that it's a continuation of a previous line). Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 10:57:02 -07:00
Steven Rostedt	9086c7b90a	ring-buffer: have benchmark test handle discarded events With the addition of commit: `c7b0930857` ring-buffer: prevent adding write in discarded area The ring buffer may now add discarded events when a write passes the end of a buffer page. Before, a discarded event was only added when the tracer deliberately created one. The ring buffer benchmark test does not handle discarded events when it reads the buffer and fails when it encounters one. Also fix the increment for large data entries (luckily, the test did not add any yet). [ Impact: fix false failure of ring buffer self test ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 13:48:52 -04:00
GeunSik Lim	156f5a7801	debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/" directory name to mount debugfs filesystem for ftrace according to ./Documentation/tracers/ftrace.txt file. And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is existed in kernel source like ftrace, DRM, Wireless, Documentation, Network[sky2]files to mount debugfs filesystem. debugfs means debug filesystem for debugging easy to use by greg kroah hartman. "/sys/kernel/debug/" name is suitable as directory name of debugfs filesystem. - debugfs related reference: http://lwn.net/Articles/334546/ Fix inconsistency of directory name to mount debugfs filesystem. * From Steven Rostedt - find_debugfs() and tracing_files() in this patch. Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com> Reviewed-by : Steven Rostedt <rostedt@goodmis.org> Reviewed-by : James Smart <james.smart@emulex.com> CC: Jiri Kosina <trivial@kernel.org> CC: David Airlie <airlied@linux.ie> CC: Peter Osterlund <petero2@telia.com> CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> CC: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:28 -07:00
Kay Sievers	3959214f97	sched: delayed cleanup of user_struct During bootup performance tracing we see repeated occurrences of /sys/kernel/uid/* events for the same uid, leading to a, in this case, rather pointless userspace processing for the same uid over and over. This is usually caused by tools which change their uid to "nobody", to run without privileges to read data supplied by untrusted users. This change delays the execution of the (already existing) scheduled work, to cleanup the uid after one second, so the allocated and announced uid can possibly be re-used by another process. This is the current behavior, where almost every invocation of a binary, which changes the uid, creates two events: $ read START < /sys/kernel/uevent_seqnum; \ for i in `seq 100`; do su --shell=/bin/true bin; done; \ read END < /sys/kernel/uevent_seqnum; \ echo $(($END - $START)) 178 With the delayed cleanup, we get only two events, and userspace finishes a bit faster too: $ read START < /sys/kernel/uevent_seqnum; \ for i in `seq 100`; do su --shell=/bin/true bin; done; \ read END < /sys/kernel/uevent_seqnum; \ echo $(($END - $START)) 1 Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:23 -07:00
Linus Torvalds	19035e5b5d	Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually	2009-06-15 10:06:19 -07:00
Linus Torvalds	f9db6e0951	Merge branch 'timers-for-linus-clockevents' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-clockevents' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevent: export register_device and delta2ns clockevents: tick_broadcast_device can become static	2009-06-15 09:58:50 -07:00
Linus Torvalds	3f27c0d2a4	Merge branch 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: prevent selection of low resolution clocksourse also for nohz=on clocksource: sanity check sysfs clocksource changes	2009-06-15 09:58:33 -07:00
Steven Rostedt	c7b0930857	ring-buffer: prevent adding write in discarded area This a very tight race where an interrupt could come in and not have enough data to put into the end of a buffer page, and that it would fail to write and need to go to the next page. But if this happened when another writer was about to reserver their data, and that writer has smaller data to reserve, then it could succeed even though the interrupt moved the tail page. To pervent that, if we fail to store data, and by subtracting the amount we reserved we still have room for smaller data, we need to fill that space with "discarded" data. [ Impact: prevent race were buffer data may be lost ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:19 -04:00
Li Zefan	0ac2058f68	tracing/filters: strloc should be unsigned short I forgot to update filter code accordingly in "tracing/events: change the type of __str_loc_item to unsigned short" (commt `b0aae68cc5`) It can cause system crash: # echo 1 > tracing/events/irq/irq_handler_entry/enable # echo 'name == eth0' > tracing/events/irq/irq_handler_entry/filter [ Impact: fix crash while filtering on __string() field ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B905.3090500@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:18 -04:00
Li Zefan	5e4904cb63	tracing/filters: operand can be negative This should be a bug: # cat format name: foo_bar ID: 71 format: ... field:int bar; offset:24; size:4; # echo 'bar < 0' > filter # echo 'bar < -1' > filter bash: echo: write error: Invalid argument [ Impact: fix to allow negative operand in filer expr ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B8DF.60400@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:16 -04:00
Li Zefan	e4f2d10f47	tracing: replace a GFP_ATOMIC with GFP_KERNEL allocation Atomic allocation is not needed here. [ Impact: clean up of memory alloction type ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B898.2050607@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:14 -04:00
Li Zefan	215368e8e5	tracing: fix a typo in tracing_cpumask_write() It's tracing_cpumask_new that should be kfree()ed. This causes tracing_cpumask to be freed due to the typo: # echo z > tracing_cpumask bash: echo: write error: Invalid argument And subsequent reads/writes to tracing_cpuamsk will access this already-freed tracing_cpumask, thus may lead to crash. [ Impact: fix leak and crash when writing invalid val to tracing_cpumask ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:12 -04:00
Rusty Russell	3f237a79dd	cpumask: use new operators in kernel/trace Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <200906122115.30787.rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:36:42 -04:00
Peter Zijlstra	75f937f24b	perf_counter: Fix ctx->mutex vs counter->mutex inversion Simon triggered a lockdep inversion report about us taking ctx->mutex vs counter->mutex in inverse orders. Fix that up. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-15 15:57:49 +02:00
Vegard Nossum	722f2a6c87	Merge commit 'linus/master' into HEAD Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:50:49 +02:00
Vegard Nossum	7a0aeb14e1	signal: fix __send_signal() false positive kmemcheck warning This false positive is due to field padding in struct sigqueue. When this dynamically allocated structure is copied to the stack (in arch- specific delivery code), kmemcheck sees a read from the padding, which is, naturally, uninitialized. Hide the false positive using the __GFP_NOTRACK_FALSE_POSITIVE flag. Also made the rlimit override code a bit clearer by introducing a new variable. Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:49:43 +02:00
Vegard Nossum	1744a21d57	trace: annotate bitfields in struct ring_buffer_event This gets rid of a heap of false-positive warnings from the tracer code due to the use of bitfields. [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:49:37 +02:00
Vegard Nossum	2dff440525	kmemcheck: add mm functions With kmemcheck enabled, the slab allocator needs to do this: 1. Tell kmemcheck to allocate the shadow memory which stores the status of each byte in the allocation proper, e.g. whether it is initialized or uninitialized. 2. Tell kmemcheck which parts of memory that should be marked uninitialized. There are actually a few more states, such as "not yet allocated" and "recently freed". If a slab cache is set up using the SLAB_NOTRACK flag, it will never return memory that can take page faults because of kmemcheck. If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still request memory with the __GFP_NOTRACK flag. This does not prevent the page faults from occuring, however, but marks the object in question as being initialized so that no warnings will ever be produced for this object. In addition to (and in contrast to) __GFP_NOTRACK, the __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should not be tracked _because_ it would produce a false positive. Their values are identical, but need not be so in the future (for example, we could now enable/disable false positives with a config option). Parts of this patch were contributed by Pekka Enberg but merged for atomicity. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:03 +02:00
Linus Torvalds	45e3e1935e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits) .gitignore: ignore .lzma files kbuild: add generic --set-str option to scripts/config kbuild: simplify argument loop in scripts/config kbuild: handle non-existing options in scripts/config kallsyms: generalize text region handling kallsyms: support kernel symbols in Blackfin on-chip memory documentation: make version fix kbuild: fix a compile warning gitignore: Add GNU GLOBAL files to top .gitignore kbuild: fix delay in setlocalversion on readonly source README: fix misleading pointer to the defconf directory vmlinux.lds.h update kernel-doc: cleanup perl script Improve vmlinux.lds.h support for arch specific linker scripts kbuild: fix headers_exports with boolean expression kbuild/headers_check: refine extern check kbuild: fix "Argument list too long" error for "make headers_check", ignore .patch files Remove bashisms from scripts menu: fix embedded menu presentation ...	2009-06-14 14:12:18 -07:00
Linus Torvalds	489f7ab6c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits) trivial: remove the trivial patch monkey's name from SubmittingPatches trivial: Fix a typo in comment of addrconf_dad_start() trivial: usb: fix missing space typo in doc trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug trivial: Remove the hyphen from git commands trivial: fix ETIMEOUT -> ETIMEDOUT typos trivial: Kconfig: .ko is normally not included in module names trivial: SubmittingPatches: fix typo trivial: Documentation/dell_rbu.txt: fix typos trivial: Fix Pavel's address in MAINTAINERS trivial: ftrace:fix description of trace directory trivial: unnecessary (void*) cast removal in sound/oss/msnd.c trivial: input/misc: Fix typo in Kconfig trivial: fix grammo in bus_for_each_dev() kerneldoc trivial: rbtree.txt: fix rb_entry() parameters in sample code trivial: spelling fix in ppc code comments trivial: fix typo in bio_alloc kernel doc trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt trivial: Miscellaneous documentation typo fixes trivial: fix typo milisecond/millisecond for documentation and source comments. ...	2009-06-14 13:46:25 -07:00
Linus Torvalds	a2ee2981ae	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits) x86, mce: Add boot options for corrected errors x86, mce: Fix mce printing x86, mce: fix for mce counters x86, mce: support action-optional machine checks x86, mce: define MCE_VECTOR x86, mce: rename mce_notify_user to mce_notify_irq x86: fix panic with interrupts off (needed for MCE) x86, mce: export MCE severities coverage via debugfs x86, mce: implement new status bits x86, mce: print header/footer only once for multiple MCEs x86, mce: default to panic timeout for machine checks x86, mce: improve mce_get_rip x86, mce: make non Monarch panic message "Fatal machine check" too x86, mce: switch x86 machine check handler to Monarch election. x86, mce: implement panic synchronization x86, mce: implement bootstrapping for machine check wakeups x86, mce: check early in exception handler if panic is needed x86, mce: add table driven machine check grading x86, mce: remove TSC print heuristic x86, mce: log corrected errors when panicing ...	2009-06-13 13:14:51 -07:00
Vegard Nossum	dfec072ecd	kmemcheck: add the kmemcheck core General description: kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log. Thanks to Andi Kleen for the set_memory_4k() solution. Andrew Morton suggested documenting the shadow member of struct page. Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [export kmemcheck_mark_initialized] [build fix for setup_max_cpus] Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>	2009-06-13 15:37:30 +02:00

1 2 3 4 5 ...

7589 Commits