Commit Graph

313 Commits

Author SHA1 Message Date
Huang Rui 0fb0328d34 sched/x86: Drop repeated word from mwait_idle() comment
A single "default" is fine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1432628901-18044-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:38:04 +02:00
Ingo Molnar 04c8e01d50 x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
Do it like all other high level FPU state handling functions: they
only know about struct fpu, not about the task.

(Also remove a dead prototype while at it.)

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar 5033861575 x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.

drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:

     - such as in sys_exit()'s exit_thread() where we know this task
       won't execute any user-space instructions anymore and the
       next context switch cleans up the FPU. The old FPU state
       might still be around in the eagerfpu case but won't be
       saved.

     - in __restore_xstate_sig(), where we use drop_fpu() before
       copying a new state into the fpstate and activating that one.
       No user-pace instructions can execute between those steps.

     - in sys_execve()'s fpu__clear(): there we use drop_fpu() in
       the !eagerfpu case, where it's equivalent to a full reinit.

fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:

     - handle_signal() uses fpu_reset_state() to reset the FPU state
       to init before executing a user-space signal handler. While we
       have already saved the original FPU state at this point, and
       always restore the original state, the signal handling code
       still has to do this reinit, because signals may interrupt
       any user-space instruction, and the FPU might be in various
       intermediate states (such as an unbalanced x87 stack) that is
       not immediately usable for general C signal handler code.

     - __restore_xstate_sig() uses fpu_reset_state() when the signal
       frame has no FP context. Since the signal handler may have
       modified the FPU state, it gets reset back to init state.

     - in another branch __restore_xstate_sig() uses fpu_reset_state()
       to handle a restoration error: when restore_user_xstate() fails
       to restore FPU state and we might have inconsistent FPU data,
       fpu_reset_state() is used to reset it back to a known good
       state.

     - __kernel_fpu_end() uses fpu_reset_state() in an error branch.
       This is in a 'must not trigger' error branch, so on bug-free
       kernels this never triggers.

     - fpu__restore() uses fpu_reset_state() in an error path
       as well: if the fpstate was set up with invalid FPU state
       (via ptrace or via a signal handler), then it's reset back
       to init state.

     - likewise, the scheduler's switch_fpu_finish() uses it in a
       restoration error path too.

Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:

    fpu__drop()
    fpu__reset()

This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().

Also add comments to explain what each function does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar c4d6ee6e2e x86/fpu: Remove failure paths from fpstate-alloc low level functions
Now that we always allocate the FPU context as part of task_struct there's
no need for separate allocations - remove them and their primary failure
handling code.

( Note that there's still secondary error codes that have become superfluous,
  those will be removed in separate patches. )

Move the somewhat misplaced setup_xstate_comp() call to the core.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:50 +02:00
Ingo Molnar 78f7f1e54b x86/fpu: Rename fpu-internal.h to fpu/internal.h
This unifies all the FPU related header files under a unified, hiearchical
naming scheme:

 - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                         widely included in almost all kernel code, and hence kept
                         as small as possible.

 - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.

 - asm/fpu/internal.h:   FPU subsystem internal methods

 - asm/fpu/xsave.h:      XSAVE support internal methods

(Also standardize the header guard in asm/fpu/internal.h.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar 2e8a310266 x86/fpu: Rename fpu__flush_thread() to fpu__clear()
The primary purpose of this function is to clear the current task's
FPU before an exec(), to not leak information from the previous task,
and to allow the new task to start with freshly initialized FPU
registers.

Rename the function to reflect this primary purpose.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar c69e098b1f x86/fpu: Use 'struct fpu' in fpu__copy()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:29 +02:00
Ingo Molnar ca6787ba0f x86/fpu: Remove 'struct task_struct' usage from drop_fpu()
Migrate this function to pure 'struct fpu' usage.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:25 +02:00
Ingo Molnar a752b53d9d x86/fpu: Factor out fpu__copy()
Introduce fpu__copy() and use it in arch_dup_task_struct(),
thus moving another chunk of FPU logic to fpu/core.c.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:19 +02:00
Ingo Molnar 8ffb53ab98 x86/fpu: Move task_xstate_cachep handling to core.c
This code was historically in process.c, now we have FPU core internals in
fpu/core.c instead - move it there.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:19 +02:00
Ingo Molnar 81683cc827 x86/fpu: Factor out fpu__flush_thread() from flush_thread()
flush_thread() open codes a lot of FPU internals - create a separate
function for it in fpu/core.c.

Turns out that this does not hurt performance:

   text    data     bss     dec     hex filename
   11843039        1884440 1130496 14857975         e2b6f7 vmlinux.before
   11843039        1884440 1130496 14857975         e2b6f7 vmlinux.after

and since this is a slowpath clarity comes first anyway.

We can reconsider inlining decisions after the FPU code has been cleaned up.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar 11ad19277e x86/fpu: Remove the free_thread_xstate() complication
Use fpstate_free() directly to manage FPU state.

Only process.c was using this method, so this is a speedup as well,
as it removes the extra function call and related clobbers.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:17 +02:00
Ingo Molnar f89e32e0a3 x86/fpu: Fix header file dependencies of fpu-internal.h
Fix a minor header file dependency bug in asm/fpu-internal.h: it
relies on i387.h but does not include it. All users of fpu-internal.h
included it explicitly.

Also remove unnecessary includes, to reduce compilation time.

This also makes it easier to use it as a standalone header file
for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:16 +02:00
Ingo Molnar c0c2803dee x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter
This field is kept separate from the main FPU state structure for
no good reason.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:14 +02:00
Ingo Molnar a7c2a83364 x86/fpu: Rename fpu_free() to fpstate_free()
Use the fpu__*() namespace.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:13 +02:00
Ingo Molnar ed97b08546 x86/fpu: Rename fpu_alloc() to fpstate_alloc()
Use the fpu__*() namespace for fpstate_alloc() as well.

Also add a comment about FPU state alignment.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:13 +02:00
Ingo Molnar 97185c95f7 x86/fpu: Split an fpstate_alloc_init() function out of init_fpu()
Most init_fpu() users don't want the register-saving aspect of the
function, they are calling it for 'current' and when FPU registers
are not allocated and initialized yet.

Split out a simplified API that does just that (and add debug-checks
for these conditions): fpstate_alloc_init().

Use it where appropriate.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:10 +02:00
Bobby Powers c88d47480d x86/fpu: Always restore_xinit_state() when use_eager_cpu()
The following commit:

  f893959b08 ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).

The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.

Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().

Tested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-06 11:22:03 +02:00
Marc Dionne de71ad2c97 x86: Make cpu_tss available to external modules
Commit 75182b1632 ("x86/asm/entry: Switch all C consumers of
kernel_stack to this_cpu_sp0()") changed current_thread_info
to use this_cpu_sp0, and indirectly made it rely on init_tss
which was exported with EXPORT_PER_CPU_SYMBOL_GPL.
As a result some macros and inline functions such as set/get_fs,
test_thread_flag and variants have been made unusable for
external modules.

Make cpu_tss exported with EXPORT_PER_CPU_SYMBOL so that these
functions are accessible again, as they were previously.

Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/1430763404-21221-1-git-send-email-marc.dionne@your-file-system.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 20:40:31 +02:00
Linus Torvalds 421ec9017f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "Various x86 FPU handling cleanups, refactorings and fixes (Borislav
  Petkov, Oleg Nesterov, Rik van Riel)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/fpu: Kill eager_fpu_init_bp()
  x86/fpu: Don't allocate fpu->state for swapper/0
  x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
  x86/fpu: Fold __drop_fpu() into its sole user
  x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
  x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec
  x86/fpu: Introduce restore_init_xstate()
  x86/fpu: Document user_fpu_begin()
  x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths
  x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave
  x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
  x86/fpu: Always allow FPU in interrupt if use_eager_fpu()
  x86/fpu: __kernel_fpu_begin() should clear fpu_owner_task even if use_eager_fpu()
  x86/fpu: Also check fpu_lazy_restore() when use_eager_fpu()
  x86/fpu: Use task_disable_lazy_fpu_restore() helper
  x86/fpu: Use an explicit if/else in switch_fpu_prepare()
  x86/fpu: Introduce task_disable_lazy_fpu_restore() helper
  x86/fpu: Move lazy restore functions up a few lines
  x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu()
  x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu()
  ...
2015-04-13 13:24:23 -07:00
Linus Torvalds 60f898eeaa Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "There were lots of changes in this development cycle:

   - over 100 separate cleanups, restructuring changes, speedups and
     fixes in the x86 system call, irq, trap and other entry code, part
     of a heroic effort to deobfuscate a decade old spaghetti asm code
     and its C code dependencies (Denys Vlasenko, Andy Lutomirski)

   - alternatives code fixes and enhancements (Borislav Petkov)

   - simplifications and cleanups to the compat code (Brian Gerst)

   - signal handling fixes and new x86 testcases (Andy Lutomirski)

   - various other fixes and cleanups

  By their nature many of these changes are risky - we tried to test
  them well on many different x86 systems (there are no known
  regressions), and they are split up finely to help bisection - but
  there's still a fair bit of residual risk left so caveat emptor"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (148 commits)
  perf/x86/64: Report regs_user->ax too in get_regs_user()
  perf/x86/64: Simplify regs_user->abi setting code in get_regs_user()
  perf/x86/64: Do report user_regs->cx while we are in syscall, in get_regs_user()
  perf/x86/64: Do not guess user_regs->cs, ss, sp in get_regs_user()
  x86/asm/entry/32: Tidy up JNZ instructions after TESTs
  x86/asm/entry/64: Reduce padding in execve stubs
  x86/asm/entry/64: Remove GET_THREAD_INFO() in ret_from_fork
  x86/asm/entry/64: Simplify jumps in ret_from_fork
  x86/asm/entry/64: Remove a redundant jump
  x86/asm/entry/64: Optimize [v]fork/clone stubs
  x86/asm/entry: Zero EXTRA_REGS for stub32_execve() too
  x86/asm/entry/64: Move stub_x32_execvecloser() to stub_execveat()
  x86/asm/entry/64: Use common code for rt_sigreturn() epilogue
  x86/asm/entry/64: Add forgotten CFI annotation
  x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout
  x86/asm/entry/64: Move opportunistic sysret code to syscall code path
  x86, selftests: Add sigreturn selftest
  x86/alternatives: Guard NOPs optimization
  x86/asm/entry: Clear EXTRA_REGS for all executable formats
  x86/signal: Remove pax argument from restore_sigcontext
  ...
2015-04-13 13:16:36 -07:00
Linus Torvalds 7fd56474db Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main changes in this cycle were:

   - clockevents state machine cleanups and enhancements (Viresh Kumar)

   - clockevents broadcast notifier horror to state machine conversion
     and related cleanups (Thomas Gleixner, Rafael J Wysocki)

   - clocksource and timekeeping core updates (John Stultz)

   - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko,
     Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang)

   - y2038 fixes (Xunlei Pang, John Stultz)

   - NMI-safe ktime_get_raw_fast() and general refactoring of the clock
     code, in preparation to perf's per event clock ID support (Peter
     Zijlstra)

   - generic sched/clock fixes, optimizations and cleanups (Daniel
     Thompson)

   - clockevents cpu_down() race fix (Preeti U Murthy)"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits)
  timers/PM: Drop unnecessary braces from tick_freeze()
  timers/PM: Fix up tick_unfreeze()
  timekeeping: Get rid of stale comment
  clockevents: Cleanup dead cpu explicitely
  clockevents: Make tick handover explicit
  clockevents: Remove broadcast oneshot control leftovers
  sched/idle: Use explicit broadcast oneshot control function
  ARM: Tegra: Use explicit broadcast oneshot control function
  ARM: OMAP: Use explicit broadcast oneshot control function
  intel_idle: Use explicit broadcast oneshot control function
  ACPI/idle: Use explicit broadcast control function
  ACPI/PAD: Use explicit broadcast oneshot control function
  x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
  clockevents: Provide explicit broadcast oneshot control functions
  clockevents: Remove the broadcast control leftovers
  ARM: OMAP: Use explicit broadcast control function
  intel_idle: Use explicit broadcast control function
  cpuidle: Use explicit broadcast control function
  ACPI/processor: Use explicit broadcast control function
  ACPI/PAD: Use explicit broadcast control function
  ...
2015-04-13 11:08:28 -07:00
Thomas Gleixner 435c350e81 x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions
Replace the clockevents_notify() call with an explicit function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8569669.lgxIty9PKW@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:34 +02:00
Thomas Gleixner 162a688e84 x86/amd/idle, clockevents: Use explicit broadcast control function
Replace the clockevents_notify() call with an explicit function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1528188.S1pjqkSL1P@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:31 +02:00
Ingo Molnar e1b63dec2d Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:50:29 +01:00
Oleg Nesterov f893959b08 x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
flush_thread() too checks eagerfpu right after that, and if it is true
then restore_init_xstate() just burns CPU for no reason. We are going to
load init_xstate_buf again after we set used_math()/user_has_fpu(), until
then the FPU state can't survive after switch_to().

Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
While at it, clean up the tsk/current usage.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313173030.GA31217@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:58 +01:00
Oleg Nesterov 9cb6ce823b x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec
Change flush_thread() to do user_fpu_begin() and restore_init_xstate()
instead of math_state_restore().

Note: "TODO: cleanup this horror" is still valid. We do not need
init_fpu() at all, we only need fpu_alloc() and memset(0). But this
needs other changes, in particular user_fpu_begin() should set
used_math().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173449.GE5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:58 +01:00
Ingo Molnar eda2360ad1 Linux 4.0-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVD1VGAAoJEHm+PkMAQRiG7yoH/juKOQ1zbxi5M+mleDEEJtA0
 RxQSojqEMWIKrWi8PNZxjENn1OZB6XOLIXOhlyAZBrmgsjO34p1DyXlZMznr/R8W
 kQ2Xxs061hRtB3OuruMIqOApUrjuqsaCwgbgUS1qWmqZcoyZN4oELyZMP6OOlqv5
 UUBZm8MfyXGyxrCcg39mjct3VEOhiuEcvL6SUxOC380CdSVAnyqHFPcz0JVqMUn9
 9RUBs0T9cMdhb0mZ2bfXzt6AKArj63G2nXOum+VzFcvspSm2U+MPIDCuoE+ZbTPS
 jqIAgG0rj1ezRyb5oeJrvlU0Yy3u/cXoMPs9+kORvpladooYNLti8ovh6qllm0I=
 =d/ye
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc5' into x86/fpu, to prevent conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:13:36 +01:00
Andy Lutomirski d9e05cc5a5 x86/asm/entry: Unify and fix initial thread_struct::sp0 values
x86_32 and x86_64 need slightly different thread_struct::sp0 values, and
x86_32's was incorrect for init.

This never mattered -- the init thread never runs user code, so we never
used thread_struct::sp0 for anything.

Fix it and mostly unify them.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1b810c1d2e797e27bb4a7708c426101161edd1f6.1426009661.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:27 +01:00
Mike Galbraith f8e617f458 sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs
To fully take advantage of MWAIT, apparently the CLFLUSH instruction needs
another quirk on certain CPUs: proper barriers around it on certain machines.

On a Q6600 SMP system, pipe-test scheduling performance, cross core,
improves significantly:

  3.8.13                   487.2 KHz    1.000
  3.13.0-master            415.5 KHz     .852
  3.13.0-master+           415.2 KHz     .852     + restore mwait_idle
  3.13.0-master++          488.5 KHz    1.002     + restore mwait_idle + IPI fix

Since X86_BUG_CLFLUSH_MONITOR is already a quirk, don't create a separate
quirk for the extra smp_mb()s.

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.10+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390061684.5566.4.camel@marge.simpson.net
[ Ported to recent kernel, added comments about the quirk. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:14:22 +01:00
Len Brown b253149b84 sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
In Linux-3.9 we removed the mwait_idle() loop:

  69fb3676df ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:

 1. Certain Core2-era machines support MWAIT-C1 and HALT only.
    MWAIT-C1 is preferred for optimal power and performance.
    But if they support just C1, cpuidle never loads and
    so they use the boot-time default idle loop forever.

 2. Some laptops will boot-hang if HALT is used,
    but will boot successfully if MWAIT is used.
    This appears to be a hidden assumption in BIOS SMI,
    that is presumably valid on the proprietary OS
    where the BIOS was validated.

       https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:

  For 3.9, simply revert 69fb3676df
  for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
  context For 3.11, 3.12, 3.13, this patch applies cleanly

Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-16 11:14:21 +01:00
Andy Lutomirski d0a0de21f8 x86/asm/entry: Remove INIT_TSS and fold the definitions into 'cpu_tss'
The INIT_TSS is unnecessary.  Just define the initial TSS where
'cpu_tss' is defined.

While we're at it, merge the 32-bit and 64-bit definitions.  The
only syntactic change is that 32-bit kernels were computing sp0
as long, but now they compute it as unsigned long.

Verified by objdump: the contents and relocations of
.data..percpu..shared_aligned are unchanged on 32-bit and 64-bit
kernels.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 8ef46a672a x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu
We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset).  The latter is defined by
hardware and is a clean canonical way to find the top of the
stack.  Add an accessor so we can start using it.

This needs minor paravirt tweaks.  On native, sp0 defines the
top of the kernel stack and is therefore always correct.  On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel.  Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:57 +01:00
Oleg Nesterov 110d7f7513 x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
AFAICS, there is no reason why kernel threads should have FPU context
even if use_eager_fpu() == T. Now that interrupted_kernel_fpu_idle()
does not check __thread_has_fpu() in the use_eager_fpu() case, we
can remove the init_fpu() code from eager_fpu_init() and change
flush_thread() called by do_execve() to initialize FPU.

Note: of course, the change in flush_thread() is horrible and must be
cleanuped. We need the new helper, and flush_thread() should return the
error if init_fpu() fails.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/20150119185212.GD16427@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 15:50:45 +01:00
Rik van Riel 6a5fe8952b x86/fpu: Use task_disable_lazy_fpu_restore() helper
Replace magic assignments of fpu.last_cpu = ~0 with more explicit
task_disable_lazy_fpu_restore() calls.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1423252925-14451-8-git-send-email-riel@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 11:15:55 +01:00
Andy Lutomirski 375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Oleg Nesterov dc56c0f9b8 x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
Cosmetic, but I think thread.fpu_counter should be initialized in
arch_dup_task_struct() too, along with other "fpu" variables. And
probably it make sense to turn it into thread.fpu->counter.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175730.GA21669@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-02 14:51:16 -07:00
Oleg Nesterov 5e23fee23e x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
Cosmetic, but imho memset(&dst->thread.fpu, 0) is not good simply
because it hides the (important) usage of ->has_fpu/etc from grep.
Change this code to initialize the members explicitly.

And note that ->last_cpu = 0 looks simply wrong, this can confuse
fpu_lazy_restore() if per_cpu(fpu_owner_task, 0) has already exited
and copy_process() re-allocated the same task_struct. Fortunately
this is not actually possible because child->fpu_counter == 0 and
thus fpu_lazy_restore() will not be called, but still this is not
clean/robust.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175727.GA21666@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-02 14:51:16 -07:00
Oleg Nesterov f1853505d9 x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
arch_dup_task_struct() copies thread.fpu if fpu_allocated(), this
looks suboptimal and misleading. Say, a forking process could use
FPU only once in a signal handler but now tsk_used_math(src) == F,
in this case the child gets a copy of fpu->state for no reason. The
child won't use the saved registers anyway even if it starts to use
FPU, this can only avoid fpu_alloc() in do_device_not_available().

Change this code to check tsk_used_math(current) instead. We still
need to clear fpu->has_fpu/state, we could do this memset(0) under
fpu_allocated() check but I think this doesn't make sense. See also
the next change.

use_eager_fpu() assumes that fpu_allocated() is always true, but a
forking task (and thus its child) must always have PF_USED_MATH set,
otherwise the child can either use FPU without used_math() (note that
switch_fpu_prepare() doesn't do stts() in this case), or it will be
killed by do_device_not_available()->BUG_ON(use_eager_fpu).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175723.GA21659@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-02 14:51:16 -07:00
Fenghua Yu 7496d6458f Define kernel API to get address of each state in xsave area
In standard form, each state is saved in the xsave area in fixed offset.
But in compacted form, offset of each saved state only can be calculated during
run time because some xstates may not be enabled and saved.

We define kernel API get_xsave_addr() returns address of a given state saved in a xsave area.

It can be called in kernel to get address of each xstate in xsave area in
either standard format or compacted format.

It's useful when kernel wants to directly access each state in xsave area.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-17-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-29 14:33:09 -07:00
Nicolas Pitre 16f8b05abe sched/idle, x86: Remove redundant cpuidle_idle_call()
The core idle loop now takes care of it.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linaro-kernel@lists.linaro.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-3ioazimg4j5iq6kdefks04i8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11 09:58:28 +01:00
Peter Zijlstra ea81174789 sched, idle: Fix the idle polling state logic
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:10 +02:00
Andi Kleen 277d5b40b7 x86, asmlinkage: Make several variables used from assembler/linker script visible
Plus one function, load_gs_index().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:13 -07:00
Paul Gortmaker 148f9bb877 x86: delete __cpuinit usage from all x86 files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:56 -04:00
Thomas Gleixner d7880812b3 idle: Add the stack canary init to cpu_startup_entry()
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
Use generic idle loop") wreckaged the stack protector.

I stupidly missed that boot_init_stack_canary() must be inlined from a
function which never returns, but I put that call into
arch_cpu_idle_prepare() which of course returns.

I pondered to play tricks with arch_cpu_idle_prepare() first, but then
I noticed, that the other archs which have implemented the
stackprotector (ARM and SH) do not initialize the canary for the
non-boot cpus.

So I decided to move the boot_init_stack_canary() call into
cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef
is just a temporary measure as I don't want to inflict the
boot_init_stack_canary() call on ARM and SH that late in the cycle.

I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11 22:04:47 +02:00
Thomas Gleixner 97a5b81fa4 x86: Fix idle consolidation fallout
The core code expects the arch idle code to return with interrupts
enabled. The conversion missed two x86 cases which fail to do that.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305021557030.3972@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-07 16:24:03 +02:00
Tejun Heo a43cb95d54 dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms.  This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.

show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.

* Archs which didn't print debug info now do.

  alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
  metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
  um, xtensa

* Already prints debug info.  Replaced with show_regs_print_info().
  The printed information is superset of what used to be there.

  arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

* s390 is special in that it used to print arch-specific information
  along with generic debug info.  Heiko and Martin think that the
  arch-specific extra isn't worth keeping s390 specfic implementation.
  Converted to use the generic version.

Note that now all archs print the debug info before actual register
dumps.

An example BUG() dump follows.

 kernel BUG at /work/os/work/kernel/workqueue.c:4841!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
 RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
 RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
 RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
  0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
  ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
 Call Trace:
  [<ffffffff81000312>] do_one_initcall+0x122/0x170
  [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  [<ffffffff81c4776e>] kernel_init+0xe/0xf0
  [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  ...

v2: Typo fix in x86-32.

v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it.  s390
    specific implementation dropped as requested by s390 maintainers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Linus Torvalds df8edfa9af Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid changes from Ingo Molnar:
 "The biggest change is x86 CPU bug handling refactoring and cleanups,
  by Borislav Petkov"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, CPU, AMD: Drop useless label
  x86, AMD: Correct {rd,wr}msr_amd_safe warnings
  x86: Fold-in trivial check_config function
  x86, cpu: Convert AMD Erratum 400
  x86, cpu: Convert AMD Erratum 383
  x86, cpu: Convert Cyrix coma bug detection
  x86, cpu: Convert FDIV bug detection
  x86, cpu: Convert F00F bug detection
  x86, cpu: Expand cpufeature facility to include cpu bugs
2013-04-30 08:34:38 -07:00
Thomas Gleixner 7d1a941731 x86: Use generic idle loop
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215235.486594473@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
2013-04-08 17:39:29 +02:00
Borislav Petkov 7d7dc116e5 x86, cpu: Convert AMD Erratum 400
Convert AMD erratum 400 to the bug infrastructure. Then, retract all
exports for modules since they're not needed now and make the AMD
erratum checking machinery local to amd.c. Use forward declarations to
avoid shuffling too much code around needlessly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-7-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-04-02 10:12:55 -07:00
Rafael J. Wysocki 10baf04e95 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (35 commits)
  PM idle: remove global declaration of pm_idle
  unicore32 idle: delete stray pm_idle comment
  openrisc idle: delete pm_idle
  mn10300 idle: delete pm_idle
  microblaze idle: delete pm_idle
  m32r idle: delete pm_idle, and other dead idle code
  ia64 idle: delete pm_idle
  cris idle: delete idle and pm_idle
  ARM64 idle: delete pm_idle
  ARM idle: delete pm_idle
  blackfin idle: delete pm_idle
  sparc idle: rename pm_idle to sparc_idle
  sh idle: rename global pm_idle to static sh_idle
  x86 idle: rename global pm_idle to static x86_idle
  APM idle: register apm_cpu_idle via cpuidle
  tools/power turbostat: display SMI count by default
  intel_idle: export both C1 and C1E
  cpuidle: remove vestage definition of cpuidle_state_usage.driver_data
  x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag
  x86 idle: remove mwait_idle() and "idle=mwait" cmdline param
  ...

Conflicts:
	arch/x86/kernel/process.c (with PM / tracing commit 43720bd)
	drivers/acpi/processor_idle.c (with ACPICA commit 4f84291)
2013-02-18 22:34:11 +01:00
Len Brown ca62cf59ce Merge branch 'misc' into release
Conflicts:
	arch/x86/kernel/process.c

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-18 00:25:53 -05:00
Len Brown a476bda30b x86 idle: rename global pm_idle to static x86_idle
(pm_idle)() is being removed from linux/pm.h
because Linux does not have such a cross-architecture concept.

x86 uses an idle function pointer in its architecture
specific code as a backup to cpuidle.  So we re-name
x86 use of pm_idle to x86_idle, and make it static to x86.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2013-02-17 23:34:58 -05:00
Len Brown dd8af07626 APM idle: register apm_cpu_idle via cpuidle
Update APM to register its local idle routine with cpuidle.

This allows us to stop exporting pm_idle to modules on x86.

The Kconfig sub-option, APM_CPU_IDLE, now depends on on CPU_IDLE.

Compile-tested only.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Jiri Kosina <jkosina@suse.cz>
2013-02-17 23:34:46 -05:00
Len Brown 27be457000 x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag
Remove 32-bit x86 a cmdline param "no-hlt",
and the cpuinfo_x86.hlt_works_ok that it sets.

If a user wants to avoid HLT, then "idle=poll"
is much more useful, as it avoids invocation of HLT
in idle, while "no-hlt" failed to do so.

Indeed, hlt_works_ok was consulted in only 3 places.

First, in /proc/cpuinfo where "hlt_bug yes"
would be printed if and only if the user booted
the system with "no-hlt" -- as there was no other code
to set that flag.

Second, check_hlt() would not invoke halt() if "no-hlt"
were on the cmdline.

Third, it was consulted in stop_this_cpu(), which is invoked
by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
all cases where the machine is being shutdown/reset.
The flag was not consulted in the more frequently invoked
play_dead()/hlt_play_dead() used in processor offline and suspend.

Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
indicating that it would be removed in 2012.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2013-02-10 03:32:22 -05:00
Len Brown 69fb3676df x86 idle: remove mwait_idle() and "idle=mwait" cmdline param
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT, starting on Pentium-4 HT-enabled processors.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
and no longer use mwait_idle().

Here we simplify the x86 native idle code by removing mwait_idle(),
and the "idle=mwait" bootparam used to invoke it.

Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
was invoked saying it would be removed in 2012.  This removal
was also noted in the (now removed:-) feature-removal-schedule.txt.

After this change, kernels configured with
(CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
that supports MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be enabled.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2013-02-10 03:03:41 -05:00
Len Brown 6a377ddc4e xen idle: make xen-specific macro xen-specific
This macro is only invoked by Xen,
so make its definition specific to Xen.

> set_pm_idle_to_default()
< xen_set_default_idle()

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: xen-devel@lists.xensource.com
2013-02-10 01:06:34 -05:00
Paul Gortmaker 43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Linus Torvalds 9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Al Viro 1d4b4b2994 x86, um: switch to generic fork/vfork/clone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 22:13:44 -05:00
Daniel Lezcano 4d0e42cc66 x86: Remove dead hlt_use_halt code
The hlt_use_halt function returns always true and there is only
one definition of it.

The default_idle function can then get ride of the if ...
statement and we can remove the else branch.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linaro-dev@lists.linaro.org
Cc: patches@linaro.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1351181591-8710-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-26 12:25:03 +02:00
Linus Torvalds 42859eea96 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull generic execve() changes from Al Viro:
 "This introduces the generic kernel_thread() and kernel_execve()
  functions, and switches x86, arm, alpha, um and s390 over to them."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
  s390: convert to generic kernel_execve()
  s390: switch to generic kernel_thread()
  s390: fold kernel_thread_helper() into ret_from_fork()
  s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
  um: switch to generic kernel_thread()
  x86, um/x86: switch to generic sys_execve and kernel_execve
  x86: split ret_from_fork
  alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  alpha: switch to generic kernel_thread()
  alpha: switch to generic sys_execve()
  arm: get rid of execve wrapper, switch to generic execve() implementation
  arm: optimized current_pt_regs()
  arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
  generic sys_execve()
  generic kernel_execve()
  new helper: current_pt_regs()
  preparation for generic kernel_thread()
  um: kill thread->forking
  um: let signal_delivered() do SIGTRAP on singlestepping into handler
  ...
2012-10-10 12:02:25 +09:00
Al Viro 6783eaa2e1 x86, um/x86: switch to generic sys_execve and kernel_execve
32bit wrapper is lost on that; 64bit one is *not*, since
we need to arrange for full pt_regs on stack when we call
sys_execve() and we need to load callee-saved ones from
there afterwards.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-30 22:53:32 -04:00
Al Viro 7076aada10 x86: split ret_from_fork
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-30 22:53:31 -04:00
Al Viro e76623d694 x86: get rid of TIF_IRET hackery
TIF_NOTIFY_RESUME will work in precisely the same way; all that
is achieved by TIF_IRET is appearing that there's some work to be
done, so we end up on the iret exit path.  Just use NOTIFY_RESUME.
And for execve() do that in 32bit start_thread(), not sys_execve()
itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-20 09:50:17 -04:00
Suresh Siddha 5d2bd7009f x86, fpu: decouple non-lazy/eager fpu restore from xsave
Decouple non-lazy/eager fpu restore policy from the existence of the xsave
feature. Introduce a synthetic CPUID flag to represent the eagerfpu
policy. "eagerfpu=on" boot paramter will enable the policy.

Requested-by: H. Peter Anvin <hpa@zytor.com>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-18 15:52:22 -07:00
Suresh Siddha 304bceda6a x86, fpu: use non-lazy fpu restore for processors supporting xsave
Fundamental model of the current Linux kernel is to lazily init and
restore FPU instead of restoring the task state during context switch.
This changes that fundamental lazy model to the non-lazy model for
the processors supporting xsave feature.

Reasons driving this model change are:

i. Newer processors support optimized state save/restore using xsaveopt and
xrstor by tracking the INIT state and MODIFIED state during context-switch.
This is faster than modifying the cr0.TS bit which has serializing semantics.

ii. Newer glibc versions use SSE for some of the optimized copy/clear routines.
With certain workloads (like boot, kernel-compilation etc), application
completes its work with in the first 5 task switches, thus taking upto 5 #DNA
traps with the kernel not getting a chance to apply the above mentioned
pre-load heuristic.

iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit
and thus will not work correctly in the presence of lazy restore. Non-lazy
state restore is needed for enabling such features.

Some data on a two socket SNB system:
 * Saved 20K DNA exceptions during boot on a two socket SNB system.
 * Saved 50K DNA exceptions during kernel-compilation workload.
 * Improved throughput of the AVX based checksumming function inside the
   kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts
   pair.

Also now kernel_fpu_begin/end() relies on the patched
alternative instructions. So move check_fpu() which uses the
kernel_fpu_begin/end() after alternative_instructions().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com
Merge 32-bit boot fix from,
Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-18 15:52:11 -07:00
Suresh Siddha 72a671ced6 x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.

And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise  fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.

Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is lot
of code duplication for support of new features like xsave etc.

Unify signal handling code paths for x86 and x86_64 kernels.

New strategy is as follows:

Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64bytes as needed by xsave (this where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.

Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.

"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
with in the noise range with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-09-18 15:51:48 -07:00
Joe Perches c767a54ba0 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
Use a more current logging style:

 - Bare printks should have a KERN_<LEVEL> for consistency's sake
 - Add pr_fmt where appropriate
 - Neaten some macro definitions
 - Convert some Ok output to OK
 - Use "%s: ", __func__ in pr_fmt for summit
 - Convert some printks to pr_<level>

Message output is not identical in all cases.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: levinsasha928@gmail.com
Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop
[ merged two similar patches, tidied up the changelog ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 09:17:22 +02:00
Linus Torvalds ec0d7f18ab Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull fpu state cleanups from Ingo Molnar:
 "This tree streamlines further aspects of FPU handling by eliminating
  the prepare_to_copy() complication and moving that logic to
  arch_dup_task_struct().

  It also fixes the FPU dumps in threaded core dumps, removes and old
  (and now invalid) assumption plus micro-optimizes the exit path by
  avoiding an FPU save for dead tasks."

Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: drop the fpu state during thread exit
  x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  fork: move the real prepare_to_copy() users to arch_dup_task_struct()
2012-05-23 10:59:07 -07:00
Linus Torvalds 19bec32d7f Merge branches 'x86-asm-for-linus', 'x86-cleanups-for-linus', 'x86-cpu-for-linus', 'x86-debug-for-linus' and 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull initial trivial x86 stuff from Ingo Molnar.

Various random cleanups and trivial fixes.

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Eliminate dead ia32 syscall handlers

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pci-calgary_64.c: Remove obsoleted simple_strtoul() usage
  x86: Don't continue booting if we can't load the specified initrd
  x86: kernel/dumpstack.c simple_strtoul cleanup
  x86: kernel/check.c simple_strtoul cleanup
  debug: Add CONFIG_READABLE_ASM
  x86: spinlock.h: Remove REG_PTR_MODE

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cache_info: Fix setup of l2/l3 ids

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Avoid double stack traces with show_regs()

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: microcode_core.c simple_strtoul cleanup
2012-05-23 10:09:50 -07:00
Linus Torvalds d79ee93de9 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The biggest change is the cleanup/simplification of the load-balancer:
  instead of the current practice of architectures twiddling scheduler
  internal data structures and providing the scheduler domains in
  colorfully inconsistent ways, we now have generic scheduler code in
  kernel/sched/core.c:sched_init_numa() that looks at the architecture's
  node_distance() parameters and (while not fully trusting it) deducts a
  NUMA topology from it.

  This inevitably changes balancing behavior - hopefully for the better.

  There are various smaller optimizations, cleanups and fixlets as well"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug
  sched: Remove stale power aware scheduling remnants and dysfunctional knobs
  sched/debug: Fix printing large integers on 32-bit platforms
  sched/fair: Improve the ->group_imb logic
  sched/nohz: Fix rq->cpu_load[] calculations
  sched/numa: Don't scale the imbalance
  sched/fair: Revert sched-domain iteration breakage
  sched/x86: Rewrite set_cpu_sibling_map()
  sched/numa: Fix the new NUMA topology bits
  sched/numa: Rewrite the CONFIG_NUMA sched domain support
  sched/fair: Propagate 'struct lb_env' usage into find_busiest_group
  sched/fair: Add some serialization to the sched_domain load-balance walk
  sched/fair: Let minimally loaded cpu balance the group
  sched: Change rq->nr_running to unsigned int
  x86/numa: Check for nonsensical topologies on real hw as well
  x86/numa: Hard partition cpu topology masks on node boundaries
  x86/numa: Allow specifying node_distance() for numa=fake
  x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
  sched: Update documentation and comments
  sched_rt: Avoid unnecessary dequeue and enqueue of pushable tasks in set_cpus_allowed_rt()
2012-05-22 18:27:32 -07:00
Linus Torvalds f5c101892f Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "Contains Alex Shi's three patches to remove percpu_xxx() which overlap
  with this_cpu_xxx().  There shouldn't be any functional change."

* 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: remove percpu_xxx() functions
  x86: replace percpu_xxx funcs with this_cpu_xxx
  net: replace percpu_xxx funcs with this_cpu_xxx or __this_cpu_xxx
2012-05-22 17:37:47 -07:00
Suresh Siddha 1dcc8d7ba2 x86, fpu: drop the fpu state during thread exit
There is no need to save any active fpu state to the task structure
memory if the task is dead. Just drop the state instead.

For example, this saved some 1770 xsave's during the system boot
of a two socket Xeon system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-4-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:20:59 -07:00
Suresh Siddha 55ccf3fe3f fork: move the real prepare_to_copy() users to arch_dup_task_struct()
Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.

Remove it and use the arch_dup_task_struct() instead.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:16:26 -07:00
Alex Shi c6ae41e7d4 x86: replace percpu_xxx funcs with this_cpu_xxx
Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.

On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.

Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.

Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-05-14 14:15:31 -07:00
Jan Beulich 57da8b960b x86: Avoid double stack traces with show_regs()
What was called show_registers() so far already showed a stack
trace for kernel faults, and kernel_stack_pointer() isn't even
valid to be used for faults from user mode, hence it was
pointless for show_regs() to call show_trace() after
show_registers().

Simply rename show_registers() to show_regs() and eliminate
the old definition.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/4FAA3D3902000078000826E1@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 11:44:42 +02:00
Thomas Gleixner 38e7c572ce x86: Use common threadinfo allocator
The only difference is the free_thread_info function, which frees
xstate.

Use the new arch_release_task_struct() function instead and switch
over to the core allocator.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.559556763@linutronix.de
Cc: x86@kernel.org
2012-05-08 14:08:44 +02:00
Thomas Gleixner 85f7f65627 x86: Use kick_all_cpus_sync()
Use kick_all_cpus_sync() and remove cpu_idle_wait().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.190382227@linutronix.de
Cc: x86@kernel.org
2012-05-08 12:35:06 +02:00
Srivatsa S. Bhat 19209bbb86 x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
The checks that exist in mwait_usable() for "idle=" kernel
parameters are insufficient. As a result, mwait_usable() can
return 1 even if "idle=nomwait" or "idle=poll" or "idle=halt"
parameters are passed.

Of these cases, incorrect handling of idle=nomwait is a
universal problem since mwait can get used for usual CPU idling.
However the rest of the cases are problematic only during CPU
Hotplug (offline) because, in the CPU offline path, the function
mwait_play_dead() is called, which might result in mwait being
used in the offline CPUs, if mwait_usable() happens to return 1.

Fix these issues by checking for the boot time "idle=" kernel
parameter properly in mwait_usable().

The first issue (usual cpu idling) is demonstrated below:

Before applying the patch (dmesg snippet):

 [    0.000000] Command line: [...] idle=nomwait
 [    0.000000] Kernel command line: [...] idle=nomwait
 [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
 [    0.140606] using mwait in idle threads.  <======= mwait being used
 [    4.303986] cpuidle: using governor ladder
 [    4.308232] cpuidle: using governor menu

After applying the patch:

 [    0.000000] Command line: [...] idle=nomwait
 [    0.000000] Kernel command line: [...] idle=nomwait
 [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
 [    4.264100] cpuidle: using governor ladder
 [    4.268342] cpuidle: using governor menu

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: venki@google.com
Cc: suresh.b.siddha@intel.com
Cc: Borislav Petkov <bp@amd64.org>
Cc: lenb@kernel.org
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/4F9E37B8.30001@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-07 16:27:20 +02:00
Thomas Gleixner 45046892ef x86: Use generic init_task
Same code. Use the generic version. The special Makefile treatment is
pointless anyway as init_task.o contains only data which is handled by
the linker script. So no point on being treated like head text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120503085035.739963562@linutronix.de
Cc: x86@kernel.org
2012-05-05 13:00:26 +02:00
Len Brown f6365201d8 x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility
The X86_32-only disable_hlt/enable_hlt mechanism was used by the
32-bit floppy driver. Its effect was to replace the use of the
HLT instruction inside default_idle() with cpu_relax() - essentially
it turned off the use of HLT.

This workaround was commented in the code as:

 "disable hlt during certain critical i/o operations"

 "This halt magic was a workaround for ancient floppy DMA
  wreckage. It should be safe to remove."

H. Peter Anvin additionally adds:

 "To the best of my knowledge, no-hlt only existed because of
  flaky power distributions on 386/486 systems which were sold to
  run DOS.  Since DOS did no power management of any kind,
  including HLT, the power draw was fairly uniform; when exposed
  to the much hhigher noise levels you got when Linux used HLT
  caused some of these systems to fail.

  They were by far in the minority even back then."

Alan Cox further says:

 "Also for the Cyrix 5510 which tended to go castors up if a HLT
  occurred during a DMA cycle and on a few other boxes HLT during
  DMA tended to go astray.

  Do we care ? I doubt it. The 5510 was pretty obscure, the 5520
  fixed it, the 5530 is probably the oldest still in any kind of
  use."

So, let's finally drop this.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephen Hemminger <shemminger@vyatta.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative instruction patching could be
  used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-30 08:50:27 +02:00
Linus Torvalds 6b8212a313 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Ingo Molnar.

This touches some non-x86 files due to the sanitized INLINE_SPIN_UNLOCK
config usage.

Fixed up trivial conflicts due to just header include changes (removing
headers due to cpu_idle() merge clashing with the <asm/system.h> split).

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/amd: Be more verbose about LVT offset assignments
  x86, tls: Off by one limit check
  x86/ioapic: Add io_apic_ops driver layer to allow interception
  x86/olpc: Add debugfs interface for EC commands
  x86: Merge the x86_32 and x86_64 cpu_idle() functions
  x86/kconfig: Remove CONFIG_TR=y from the defconfigs
  x86: Stop recursive fault in print_context_stack after stack overflow
  x86/io_apic: Move and reenable irq only when CONFIG_GENERIC_PENDING_IRQ=y
  x86/apic: Add separate apic_id_valid() functions for selected apic drivers
  locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage
  x86/kconfig: Update defconfigs
  x86: Fix excessive MSR print out when show_msr is not specified
2012-03-29 14:28:26 -07:00
David Howells f05e798ad4 Disintegrate asm/system.h for X86
Disintegrate asm/system.h for X86.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
cc: x86@kernel.org
2012-03-28 18:11:12 +01:00
Richard Weinberger 90e240142b x86: Merge the x86_32 and x86_64 cpu_idle() functions
Both functions are mostly identical.
The differences are:

- x86_32's cpu_idle() makes use of check_pgt_cache(), which is a
  nop on both x86_32 and x86_64.

- x86_64's cpu_idle() uses enter/__exit_idle/(), on x86_32 these
  function are a nop.

- In contrast to x86_32, x86_64 calls rcu_idle_enter/exit() in
  the innermost loop because idle notifications need RCU.
  Calling these function on x86_32 also in the innermost loop
  does not hurt.

So we can merge both functions.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: paulmck@linux.vnet.ibm.com
Cc: josh@joshtriplett.org
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1332709204-22496-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-26 03:16:07 +02:00
Linus Torvalds 35cb8d9e18 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/fpu changes from Ingo Molnar.

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  i387: Split up <asm/i387.h> into exported and internal interfaces
  i387: Uninline the generic FP helpers that we expose to kernel modules
2012-03-22 09:41:22 -07:00
Linus Torvalds 1361b83a13 i387: Split up <asm/i387.h> into exported and internal interfaces
While various modules include <asm/i387.h> to get access to things we
actually *intend* for them to use, most of that header file was really
pretty low-level internal stuff that we really don't want to expose to
others.

So split the header file into two: the small exported interfaces remain
in <asm/i387.h>, while the internal definitions that are only used by
core architecture code are now in <asm/fpu-internal.h>.

The guiding principle for this was to expose functions that we export to
modules, and leave them in <asm/i387.h>, while stuff that is used by
task switching or was marked GPL-only is in <asm/fpu-internal.h>.

The fpu-internal.h file could be further split up too, especially since
arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
kvm usage should probably be abstracted out a bit, and at least now the
internal FPU accessor functions are much more contained.  Even if it
isn't perhaps as contained as it _could_ be.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-02-21 14:12:54 -08:00
Steven Rostedt 484546509c x86/tracing: Denote the power and cpuidle tracepoints as _rcuidle()
The power and cpuidle tracepoints are called within a rcu_idle_exit()
section, and must be denoted with the _rcuidle() version of the tracepoint.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-13 09:14:43 -05:00
Seiichi Ikarashi 1cf8343f55 x86: Fix rflags in FAKE_STACK_FRAME
The x86_64 kernel pushes the fake kernel stack in
arch/x86/kernel/entry_64.S:FAKE_STACK_FRAME, and
rflags register in it does not conform to the specification.

Although Intel's manual[1] says bit 1 of it shall be set to 1,
this bit is cleared to 0 on pushing the fake stack.

[1] Intel(R) 64 and IA-32 Architectures Software Developer's Manual
    Vol.1 3-21 Figure 3-8. EFLAGS Register

If it is not on purpose, it is better to be fixed, because
it can lead some tools misunderstanding the stack frame. For example,
"crash" utility[2] actually detects it and warns you like
below:

       RIP: ffffffff8005dfa2  RSP: ffff8104ce0c7f58  RFLAGS: 00000200
       [...]

       bt: WARNING: possibly bogus exception frame

Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com>
Tested-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 10:02:38 +01:00
Konrad Rzeszutek Wilk e5fd47bfab xen/pm_idle: Make pm_idle be default_idle under Xen.
The idea behind commit d91ee5863b ("cpuidle: replace xen access to x86
pm_idle and default_idle") was to have one call - disable_cpuidle()
which would make pm_idle not be molested by other code.  It disallows
cpuidle_idle_call to be set to pm_idle (which is excellent).

But in the select_idle_routine() and idle_setup(), the pm_idle can still
be set to either: amd_e400_idle, mwait_idle or default_idle.  This
depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.

In case of mwait_idle we can hit some instances where the hypervisor
(Amazon EC2 specifically) sets the MWAIT and we get:

  Brought up 2 CPUs
  invalid opcode: 0000 [#1] SMP

  Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
  RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
  ...
  Call Trace:
   [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
   [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
  RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
   RSP <ffff8801d28ddf10>

In the case of amd_e400_idle we don't get so spectacular crashes, but we
do end up making an MSR which is trapped in the hypervisor, and then
follow it up with a yield hypercall.  Meaning we end up going to
hypervisor twice instead of just once.

The previous behavior before v3.0 was that pm_idle was set to
default_idle regardless of select_idle_routine/idle_setup.

We want to do that, but only for one specific case: Xen.  This patch
does that.

Fixes RH BZ #739499 and Ubuntu #881076
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-03 10:49:58 -08:00
Zhao Jin c812d8f7ad x86, mm, trivial: Remove unnecessary get_order() in free_thread_info()
Because THREAD_SIZE is defined as PAGE_SIZE << THREAD_ORDER on x86, the
call of get_order(THREAD_SIZE) can be replaced with THREAD_ORDER.

Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Link: http://lkml.kernel.org/r/4E4FB5A9.700@gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-23 21:49:35 +02:00
Len Brown 4bfc8288bc x86 idle: move mwait_idle_with_hints() to where it is used
...and make it static

no functional change

cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-03 19:06:36 -04:00
Andy Whitcroft 60b8b1de0d x86 idle: APM requires pm_idle/default_idle unconditionally when a module
[ Also from Ben Hutchings <ben@decadent.org.uk> and Vitaliy Ivanov
  <vitalivanov@gmail.com> ]

Commit 06ae40ce07 ("x86 idle: EXPORT_SYMBOL(default_idle, pm_idle)
only when APM demands it") removed the export for pm_idle/default_idle
unless the apm module was modularised and CONFIG_APM_CPU_IDLE was set.

But the apm module uses pm_idle/default_idle unconditionally,
CONFIG_APM_CPU_IDLE only affects the bios idle threshold.  Adjust the
export accordingly.

[ Used #ifdef instead of #if defined() as it's shorter, and what both
  Ben and Vitaliy used.. Andy, you're out-voted ;)    - Linus ]

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-14 13:42:20 -07:00
Linus Torvalds af0d6a0a3a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix mwait_play_dead() faulting on mwait-incapable cpus
  x86 idle: Fix mwait deprecation warning message

Evil merge to remove extra quote noticed by Joe Perches
2011-06-01 02:07:22 +09:00
Borislav Petkov 598e887d8b x86 idle: Fix mwait deprecation warning message
Fix:

  arch/x86/kernel/process.c:645:1: warning: unknown escape sequence '\i'

due to missing escape backslash, introduced by this commit:

  5d4c47e0195b: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param

Signed-off-by: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1306748286-24701-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-30 13:02:04 +02:00
Linus Torvalds f310642123 Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
  x86 idle: deprecate "no-hlt" cmdline param
  x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
  x86 idle floppy: deprecate disable_hlt()
  x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
  x86 idle: clarify AMD erratum 400 workaround
  idle governor: Avoid lock acquisition to read pm_qos before entering idle
  cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29 11:18:09 -07:00
Len Brown 5d4c47e019 x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().

Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.

After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.

cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 03:39:17 -04:00
Len Brown 06ae40ce07 x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
In the long run, we don't want default_idle() or (pm_idle)() to
be exported outside of process.c.  Start by not exporting them
to modules, unless the APM build demands it.

cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 03:39:14 -04:00
Len Brown 02c68a0201 x86 idle: clarify AMD erratum 400 workaround
The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting:
1. Intel C1E is somehow involved
2. All AMD processors with C1E are involved

Use the string "amd_c1e" instead of simply "c1e" to clarify that
this workaround is specific to AMD's version of C1E.
Use the string "e400" to clarify that the workaround is specific
to AMD processors with Erratum 400.

This patch is text-substitution only, with no functional change.

cc: x86@kernel.org
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-29 03:38:57 -04:00
Christoph Lameter 349c004e3d x86: A fast way to check capabilities of the current cpu
Add this_cpu_has() which determines if the current cpu has a certain
ability using a segment prefix and a bit test operation.

For that we need to add bit operations to x86s percpu.h.

Many uses of cpu_has use a pointer passed to a function to determine
the current flags. That is no longer necessary after this patch.

However, this patch only converts the straightforward cases where
cpu_has is used with this_cpu_ptr. The rest is work for later.

-tj: Rolled up patch to add x86_ prefix and use percpu_read() instead
     of percpu_read_stable().

Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-29 10:18:30 +02:00
Namhyung Kim e8e999cf3c x86, dumpstack: Correct stack dump info when frame pointer is available
Current stack dump code scans entire stack and check each entry
contains a pointer to kernel code. If CONFIG_FRAME_POINTER=y it
could mark whether the pointer is valid or not based on value of
the frame pointer. Invalid entries could be preceded by '?' sign.

However this was not going to happen because scan start point
was always higher than the frame pointer so that they could not
meet.

Commit 9c0729dc80 ("x86: Eliminate bp argument from the stack
tracing routines") delayed bp acquisition point, so the bp was
read in lower frame, thus all of the entries were marked
invalid.

This patch fixes this by reverting above commit while retaining
stack_frame() helper as suggested by Frederic Weisbecker.

End result looks like below:

before:

 [    3.508329] Call Trace:
 [    3.508551]  [<ffffffff814f35c9>] ? panic+0x91/0x199
 [    3.508662]  [<ffffffff814f3739>] ? printk+0x68/0x6a
 [    3.508770]  [<ffffffff81a981b2>] ? mount_block_root+0x257/0x26e
 [    3.508876]  [<ffffffff81a9821f>] ? mount_root+0x56/0x5a
 [    3.508975]  [<ffffffff81a98393>] ? prepare_namespace+0x170/0x1a9
 [    3.509216]  [<ffffffff81a9772b>] ? kernel_init+0x1d2/0x1e2
 [    3.509335]  [<ffffffff81003894>] ? kernel_thread_helper+0x4/0x10
 [    3.509442]  [<ffffffff814f6880>] ? restore_args+0x0/0x30
 [    3.509542]  [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2
 [    3.509641]  [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10

after:

 [    3.522991] Call Trace:
 [    3.523351]  [<ffffffff814f35b9>] panic+0x91/0x199
 [    3.523468]  [<ffffffff814f3729>] ? printk+0x68/0x6a
 [    3.523576]  [<ffffffff81a981b2>] mount_block_root+0x257/0x26e
 [    3.523681]  [<ffffffff81a9821f>] mount_root+0x56/0x5a
 [    3.523780]  [<ffffffff81a98393>] prepare_namespace+0x170/0x1a9
 [    3.523885]  [<ffffffff81a9772b>] kernel_init+0x1d2/0x1e2
 [    3.523987]  [<ffffffff81003894>] kernel_thread_helper+0x4/0x10
 [    3.524228]  [<ffffffff814f6880>] ? restore_args+0x0/0x30
 [    3.524345]  [<ffffffff81a97559>] ? kernel_init+0x0/0x1e2
 [    3.524445]  [<ffffffff81003890>] ? kernel_thread_helper+0x0/0x10

 -v5:
   * fix build breakage with oprofile

 -v4:
   * use 0 instead of regs->bp
   * separate out printk changes

 -v3:
   * apply comment from Frederic
   * add a couple of printk fixes

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Soren Sandmann <ssp@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1300416006-3163-1-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18 10:51:42 +01:00
Jan Beulich fd8fa4d3dd x86: Combine printk()s in show_regs_common()
Printing a single character alone when there's an immediately
following printk() is pretty pointless (and wasteful).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4D5D535A0200007800032730@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-18 08:52:30 +01:00
Naga Chumbalkar 84e383b322 x86, dmi, debug: Log board name (when present) in dmesg/oops output
The "Type 2" SMBIOS record that contains Board Name is not
strictly required and may be absent in the SMBIOS on some
platforms.

( Please note that Type 2 is not listed in Table 3 in Sec 6.2
  ("Required Structures and Data") of the SMBIOS v2.7
  Specification. )

Use the Manufacturer Name (aka System Vendor) name.
Print Board Name only when it is present.

Before the fix:
  (i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
 (ii) oops output:  Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6

After the fix:
  (i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
 (ii) oops output:  Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15 04:20:57 +01:00
Borislav Petkov 1c9d16e359 x86: Fix mwait_usable section mismatch
We use it in non __cpuinit code now too so drop marker.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20110211171754.GA21047@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14 12:08:28 +01:00
Borislav Petkov 93789b32db x86, hotplug: Fix powersavings with offlined cores on AMD
ea53069231 made a CPU use monitor/mwait
when offline. This is not the optimal choice for AMD wrt to powersavings
and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
same selection whether to use monitor/mwait has to be used as when we
select the idle routine for the machine.

With this patch, offlining cores 1-5 on a X6 machine allows core0 to
boost again.

[ hpa: putting this in urgent since it is a (power) regression fix ]

Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@kernel.org # 37.x
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-01-21 18:14:54 -08:00
Len Brown 43952886f0 Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
Len Brown 56dbed129d Merge branch 'linus' into idle-test 2011-01-12 18:06:06 -05:00
Thomas Renninger f77cfe4ea2 cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.

This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 18:05:16 -05:00
Thomas Renninger d18960494f ACPI, intel_idle: Cleanup idle= internal variables
Having four variables for the same thing:
  idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.

if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides

Introduces following functional changes/fixes:
  - intel_idle driver does not register if any idle=xy
    boot param is passed.
  - processor_idle.c will also not register a cpuidle driver
    and get active if idle=halt is passed.
    Before a cpuidle driver with one (C1, halt) state got registered
    Now the default_idle function will be used which finally uses
    the same idle call to enter sleep state (safe_halt()), but
    without registering a whole cpuidle driver.

That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.

Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:30 -05:00
Linus Torvalds 72eb6a7914 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
  gameport: use this_cpu_read instead of lookup
  x86: udelay: Use this_cpu_read to avoid address calculation
  x86: Use this_cpu_inc_return for nmi counter
  x86: Replace uses of current_cpu_data with this_cpu ops
  x86: Use this_cpu_ops to optimize code
  vmstat: User per cpu atomics to avoid interrupt disable / enable
  irq_work: Use per cpu atomics instead of regular atomics
  cpuops: Use cmpxchg for xchg to avoid lock semantics
  x86: this_cpu_cmpxchg and this_cpu_xchg operations
  percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
  percpu,x86: relocate this_cpu_add_return() and friends
  connector: Use this_cpu operations
  xen: Use this_cpu_inc_return
  taskstats: Use this_cpu_ops
  random: Use this_cpu_inc_return
  fs: Use this_cpu_inc_return in buffer.c
  highmem: Use this_cpu_xx_return() operations
  vmstat: Use this_cpu_inc_return for vm statistics
  x86: Support for this_cpu_add, sub, dec, inc_return
  percpu: Generic support for this_cpu_add, sub, dec, inc_return
  ...

Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
2011-01-07 17:02:58 -08:00
Thomas Renninger 25e41933b5 perf: Clean up power events by introducing new, more generic ones
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Tejun Heo 7b543a5334 x86: Replace uses of current_cpu_data with this_cpu ops
Replace all uses of current_cpu_data with this_cpu operations on the
per cpu structure cpu_info.  The scala accesses are replaced with the
matching this_cpu ops which results in smaller and more efficient
code.

In the long run, it might be a good idea to remove cpu_data() macro
too and use per_cpu macro directly.

tj: updated description

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-30 12:22:03 +01:00
Soeren Sandmann Pedersen 9c0729dc80 x86: Eliminate bp argument from the stack tracing routines
The various stack tracing routines take a 'bp' argument in which the
caller is supposed to provide the base pointer to use, or 0 if doesn't
have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not
defined, this means all callers in principle should either always pass
0, or be conditional on CONFIG_FRAME_POINTER.

However, there are only really three use cases for stack tracing:

(a) Trace the current task, including IRQ stack if any
(b) Trace the current task, but skip IRQ stack
(c) Trace some other task

In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just
be 0.  If it _is_ defined, then

- in case (a) bp should be gotten directly from the CPU's register, so
  the caller should pass NULL for regs,

- in case (b) the caller should should pass the IRQ registers to
  dump_trace(),

- in case (c) bp should be gotten from the top of the task's stack, so
  the caller should pass NULL for regs.

Hence, the bp argument is not necessary because the combination of
task and regs is sufficient to determine an appropriate value for bp.

This patch introduces a new inline function stack_frame(task, regs)
that computes the desired bp. This function is then called from the
two versions of dump_stack().

Signed-off-by: Soren Sandmann <ssp@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>,
LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-18 14:37:34 +01:00
David Howells d7627467b7 Make do_execve() take a const filename pointer
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to.  This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel().  A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17 18:07:43 -07:00
David Howells c788732523 Mark arguments to certain syscalls as being const
Mark arguments to certain system calls as being const where they should be but
aren't.  The list includes:

 (*) The filename arguments of various stat syscalls, execve(), various utimes
     syscalls and some mount syscalls.

 (*) The filename arguments of some syscall helpers relating to the above.

 (*) The buffer argument of various write syscalls.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 16:53:13 -07:00
Linus Torvalds 0f477dd085 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix keeping track of AMD C1E
  x86, cpu: Package Level Thermal Control, Power Limit Notification definitions
  x86, cpu: Export AMD errata definitions
  x86, cpu: Use AMD errata checking framework for erratum 383
  x86, cpu: Clean up AMD erratum 400 workaround
  x86, cpu: AMD errata checking framework
  x86, cpu: Split addon_cpuid_features.c
  x86, cpu: Clean up formatting in cpufeature.h, remove override
  x86, cpu: Enumerate xsaveopt
  x86, cpu: Add xsaveopt cpufeature
  x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves
  x86, cpu: Support the features flags in new CPUID leaf 7
  x86, cpu: Add CPU flags for F16C and RDRND
  x86: Look for IA32_ENERGY_PERF_BIAS support
  x86, AMD: Extend support to future families
  x86, cacheinfo: Carve out L3 cache slot accessors
  x86, xsave: Cleanup return codes in check_for_xstate()
2010-08-06 10:02:36 -07:00
Linus Torvalds 8d91530c5f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Remove pointless printk from p4-clockmod.
  [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
  [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
  [CPUFREQ] Fix section mismatch for longrun_cpu_init.
  [CPUFREQ] powernow-k8: Fix misleading variable naming
  [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
  [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
  [CPUFREQ] fix brace coding style issue.
  [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
  [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
  [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present
  [CPUFREQ] unexport (un)lock_policy_rwsem* functions
  [CPUFREQ] ondemand: Refactor frequency increase code
  [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
  [CPUFREQ] powernow-k8: Limit Pstate transition latency check
  [CPUFREQ] Fix PCC driver error path
  [CPUFREQ] fix double freeing in error path of pcc-cpufreq
  [CPUFREQ] pcc driver should check for pcch method before calling _OSC
  [CPUFREQ] fix memory leak in cpufreq_add_dev
  [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

Manually fix up non-data merge conflict introduced by new calling
conventions for trace_power_start() in commit 6f4f2723d0 ("x86
cpufreq: Make trace_power_frequency cpufreq driver independent"), which
didn't update the intel_idle native hardware cpuidle driver.
2010-08-04 11:13:36 -07:00
Thomas Renninger 6f4f2723d0 [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Michal Schmidt e8c534ec06 x86: Fix keeping track of AMD C1E
Accomodate the original C1E-aware idle routine to the different times
during boot when the BIOS enables C1E. While at it, remove the synthetic
CPUID flag in favor of a single global setting which denotes C1E status
on the system.

[ hpa: changed c1e_enabled to be a bool; clarified cpu bit 3:21 comment ]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
LKML-Reference: <20100727165335.GA11630@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-02 08:45:56 -07:00
Sheng Yang 5ee481da7b x86: Export FPU API for KVM use
Also add some constants.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:48 +03:00
Hans Rosenfeld 9d8888c2a2 x86, cpu: Clean up AMD erratum 400 workaround
Remove check_c1e_idle() and use the new AMD errata checking framework
instead.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1280336972-865982-2-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-28 13:12:11 -07:00
Linus Torvalds 41d59102e1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fpu: Use static_cpu_has() to implement use_xsave()
  x86: Add new static_cpu_has() function using alternatives
  x86, fpu: Use the proper asm constraint in use_xsave()
  x86, fpu: Unbreak FPU emulation
  x86: Introduce 'struct fpu' and related API
  x86: Eliminate TS_XSAVE
  x86-32: Don't set ignore_fpu_irq in simd exception
  x86: Merge kernel_math_error() into math_error()
  x86: Merge simd_math_error() into math_error()
  x86-32: Rework cache flush denied handler

Fix trivial conflict in arch/x86/kernel/process.c
2010-05-18 08:58:16 -07:00
Linus Torvalds 4d7b4ac22f Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
  perf tools: Add mode to build without newt support
  perf symbols: symbol inconsistency message should be done only at verbose=1
  perf tui: Add explicit -lslang option
  perf options: Type check all the remaining OPT_ variants
  perf options: Type check OPT_BOOLEAN and fix the offenders
  perf options: Check v type in OPT_U?INTEGER
  perf options: Introduce OPT_UINTEGER
  perf tui: Add workaround for slang < 2.1.4
  perf record: Fix bug mismatch with -c option definition
  perf options: Introduce OPT_U64
  perf tui: Add help window to show key associations
  perf tui: Make <- exit menus too
  perf newt: Add single key shortcuts for zoom into DSO and threads
  perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
  perf newt: Fix the 'A'/'a' shortcut for annotate
  perf newt: Make <- exit the ui_browser
  x86, perf: P4 PMU - fix counters management logic
  perf newt: Make <- zoom out filters
  perf report: Report number of events, not samples
  perf hist: Clarify events_stats fields usage
  ...

Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
2010-05-18 08:19:03 -07:00
Andreas Herrmann f01487119d x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
If host CPU is exposed to a guest the OSVW MSRs are not guaranteed
to be present and a GP fault occurs. Thus checking the feature flag is
essential.

Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100427101348.GC4489@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-13 16:21:20 -07:00
Avi Kivity 8660328332 x86: Introduce 'struct fpu' and related API
Currently all fpu state access is through tsk->thread.xstate.  Since we wish
to generalize fpu access to non-task contexts, wrap the state in a new
'struct fpu' and convert existing access to use an fpu API.

Signal frame handlers are not converted to the API since they will remain
task context only things.

Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-10 10:48:55 -07:00
Ingo Molnar ec5e61aabe Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/cpu/perf_event.c

Merge reason: Resolve the conflict, pick up fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 19:38:10 +02:00
Peter Zijlstra ea8e61b7bb x86, ptrace: Fix block-step
Implement ptrace-block-step using TIF_BLOCKSTEP which will set
DEBUGCTLMSR_BTF when set for a task while preserving any other
DEBUGCTLMSR bits.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100325135414.017536066@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 11:33:57 +01:00
Peter Zijlstra faa4602e47 x86, perf, bts, mm: Delete the never used BTS-ptrace code
Support for the PMU's BTS features has been upstreamed in
v2.6.32, but we still have the old and disabled ptrace-BTS,
as Linus noticed it not so long ago.

It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
regard for other uses (perf) and doesn't provide the flexibility
needed for perf either.

Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
was never used and ptrace-block-step can be implemented using a
much simpler approach.

So axe all 3000 lines of it. That includes the *locked_memory*()
APIs in mm/mlock.c as well.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20100325135413.938004390@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 11:33:55 +01:00
Andreas Herrmann 035a02c1e1 x86, amd: Restrict usage of c1e_idle()
Currently c1e_idle returns true for all CPUs greater than or equal to
family 0xf model 0x40. This covers too many CPUs.

Meanwhile a respective erratum for the underlying problem was filed
(#400). This patch adds the logic to check whether erratum #400
applies to a given CPU.
Especially for CPUs where SMI/HW triggered C1e is not supported,
c1e_idle() doesn't need to be used. We can check this by looking at
the respective OSVW bit for erratum #400.

Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100319110922.GA19614@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-19 14:43:36 -07:00
Linus Torvalds 15c989d4d1 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems
  x86, UV: Fix target_cpus() in x2apic_uv_x.c
  x86: Reduce per cpu warning boot up messages
  x86: Reduce per cpu MCA boot up messages
  x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
2010-03-13 14:45:49 -08:00
Mike Travis d6dd692168 x86: Reduce per cpu warning boot up messages
Reduce warning message output to one line only instead of per
cpu.

Signed-of-by: Mike Travis <travis@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 14:27:47 +01:00
Linus Torvalds a7f16d10b5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Mark atomic irq ops raw for 32bit legacy
  x86: Merge show_regs()
  x86: Macroise x86 cache descriptors
  x86-32: clean up rwsem inline asm statements
  x86: Merge asm/atomic_{32,64}.h
  x86: Sync asm/atomic_32.h and asm/atomic_64.h
  x86: Split atomic64_t functions into seperate headers
  x86-64: Modify memcpy()/memset() alternatives mechanism
  x86-64: Modify copy_user_generic() alternatives mechanism
  x86: Lift restriction on the location of FIX_BTMAP_*
  x86, core: Optimize hweight32()
2010-02-28 10:35:09 -08:00
H. Peter Anvin 05d43ed8a8 x86: get rid of the insane TIF_ABI_PENDING bit
Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.

Everything becomes much more straightforward this way.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
Brian Gerst 3bef444797 x86: Merge show_regs()
Using kernel_stack_pointer() allows 32-bit and 64-bit versions to
be merged.  This is more correct for 64-bit, since the old %rsp is
always saved on the stack.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1263397555-27695-1-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-13 09:23:15 -08:00
Cyrill Gorcunov 864a0922dd x86: kernel_thread() -- initialize SS to a known state
Before the kernel_thread was converted into "C" we had
pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).

Though I must admit I didn't find any *explicit* load of
%ss from this structure the better to be on a safe side
and set it to a known value.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1263377768-19600-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 11:23:45 +01:00
Pekka Enberg d015a09298 x86: Use KERN_DEFAULT log-level in __show_regs()
Andrew Morton reported a strange looking kmemcheck warning:

  WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004fba6c20)
  0000000000000000310000000000000000000000000000002413000000c9ffff
   u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u

   [<ffffffff810af3aa>] kmemleak_scan+0x25a/0x540
   [<ffffffff810afbcb>] kmemleak_scan_thread+0x5b/0xe0
   [<ffffffff8104d0fe>] kthread+0x9e/0xb0
   [<ffffffff81003074>] kernel_thread_helper+0x4/0x10
   [<ffffffffffffffff>] 0xffffffffffffffff

The above printout is missing register dump completely. The
problem here is that the output comes from syslog which doesn't
show KERN_INFO log-level messages. We didn't see this before
because both of us were testing on 32-bit kernels which use the
_default_ log-level.

Fix that up by explicitly using KERN_DEFAULT log-level for
__show_regs() printks.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1261988819.4641.2.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28 09:40:21 +01:00
Ingo Molnar ab1eebe77d Merge branch 'x86/asm' into x86/urgent
Merge reason: it's stable so lets push it upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 20:33:28 +01:00
Brian Gerst df59e7bf43 x86: Merge kernel_thread()
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10 16:41:31 -08:00
Brian Gerst f839bbc5c8 x86: Merge sys_clone
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:29:42 -08:00
Brian Gerst 11cf88bd0b x86: Merge sys_execve
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:34 -08:00
Andy Isaacson a1884b8e55 x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
Robert Hancock observes that DMI_BOARD_NAME is often more useful
than DMI_PRODUCT_NAME, especially on standalone motherboards.
So, print both.

Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208083021.GB27174@hexapodia.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:17:59 +01:00
Andy Isaacson 814e2c84a7 x86: Factor duplicated code out of __show_regs() into show_regs_common()
Unify x86_32 and x86_64 implementations of __show_regs() header,
standardizing on the x86_64 format string in the process. Also,
32-bit will now call print_modules.

Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208082942.GA27174@hexapodia.org>
[ v2: resolved conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:17:58 +01:00
Linus Torvalds ed9216c171 Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
  KVM: VMX: Fix comparison of guest efer with stale host value
  KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
  KVM: Drop user return notifier when disabling virtualization on a cpu
  KVM: VMX: Disable unrestricted guest when EPT disabled
  KVM: x86 emulator: limit instructions to 15 bytes
  KVM: s390: Make psw available on all exits, not just a subset
  KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
  KVM: VMX: Report unexpected simultaneous exceptions as internal errors
  KVM: Allow internal errors reported to userspace to carry extra data
  KVM: Reorder IOCTLs in main kvm.h
  KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
  KVM: only clear irq_source_id if irqchip is present
  KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
  KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
  KVM: VMX: Remove vmx->msr_offset_efer
  KVM: MMU: update invlpg handler comment
  KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
  KVM: remove duplicated task_switch check
  KVM: powerpc: Fix BUILD_BUG_ON condition
  KVM: VMX: Use shared msr infrastructure
  ...

Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-12-08 08:02:38 -08:00
Frederic Weisbecker 24f1e32c60 hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events
This patch rebase the implementation of the breakpoints API on top of
perf events instances.

Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..

The new layering is now made as follows:

       ptrace       kgdb      ftrace   perf syscall
          \          |          /         /
           \         |         /         /
                                        /
            Core breakpoint API        /
                                      /
                     |               /
                     |              /

              Breakpoints perf events

                     |
                     |

               Breakpoints PMU ---- Debug Register constraints handling
                                    (Part of core breakpoint API)
                     |
                     |

             Hardware debug registers

Reasons of this rewrite:

- Use the centralized/optimized pmu registers scheduling,
  implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
  events, exclusive/non-exclusive, tunable period, etc...)

Impact:

- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
  thread breakpoints references.

Todo (in the order):

- Support breakpoints perf counter events for perf tools (ie: implement
  perf_bpcounter_event())
- Support from perf tools

Changes in v2:

- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
  weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
  perf_event_attr.
- Separate core and arch specific headers, drop
  asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch

Changes in v3:

- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
  changes to kvm when we exit the guest and restore the bp registers
  to the host.

Changes in v4:

- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
  module
- Restore the breakpoints unconditionally on kvm guest exit:
  TIF_DEBUG_THREAD doesn't anymore cover every cases of running
  breakpoints and vcpu->arch.switch_db_regs might not always be
  set when the guest used debug registers.
  (Waiting for a reliable optimization)

Changes in v5:

- Split-up the asm-generic/hw-breakpoint.h moving to
  linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
  to host. We only want to restore the state if we have active
  breakpoints to the host, otherwise we don't care about messed-up
  address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c

Changes in v6:

- Fix wrong header inclusion in trace.h (triggered a build
  error with CONFIG_FTRACE_SELFTEST

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-08 15:34:42 +01:00
Paul Mundt 41a48d14f6 x86/hw-breakpoints: Actually flush thread breakpoints in flush_thread().
flush_thread() tries to do a TIF_DEBUG check before calling in to
flush_thread_hw_breakpoint() (which subsequently clears the thread flag),
but for some reason, the x86 code is manually clearing TIF_DEBUG
immediately before the test, so this path will never be taken.

This kills off the erroneous clear_tsk_thread_flag() and lets
flush_thread_hw_breakpoint() actually get invoked.

Presumably folks were getting lucky with testing and the
free_thread_info() -> free_thread_xstate() path was taking care of the
flush there.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alan Stern <stern@rowland.harvard.edu>
LKML-Reference: <20091005102306.GA7889@linux-sh.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-11-03 18:05:44 +01:00
Frederic Weisbecker 0f8f86c7bd Merge commit 'perf/core' into perf/hw-breakpoint
Conflicts:
	kernel/Makefile
	kernel/trace/Makefile
	kernel/trace/trace.h
	samples/Makefile

Merge reason: We need to be uptodate with the perf events development
branch because we plan to rewrite the breakpoints API on top of
perf events.
2009-10-18 01:12:33 +02:00
Avi Kivity 7c68af6e32 core, x86: Add user return notifiers
Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace.  The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.

This will be used initially to speed up KVM task switching by lazily
updating MSRs.

Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-01 12:12:18 -07:00
Li Zefan 79f5599772 cpumask: use zalloc_cpumask_var() where possible
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:24 +09:30
Arjan van de Ven 288f023e70 tracing, x86, cpuidle: Move the end point of a C state in the power tracer
The "end of a C state" trace point currently happens before
the code runs that corrects the TSC for having stopped during idle.

The result of this is that the timestamp of the end-of-C-state event
is garbage on cpus where the TSC stops during idle.

This patch moves the end point of the C state to after the timekeeping
engine of the kernel has been corrected.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090919133533.139c2a46@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-19 18:57:52 +02:00