Commit Graph

505836 Commits

Author SHA1 Message Date
Ingo Molnar 7fcb3bc361 x86/asm/entry/64: Update comments about stack frames
Tweak a few outdated comments that were obsoleted by recent changes
to syscall entry code:

 - we no longer have a "partial stack frame" on
   entry, ever.

 - explain the syscall entry usage of old_rsp.

Partially based on a (split out of) patch from Denys Vlasenko.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Ingo Molnar ac9af4983e x86/asm/entry/64: Remove thread_struct::usersp
Nothing uses thread_struct::usersp anymore, so remove it.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Ingo Molnar 9854dd74c3 x86/asm/entry/64: Simplify 'old_rsp' usage
Remove all manipulations of PER_CPU(old_rsp) in C code:

 - it is not used on SYSRET return anymore, and system entries
   are atomic, so updating it from the fork and context switch
   paths is pointless.

 - Tweak a few related comments as well: we no longer have a
   "partial stack frame" on entry, ever.

Based on (split out of) patch from Denys Vlasenko.

Originally-from: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426599779-8010-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:41 +01:00
Denys Vlasenko 33db1fd48a x86/asm/entry/64: Enable interrupts *after* we fetch PER_CPU_VAR(old_rsp)
We want to use PER_CPU_VAR(old_rsp) as a simple temporary register,
to shuffle user-space RSP into (and from) when we set up the system
call stack frame. At that point we cannot shuffle values into general
purpose registers, because we have not saved them yet.

To be able to do this shuffling into a memory location, we must be
atomic and must not be preempted while we do the shuffling, otherwise
the 'temporary' register gets overwritten by some other task's
temporary register contents ...

Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426600344-8254-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 16:01:40 +01:00
Ingo Molnar 8b6c0ab1a1 x86/asm/entry: Document and clean up the enable_sep_cpu() and syscall32_cpu_init() functions
Clean up the flow and document the functions a bit better.

Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko d828c71fba x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better
Before the patch, the 'tss_struct::stack' field was not referenced anywhere.

It was used only to set SYSENTER's stack to point after the last byte
of tss_struct, thus the trailing field, stack[64], was used.

But grep would not know it. You can comment it out, compile,
and kernel will even run until an unlucky NMI corrupts
io_bitmap[] (which is also not easily detectable).

This patch changes code so that the purpose and usage of this
field is not mysterious anymore, and can be easily grepped for.

This does change generated code, for a subtle reason:
since tss_struct is ____cacheline_aligned, there happens to be
5 longs of padding at the end. Old code was using the padding
too; new code will strictly use it only for SYSENTER_stack[].

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko 3876488444 include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header
Suggested by Andy.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:28 +01:00
Denys Vlasenko 5c39403e00 x86/asm/entry: Simplify task_pt_regs() macro definition
Before this change, task_pt_regs() was using KSTK_TOP(),
and it was the only use of that macro. In turn, KSTK_TOP used
THREAD_SIZE_LONGS, and it was the only use of that macro too.

Fold these macros into task_pt_regs(). Tweak comment
about "- 8" - we now use a symbolic constant, not literal 8.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426255743-5394-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:28 +01:00
Andy Lutomirski 76e4c4908a x86/asm/entry/32: Document our abuse of x86_hw_tss::ss1 and x86_hw_tss::sp1
This has confused me for a while.  Now that I figured it out, document it.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b7efc1b7364039824776f68e9ddee9ec1500e894.1426009661.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:27 +01:00
Andy Lutomirski d9e05cc5a5 x86/asm/entry: Unify and fix initial thread_struct::sp0 values
x86_32 and x86_64 need slightly different thread_struct::sp0 values, and
x86_32's was incorrect for init.

This never mattered -- the init thread never runs user code, so we never
used thread_struct::sp0 for anything.

Fix it and mostly unify them.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1b810c1d2e797e27bb4a7708c426101161edd1f6.1426009661.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:27 +01:00
Andy Lutomirski 3ee4298f44 x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro
x86_32, unlike x86_64, pads the top of the kernel stack, because the
hardware stack frame formats are variable in size.

Document this padding and give it a name.

This should make no change whatsoever to the compiled kernel
image. It also doesn't fix any of the current bugs in this area.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net
[ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:26 +01:00
Andy Lutomirski 9a036b93a3 x86/signal/64: Remove 'fs' and 'gs' from sigcontext
As far as I can tell, these fields have been set to zero on save
and ignored on restore since Linux was imported into git.
Rename them '__pad1' and '__pad2' to avoid confusion.  This may
also allow us to recycle them some day.

This also adds a comment clarifying the history of those fields.

I'm intentionally avoiding calling either of them '__pad0': the
field formerly known as '__pad0' is now 'ss'.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/844f8490e938780c03355be4c9b69eb4c494bf4e.1426193719.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:26 +01:00
Andy Lutomirski c6f2062935 x86/signal/64: Fix SS handling for signals delivered to 64-bit programs
The comment in the signal code says that apps can save/restore
other segments on their own.  It's true that apps can *save* SS
on their own, but there's no way for apps to restore it: SYSCALL
effectively resets SS to __USER_DS, so any value that user code
tries to load into SS gets lost on entry to sigreturn.

This recycles two padding bytes in the segment selector area for SS.

While we're at it, we need a second change to make this useful.

If the signal we're delivering is caused by a bad SS value,
saving that value isn't enough.  We need to remove that bad
value from the regs before we try to deliver the signal.  Oddly,
the i386 code already got this right.

I suspect that 64-bit programs that try to run 16-bit code and
use signals will have a lot of trouble without this.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:25 +01:00
Denys Vlasenko 263042e463 x86/asm/entry/64: Save user RSP in pt_regs->sp on SYSCALL64 fastpath
Prepare for the removal of 'usersp', by simplifying PER_CPU(old_rsp) usage:

  - use it only as temp storage

  - store the userspace stack pointer immediately in pt_regs->sp
    on syscall entry, instead of using it later, on syscall exit.

  - change C code to use pt_regs->sp only, instead of PER_CPU(old_rsp)
    and task->thread.usersp.

FIXUP/RESTORE_TOP_OF_STACK are simplified as well.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-4-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Denys Vlasenko 616ab249f1 x86/asm/entry/64: Remove stub_iopl
stub_iopl is no longer needed: pt_regs->flags needs no fixing up
after previous change. Remove it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425984307-2143-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Denys Vlasenko 29722cd4ef x86/asm/entry/64: Save R11 into pt_regs->flags on SYSCALL64 fastpath
Before this patch, R11 was saved in pt_regs->r11.

Which looks natural, but requires messy shuffling to/from iret
frame whenever ptrace or e.g. sys_iopl() wants to modify flags -
because that's how this register is used by SYSCALL/SYSRET.

This patch saves R11 in pt_regs->flags, and uses that value for
the SYSRET64 instruction. Shuffling is eliminated.

FIXUP/RESTORE_TOP_OF_STACK are simplified.

stub_iopl is no longer needed: pt_regs->flags needs no fixing up.

Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425926364-9526-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-10 13:56:10 +01:00
Denys Vlasenko 3e1aa7cb59 x86/asm: Optimize unnecessarily wide TEST instructions
By the nature of the TEST operation, it is often possible to test
a narrower part of the operand:

    "testl $3,  mem"  ->  "testb $3, mem",
    "testq $3, %rcx"  ->  "testb $3, %cl"

This results in shorter instructions, because the TEST instruction
has no sign-entending byte-immediate forms unlike other ALU ops.

Note that this change does not create any LCP (Length-Changing Prefix)
stalls, which happen when adding a 0x66 prefix, which happens when
16-bit immediates are used, which changes such TEST instructions:

  [test_opcode] [modrm] [imm32]

to:

  [0x66] [test_opcode] [modrm] [imm16]

where [imm16] has a *different length* now: 2 bytes instead of 4.
This confuses the decoder and slows down execution.

REX prefixes were carefully designed to almost never hit this case:
adding REX prefix does not change instruction length except MOVABS
and MOV [addr],RAX instruction.

This patch does not add instructions which would use a 0x66 prefix,
code changes in assembly are:

    -48 f7 07 01 00 00 00 	testq  $0x1,(%rdi)
    +f6 07 01             	testb  $0x1,(%rdi)
    -48 f7 c1 01 00 00 00 	test   $0x1,%rcx
    +f6 c1 01             	test   $0x1,%cl
    -48 f7 c1 02 00 00 00 	test   $0x2,%rcx
    +f6 c1 02             	test   $0x2,%cl
    -41 f7 c2 01 00 00 00 	test   $0x1,%r10d
    +41 f6 c2 01          	test   $0x1,%r10b
    -48 f7 c1 04 00 00 00 	test   $0x4,%rcx
    +f6 c1 04             	test   $0x4,%cl
    -48 f7 c1 08 00 00 00 	test   $0x8,%rcx
    +f6 c1 08             	test   $0x8,%cl

Linus further notes:

   "There are no stalls from using 8-bit instruction forms.

    Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
    *could* cause problems if it ends up having forwarding issues, so that
    instead of just forwarding the result, you end up having to wait for
    it to be stable in the L1 cache (or possibly the register file). The
    forwarding from the store buffer is simplest and most reliable if the
    read is done at the exact same address and the exact same size as the
    write that gets forwarded.

    But that's true only if:

     (a) the write was very recent and is still in the write queue. I'm
         not sure that's the case here anyway.

     (b) on at least most Intel microarchitectures, you have to test a
         different byte than the lowest one (so forwarding a 64-bit write
         to a 8-bit read ends up working fine, as long as the 8-bit read
         is of the low 8 bits of the written data).

    A very similar issue *might* show up for registers too, not just
    memory writes, if you use 'testb' with a high-byte register (where
    instead of forwarding the value from the original producer it needs to
    go through the register file and then shifted). But it's mainly a
    problem for store buffers.

    But afaik, the way Denys changed the test instructions, neither of the
    above issues should be true.

    The real problem for store buffer forwarding tends to be "write 8
    bits, read 32 bits". That can be really surprisingly expensive,
    because the read ends up having to wait until the write has hit the
    cacheline, and we might talk tens of cycles of latency here. But
    "write 32 bits, read the low 8 bits" *should* be fast on pretty much
    all x86 chips, afaik."

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 11:12:43 +01:00
Andy Lutomirski a7fcf28d43 x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
I broke 32-bit kernels.  The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized.  It has the following issues:

 - Init's sp0 is inconsistent with everything else's: non-init tasks
   are offset by 8 bytes.  (I have no idea why, and the comment is unhelpful.)

 - vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski b27559a433 x86/asm/entry: Delay loading sp0 slightly on task switch
The change:

  75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")

had the unintended side effect of changing the return value of
current_thread_info() during part of the context switch process.
Change it back.

This has no effect as far as I can tell -- it's just for
consistency.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9fcaa47dd8487db59eed7a3911b6ae409476763e.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski 9b47668843 x86/asm/entry: Rename 'INIT_TSS_IST' to 'CPU_TSS_IST'
This has nothing to do with the init thread or the initial
anything. It's just the CPU's TSS.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0bd5e26b32a2e1f08ff99017d0997118fbb2485.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski d0a0de21f8 x86/asm/entry: Remove INIT_TSS and fold the definitions into 'cpu_tss'
The INIT_TSS is unnecessary.  Just define the initial TSS where
'cpu_tss' is defined.

While we're at it, merge the 32-bit and 64-bit definitions.  The
only syntactic change is that 32-bit kernels were computing sp0
as long, but now they compute it as unsigned long.

Verified by objdump: the contents and relocations of
.data..percpu..shared_aligned are unchanged on 32-bit and 64-bit
kernels.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8fc39fa3f6c5d635e93afbdd1a0fe0678a6d7913.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 9d0c914c60 x86/asm/entry/64/compat: Change the 32-bit sysenter code to use sp0
The ia32 sysenter code loaded the top of the kernel stack into
rsp by loading kernel_stack and then adjusting it.  It can be
simplified to just read sp0 directly.

This requires the addition of a new asm-offsets entry for sp0.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Andy Lutomirski 75182b1632 x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()
This will make modifying the semantics of kernel_stack easier.

The change to ist_begin_non_atomic() is necessary because sp0 no
longer points to the same THREAD_SIZE-aligned region as RSP;
it's one byte too high for that.  At Denys' suggestion, rather
than offsetting it, just check explicitly that we're in the
correct range ending at sp0.  This has the added benefit that we
no longer assume that the thread stack is aligned to
THREAD_SIZE.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef8254ad414cbb8034c9a56396eeb24f5dd5b0de.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:57 +01:00
Andy Lutomirski 8ef46a672a x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpu
We currently store references to the top of the kernel stack in
multiple places: kernel_stack (with an offset) and
init_tss.x86_tss.sp0 (no offset).  The latter is defined by
hardware and is a clean canonical way to find the top of the
stack.  Add an accessor so we can start using it.

This needs minor paravirt tweaks.  On native, sp0 defines the
top of the kernel stack and is therefore always correct.  On Xen
and lguest, the hypervisor tracks the top of the stack, but we
want to start reading sp0 in the kernel.  Fixing this is simple:
just update our local copy of sp0 as well as the hypervisor's
copy on task switches.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:57 +01:00
Wang Nan 5eca7453d6 x86/traps: Separate set_intr_gate() and clean up early_trap_init()
As early_trap_init() doesn't use IST, replace
set_intr_gate_ist() and set_system_intr_gate_ist() with their
standard counterparts.

set_intr_gate() requires a trace_debug symbol which we don't
have and won't use. This patch separates set_intr_gate() into two
parts, and uses base version in early_trap_init().

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <dave.hansen@linux.intel.com>
Cc: <lizefan@huawei.com>
Cc: <masami.hiramatsu.pt@hitachi.com>
Cc: <oleg@redhat.com>
Cc: <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425010789-13714-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-05 00:47:29 +01:00
Andy Lutomirski 1e3fbb8a1d x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization
'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
the related state make sense for 'ret_from_sys_call'.  This is
entirely the wrong check.  TS_COMPAT would make a little more
sense, but there's really no point in keeping this optimization
at all.

This fixes a return to the wrong user CS if we came from int
0x80 in a 64-bit task.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:53 +01:00
Denys Vlasenko d441c1f2b7 x86/asm/entry/64: Simplify optimistic SYSRET
Avoid redundant load of %r11 (it is already loaded a few
instructions before).

Also simplify %rsp restoration, instead of two steps:

         add $0x80, %rsp
         mov 0x18(%rsp), %rsp

we can do a simplified single step to restore user-space RSP:

         mov 0x98(%rsp), %rsp

and get the same result.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[ Clarified the changelog. ]
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1aef69b346a6db0d99cdfb0f5ba83e8c985e27d7.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko b3ab90b333 x86/asm/entry/64/compat: Use more readable constant
The last instance of "mysterious" SS+8 constant is replaced by
SIZEOF_PTREGS.

Message-Id: <1424822419-10267-1-git-send-email-dvlasenk@redhat.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko 911d2bb5cc x86/asm/entry/64: Use more readable constants
Constants such as SS+8 or SS+8-RIP are mysterious.
In most cases, SS+8 is just meant to be SIZEOF_PTREGS,
SS+8-RIP is RIP's offset in the iret frame.

This patch changes some of these constants to be less
mysterious.

No code changes (verified with objdump).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1d20491384773bd606e23a382fac23ddb49b5178.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko 14f6e9532d x86/asm/entry/64/compat: Fold the IA32_ARG_FIXUP macro into its callers
Use of a small macro - one with conditional expansion - does
more harm than good. It obfuscates code, with minimal code
reuse.

For example, because of obfuscation it's not obvious that
in 'ia32_sysenter_target', we can optimize loading of r9 -
currently it is loaded with a detour through ebp.

This patch folds the IA32_ARG_FIXUP macro into its callers.

No code changes.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:52 +01:00
Denys Vlasenko ebfc453e27 x86/asm/entry/64: Clean up and document various entry code details
This patch does a lot of cleanup in comments and formatting,
but it does not change any code:

 - Rename 'save_paranoid' to 'paranoid_entry': this makes naming
   similar to its "non-paranoid" sibling, 'error_entry',
   and to its counterpart, 'paranoid_exit'.

 - Use the same CFI annotation atop 'paranoid_entry' and 'error_entry'.

 - Fix irregular indentation of assembler operands.

 - Add/fix comments on top of 'paranoid_entry' and 'error_entry'.

 - Remove stale comment about "oldrax".

 - Make comments about "no swapgs" flag in ebx more prominent.

 - Deindent wrongly indented top-level comment atop 'paranoid_exit'.

 - Indent wrongly deindented comment inside 'error_entry'.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/4640f9fcd5ea46eb299b1cd6d3f5da3167d2f78d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Denys Vlasenko 1eeb207f87 x86/asm/entry/64: Move 'save_paranoid' and 'ret_from_fork' closer to their users
For some odd reason, these two functions are at the very top of
the file. "save_paranoid"'s caller is approximately in the middle
of it, move it there. Move 'ret_from_fork' to be right after
fork/exec helpers.

This is a pure block move, nothing is changed in the function
bodies.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/6446bbfe4094532623a5b83779b7015fec167a9d.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Denys Vlasenko b87cf63e2a x86/asm/entry: Add comments about various syscall instructions
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.

What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.

This patch adds comments which explain in detail
what registers are modified by each of these instructions.

The comments are placed immediately before corresponding
entry and exit points.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:51 +01:00
Andy Lutomirski 050273d19b x86/asm/entry/64: Remove 'int_check_syscall_exit_work'
Nothing references it anymore.

Reported-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 96b6352c12 ("x86_64, entry: Remove the syscall exit audit and schedule optimizations")
Link: http://lkml.kernel.org/r/dd2a4d26ecc7a5db61b476727175cd99ae2b32a4.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko f2db9382c1 x86/asm/entry: Do mass removal of 'ARGOFFSET'
ARGOFFSET is zero now, removing it changes no code.

A few macros lost "offset" parameter, since it is always zero
now too.

No code changes - verified with objdump.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/8689f937622d9d2db0ab8be82331fa15e4ed4713.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko 0d55083698 x86/asm/entry/64: Shrink code in 'paranoid_exit'
RESTORE_EXTRA_REGS + RESTORE_C_REGS looks small, but it's
a lot of instructions (fourteen). Let's reuse them.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
[ Cleaned up the labels. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1421272101-16847-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/59d71848cee3ec9eb48c0252e602efd6bd560e3c.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:50 +01:00
Denys Vlasenko e90e147cbc x86/asm/entry/64: Fix comments
- Misleading and slightly incorrect comments in "struct pt_regs" are
   fixed (four instances).

 - Fix incorrect comment atop EMPTY_FRAME macro.

 - Explain in more detail what we do with stack layout during hw interrupt.

 - Correct comments about "partial stack frame" which are no longer
   true.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-3-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/e1f4429c491fe6ceeddb879dea2786e0f8920f9c.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 76f5df43ca x86/asm/entry/64: Always allocate a complete "struct pt_regs" on the kernel stack
The 64-bit entry code was using six stack slots less by not
saving/restoring registers which are callee-preserved according
to the C ABI, and was not allocating space for them.

Only when syscalls needed a complete "struct pt_regs" was
the complete area allocated and filled in.

As an additional twist, on interrupt entry a "slightly less
truncated pt_regs" trick is used, to make nested interrupt
stacks easier to unwind.

This proved to be a source of significant obfuscation and subtle
bugs. For example, 'stub_fork' had to pop the return address,
extend the struct, save registers, and push return address back.
Ugly. 'ia32_ptregs_common' pops return address and "returns" via
jmp insn, throwing a wrench into CPU return stack cache.

This patch changes the code to always allocate a complete
"struct pt_regs" on the kernel stack. The saving of registers
is still done lazily.

"Partial pt_regs" trick on interrupt stack is retained.

Macros which manipulate "struct pt_regs" on stack are reworked:

 - ALLOC_PT_GPREGS_ON_STACK allocates the structure.

 - SAVE_C_REGS saves to it those registers which are clobbered
   by C code.

 - SAVE_EXTRA_REGS saves to it all other registers.

 - Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros
   reverse it.

'ia32_ptregs_common', 'stub_fork' and friends lost their ugly dance
with the return pointer.

LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.

'error_entry' and 'save_paranoid' now use SAVE_C_REGS +
SAVE_EXTRA_REGS instead of having it open-coded yet again.

Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.

Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-2-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/b89763d354aa23e670b9bdf3a40ae320320a7c2e.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 6e1327bd2b x86/asm/entry/64: Fix incorrect symbolic constant usage: R11->ARGOFFSET
Since the last fix of this nature, a few more instances have crept
in. Fix them up. No object code changes (constants have the same
value).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1423778052-21038-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/f5e1c4084319a42e5f14d41e2d638949ce66bc08.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 49db46a67b x86/asm: Introduce push/pop macros which generate CFI_REL_OFFSET and CFI_RESTORE
Sequences:

        pushl_cfi %reg
        CFI_REL_OFFSET reg, 0

and:

        popl_cfi %reg
        CFI_RESTORE reg

happen quite often. This patch adds macros which generate them.

No assembly changes (verified with objdump -dr vmlinux.o).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1421017655-25561-1-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:49 +01:00
Denys Vlasenko 69e8544cd0 x86/asm/64: Open-code register save/restore in trace_hardirqs*() thunks
This is a preparatory patch for change in "struct pt_regs"
handling in entry_64.S.

trace_hardirqs*() thunks were (ab)using a part of the
'pt_regs' handling code, namely the SAVE_ARGS/RESTORE_ARGS
macros, to save/restore registers across C function calls.

Since SAVE_ARGS is going to be changed, open-code
register saving/restoring here.

Incidentally, this removes a bit of dead code:
one SAVE_ARGS was used just to emit a CFI annotation,
but it also generated unreachable assembly instructions.

Take a page from thunk_32.S and use push/pop instructions
instead of movq, they are far shorter:
1 or 2 bytes versus 5, and no need for instructions to adjust %rsp:

   text	   data	    bss	    dec	    hex	filename
    333	     40	      0	    373	    175	thunk_64_movq.o
    104	     40	      0	    144	     90	thunk_64_push_pop.o

[ This is ugly as sin, but we'll fix up the ugliness in the next
  patch. I see no point in reordering patches just to avoid an
  ugly intermediate state.  --Andy ]

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1420927210-19738-4-git-send-email-dvlasenk@redhat.com
Link: http://lkml.kernel.org/r/4c979ad604f0f02c5ade3b3da308b53eabd5e198.1424989793.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 22:50:48 +01:00
Ingo Molnar f8e92fb4b0 A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
 straightforward and without having to figure out old and new instruction
 sizes but have the toolchain figure that out for us.
 
 Furthermore, it optimizes JMPs used so that fetch and decode can be
 relieved with smaller versions of the JMPs, where possible.
 
 Some stats:
 
 x86_64 defconfig:
 
 Alternatives sites total:               2478
 Total padding added (in Bytes):         6051
 
 The padding is currently done for:
 
 X86_FEATURE_ALWAYS
 X86_FEATURE_ERMS
 X86_FEATURE_LFENCE_RDTSC
 X86_FEATURE_MFENCE_RDTSC
 X86_FEATURE_SMAP
 
 This is with the latest version of the patchset. Of course, on each
 machine the alternatives sites actually being patched are a proper
 subset of the total number.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6
 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw
 MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j
 t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H
 qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1
 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+
 /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+
 EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+
 /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD
 qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr
 E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE
 BGgrn+usQDjVlikEnfI3
 =0KXp
 -----END PGP SIGNATURE-----

Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm

Pull alternative instructions framework improvements from Borislav Petkov:

 "A more involved rework of the alternatives framework to be able to
  pad instructions and thus make using the alternatives macros more
  straightforward and without having to figure out old and new instruction
  sizes but have the toolchain figure that out for us.

  Furthermore, it optimizes JMPs used so that fetch and decode can be
  relieved with smaller versions of the JMPs, where possible.

  Some stats:

    x86_64 defconfig:

    Alternatives sites total:               2478
    Total padding added (in Bytes):         6051

  The padding is currently done for:

    X86_FEATURE_ALWAYS
    X86_FEATURE_ERMS
    X86_FEATURE_LFENCE_RDTSC
    X86_FEATURE_MFENCE_RDTSC
    X86_FEATURE_SMAP

  This is with the latest version of the patchset. Of course, on each
  machine the alternatives sites actually being patched are a proper
  subset of the total number."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:36:15 +01:00
Ingo Molnar d2c032e3dc Linux 4.0-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU9enEAAoJEHm+PkMAQRiG/ewIAJ4MW4tcAhaVj6ndCF3+uL/b
 RaVm1apUjsTloe5Fl0TT9J5CO3zdOetmMNToy2sf0W4MJDIyHf21o83l7eniV/6q
 al/c3fQ6HVtNjiSUNghTtzVlL+gUD1F60b9BGYi1V5h2Mp8u0NG1alTGLQfCB8sE
 ArB+v2aWEdSPn7mZDA0Yuc1In+8bkpht3oy+OLD/8JNkqqLnml9YOyPjM1cuRpBr
 NxKCLcPzSHH9/nR3T6XtkxXYV5xD3+CDm9roJhfHukoFmfT/G3C65Zcp2KEed/Cw
 QQpu+ox7fpUs10F/Fbfm8AE+tRB4o2sGh97sprXrO5oaFdx6FPIBo4WN8i/Vy68=
 =qpY+
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc2' into x86/asm, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:35:43 +01:00
Brian Gerst 7e8e385aaf x86/compat: Remove sys32_vm86_warning
The check against lastcomm is racy, and the message it produces
isn't necessary.  vm86 support can be disabled on a 32-bit
kernel also, and doesn't have this message.  Switch to
sys_ni_syscall instead.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-4-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Brian Gerst 2aa4a71092 x86/compat: Merge native and compat 32-bit syscall tables
Combine the 32-bit syscall tables into one file.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-3-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Brian Gerst 29a5ff97fa x86/compat: Remove compat_ni_syscall()
compat_ni_syscall() does the same thing as sys_ni_syscall().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425439896-8322-2-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:16:21 +01:00
Linus Torvalds 13a7a6ac0a Linux 4.0-rc2 2015-03-03 09:04:59 -08:00
Daniel Vetter 9128b040eb drm/i915: Fix modeset state confusion in the load detect code
This is a tricky story of the new atomic state handling and the legacy
code fighting over each another. The bug at hand is an underrun of the
framebuffer reference with subsequent hilarity caused by the load
detect code. Which is peculiar since the the exact same code works
fine as the implementation of the legacy setcrtc ioctl.

Let's look at the ingredients:

- Currently our code is a crazy mix of legacy modeset interfaces to
  set the parameters and half-baked atomic state tracking underneath.
  While this transition is going we're using the transitional plane
  helpers to update the atomic side (drm_plane_helper_disable/update
  and friends), i.e. plane->state->fb. Since the state structure owns
  the fb those functions take care of that themselves.

  The legacy state (specifically crtc->primary->fb) is still managed
  by the old code (and mostly by the drm core), with the fb reference
  counting done by callers (core drm for the ioctl or the i915 load
  detect code). The relevant commit is

  commit ea2c67bb4a
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Dec 23 10:41:52 2014 -0800

      drm/i915: Move to atomic plane helpers (v9)

- drm_plane_helper_disable has special code to handle multiple calls
  in a row - it checks plane->crtc == NULL and bails out. This is to
  match the proper atomic implementation which needs the crtc to get
  at the implied locking context atomic updates always need. See

  commit acf24a395c
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Tue Jul 29 15:33:05 2014 +0200

      drm/plane-helper: transitional atomic plane helpers

- The universal plane code split out the implicit primary plane from
  the CRTC into it's own full-blown drm_plane object. As part of that
  the setcrtc ioctl (which updated both the crtc mode and primary
  plane) learned to set crtc->primary->crtc on modeset to make sure
  the plane->crtc assignments statate up to date in

  commit e13161af80
  Author: Matt Roper <matthew.d.roper@intel.com>
  Date:   Tue Apr 1 15:22:38 2014 -0700

      drm: Add drm_crtc_init_with_planes() (v2)

  Unfortunately we've forgotten to update the load detect code. Which
  wasn't a problem since the load detect modeset is temporary and
  always undone before we drop the locks.

- Finally there is a organically grown history (i.e. don't ask) around
  who sets the legacy plane->fb for the various driver entry points.
  Originally updating that was the drivers duty, but for almost all
  places we've moved that (plus updating the refcounts) into the core.
  Again the exception is the load detect code.

Taking all together the following happens:
- The load detect code doesn't set crtc->primary->crtc. This is only
  really an issue on crtcs never before used or when userspace
  explicitly disabled the primary plane.

- The plane helper glue code short-circuits because of that and leaves
  a non-NULL fb behind in plane->state->fb and plane->fb. The state
  fb isn't a real problem (it's properly refcounted on its own), it's
  just the canary.

- Load detect code drops the reference for that fb, but doesn't set
  plane->fb = NULL. This is ok since it's still living in that old
  world where drivers had to clear the pointer but the core/callers
  handled the refcounting.

- On the next modeset the drm core notices plane->fb and takes care of
  refcounting it properly by doing another unref. This drops the
  refcount to zero, leaving state->plane now pointing at freed memory.

- intel_plane_duplicate_state still assume it owns a reference to that
  very state->fb and bad things start to happen.

Fix this all by applying the same duct-tape as for the legacy setcrtc
ioctl code and set crtc->primary->crtc properly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-03 09:04:33 -08:00
Borislav Petkov dfecb95cdf perf/bench: Add -r all so that you can run all mem* routines
perf bench mem mem{set,cpy} -r all thus runs all available mem
benchmarking routines.

Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-03 18:01:58 +01:00