linux/arch/x86
Ingo Molnar 1bc6b056d8 x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
So we have the following ancient code in copy_fpregs_to_fpstate():

	if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
		asm volatile("fnclex");
		goto drop_fpregs;
	}

which clears pending FPU exceptions and then drops registers, which
causes the next FP instruction of the saved context to re-load the
saved FPU state, with all pending exceptions marked properly, and
will re-start the exception handling mechanism in the hardware.

Since FPU exceptions are always issued on instruction boundaries,
in particular on the next FP instruction following the exception
generating instruction, there's no fear of getting an FP exception
asynchronously.

They were truly asynchronous back in the IRQ13 days, when the FPU was
a weird and expensive co-processor that did its own processing, and we
had to synchronize with them, but that code is not working anymore:
we don't have IRQ13 mapped in the IDT anymore.

With the introduction of optimized XSAVE support there's a new
complication: if the xstate features bit indicates that a particular
state component is unused (in 'init state'), then the hardware does
not guarantee that the XSAVE (et al) instruction keeps the underlying
FPU state image in memory valid and current. In practice this means
that the hardware won't write it, and the exceptions flag in the
state might be an older version, with it still being set. This
meant that we had to check the xfeatures flag as well, adding
another memory load and branch to a critical hot path of the scheduler.

So optimize all this by removing both the old quirk and the new check,
and straight-line optimizing the most common cases with likely()
hints. Quite a bit of code gets removed this way:

  arch/x86/kernel/process_64.o:

    text    data     bss     dec     filename
    5484       8       0    5492     process_64.o.before
    5416       8       0    5424     process_64.o.after

Now there's also a chance that some weird behavior or erratum was
masked by our IRQ13 handling quirk (or that I misunderstood the
nature of the quirk), and that this change triggers some badness.

There's no real good way to protect against that possibility other
than keeping this change well isolated, well commented and well
bisectable. If you bisect a weird (or not so weird) breakage to
this commit then please let us know!

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:49 +02:00
..
boot * Avoid garbage names in efivarfs due to buggy firmware by zero'ing 2015-05-06 08:30:24 +02:00
configs x86/build/defconfig: Enable USB_EHCI_TT_NEWSCHED=y 2015-02-19 02:21:14 +01:00
crypto x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
ia32 x86/fpu: Rename fpu-internal.h to fpu/internal.h 2015-05-19 15:47:31 +02:00
include x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions 2015-05-19 15:47:49 +02:00
kernel x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() 2015-05-19 15:47:49 +02:00
kvm x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() 2015-05-19 15:47:49 +02:00
lguest x86/fpu: Rename i387.h to fpu/api.h 2015-05-19 15:47:30 +02:00
lib x86/fpu: Rename i387.h to fpu/api.h 2015-05-19 15:47:30 +02:00
math-emu x86/fpu: Move various internal function prototypes to fpu/internal.h 2015-05-19 15:47:48 +02:00
mm x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() 2015-05-19 15:47:49 +02:00
net x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions 2015-05-12 23:13:08 -04:00
oprofile x86/asm/entry: Change all 'user_mode_vm()' calls to 'user_mode()' 2015-03-23 11:14:17 +01:00
pci x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus 2015-04-30 22:17:34 +02:00
platform TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
power x86/fpu: Move various internal function prototypes to fpu/internal.h 2015-05-19 15:47:48 +02:00
purgatory Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-10 12:35:46 -08:00
realmode Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-16 14:58:12 -08:00
syscalls xen: features and fixes for 4.1-rc0 2015-04-16 14:01:03 -05:00
tools x86, build: replace Perl script with Shell script 2015-01-26 13:37:18 -08:00
um Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2015-04-15 13:53:55 -07:00
vdso x86/vdso: Fix 'make bzImage' on older distros 2015-05-11 10:25:02 +02:00
video
xen x86/fpu: Simplify fpu__cpu_init() 2015-05-19 15:47:44 +02:00
.gitignore x86/build: Add arch/x86/purgatory/ make generated files to gitignore 2014-10-09 09:29:46 +02:00
Kbuild kexec: create a new config option CONFIG_KEXEC_FILE for new syscall 2014-08-29 16:28:16 -07:00
Kconfig Initial ACPI support for arm64: 2015-04-24 08:23:45 -07:00
Kconfig.cpu
Kconfig.debug x86, intel-mid: remove Intel MID specific serial support 2015-03-07 03:25:18 +01:00
Makefile kbuild: use relative path more to include Makefile 2015-04-02 16:42:08 +02:00
Makefile.um kbuild: use relative path more to include Makefile 2015-04-02 16:42:08 +02:00
Makefile_32.cpu