linux/arch/x86/power/hibernate_asm_64.S

/*
* Hibernation support for x86-64
*
* Distribute under GPLv2.
*
* Copyright 2007 Rafael J. Wysocki <rjw@sisk.pl>
* Copyright 2005 Andi Kleen <ak@suse.de>
* Copyright 2004 Pavel Machek <pavel@suse.cz>
*
* swsusp_arch_resume must not use any stack or any nonlocal variables while
* copying pages:
*
* It is rewriting one kernel image with another. What is the stack in the
* "old" image could very well be a data page in the "new" image, and
* overwriting your own stack under you is a bad idea.
*/
.text
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page_types.h>
#include <asm/asm-offsets.h>
#include <asm/processor-flags.h>
#include <asm/frame.h>
ENTRY(swsusp_arch_suspend)
movq $saved_context, %rax
movq %rsp, pt_regs_sp(%rax)
movq %rbp, pt_regs_bp(%rax)
movq %rsi, pt_regs_si(%rax)
movq %rdi, pt_regs_di(%rax)
movq %rbx, pt_regs_bx(%rax)
movq %rcx, pt_regs_cx(%rax)
movq %rdx, pt_regs_dx(%rax)
movq %r8, pt_regs_r8(%rax)
movq %r9, pt_regs_r9(%rax)
movq %r10, pt_regs_r10(%rax)
movq %r11, pt_regs_r11(%rax)
movq %r12, pt_regs_r12(%rax)
movq %r13, pt_regs_r13(%rax)
movq %r14, pt_regs_r14(%rax)
movq %r15, pt_regs_r15(%rax)
pushfq
popq pt_regs_flags(%rax)
/* save cr3 */
movq %cr3, %rax
movq %rax, restore_cr3(%rip)
FRAME_BEGIN
call swsusp_save
FRAME_END
ret
ENDPROC(swsusp_arch_suspend)
ENTRY(restore_image)
/* prepare to jump to the image kernel */
movq restore_jump_address(%rip), %r8
movq restore_cr3(%rip), %r9
/* prepare to switch to temporary page tables */
movq temp_level4_pgt(%rip), %rax
movq mmu_cr4_features(%rip), %rbx
/* prepare to copy image data to their original locations */
movq restore_pblist(%rip), %rdx
/* jump to relocated restore code */
movq relocated_restore_code(%rip), %rcx
jmpq *%rcx
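/*
 * Register hand-off to the relocated core_restore_code, as loaded above:
 *	%rax - temp_level4_pgt (temporary page tables)
 *	%rbx - mmu_cr4_features (CR4 value written back after the flush)
 *	%rdx - restore_pblist (head of the pbe list)
 *	%r8  - restore_jump_address (entry point in the image kernel)
 *	%r9  - restore_cr3 (image kernel page tables)
 * The values travel in registers because the relocated copy runs from a
 * different address, where %rip-relative references to these variables
 * would no longer resolve to the right locations.
 */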
/* code below has been relocated to a safe page */
ENTRY(core_restore_code)
/* switch to temporary page tables */
movq %rax, %cr3
/* flush TLB */
movq %rbx, %rcx
andq $~(X86_CR4_PGE), %rcx
movq %rcx, %cr4		# turn off PGE
movq %cr3, %rcx		# flush TLB
movq %rcx, %cr3
movq %rbx, %cr4; # turn PGE back on
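/*
 * The .Lloop below walks the restore_pblist singly linked list. As a
 * rough C sketch, assuming struct pbe has the address, orig_address and
 * next members that the pbe_* asm-offsets refer to:
 *
 *	for (p = restore_pblist; p; p = p->next)
 *		memcpy(p->orig_address, p->address, PAGE_SIZE);
 *
 * The assembly cannot simply call memcpy(): no stack may be used while
 * pages are being copied, so each page is copied inline with "rep movsq",
 * moving PAGE_SIZE >> 3 quadwords from %rsi (address) to %rdi
 * (orig_address).
 */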
.Lloop:
testq %rdx, %rdx
jz .Ldone
/* get addresses from the pbe and copy the page */
movq pbe_address(%rdx), %rsi
movq pbe_orig_address(%rdx), %rdi
movq $(PAGE_SIZE >> 3), %rcx
rep
movsq
/* progress to the next pbe */
movq pbe_next(%rdx), %rdx
jmp .Lloop
.Ldone:
/* jump to the restore_registers address from the image header */
jmpq *%r8
/* code below belongs to the image kernel */
.align PAGE_SIZE
ENTRY(restore_registers)
[PATCH] x86_64: Set up safe page tables during resume The following patch makes swsusp avoid the possible temporary corruption of page translation tables during resume on x86-64. This is achieved by creating a copy of the relevant page tables that will not be modified by swsusp and can be safely used by it on resume. The problem is that during resume on x86-64 swsusp may temporarily corrupt the page tables used for the direct mapping of RAM. If that happens, a page fault occurs and cannot be handled properly, which leads to the solid hang of the affected system. This leads to the loss of the system's state from before suspend and may result in the loss of data or the corruption of filesystems, so it is a serious issue. Also, it appears to happen quite often (for me, as often as 50% of the time). The problem is related to the fact that (at least) one of the PMD entries used in the direct memory mapping (starting at PAGE_OFFSET) points to a page table the physical address of which is much greater than the physical address of the PMD entry itself. Moreover, unfortunately, the physical address of the page table before suspend (i.e. the one stored in the suspend image) happens to be different to the physical address of the corresponding page table used during resume (i.e. the one that is valid right before swsusp_arch_resume() in arch/x86_64/kernel/suspend_asm.S is executed). Thus while the image is restored, the "offending" PMD entry gets overwritten, so it does not point to the right physical address any more (i.e. there's no page table at the address pointed to by it, because it points to the address the page table has been at during suspend). Consequently, if the PMD entry is used later on, and it _is_ used in the process of copying the image pages, a page fault occurs, but it cannot be handled in the normal way and the system hangs. In principle we can call create_resume_mapping() from swsusp_arch_resume() (i.e. 
from suspend_asm.S), but then the memory allocations in create_resume_mapping(), resume_pud_mapping(), and resume_pmd_mapping() must be made carefully so that we use _only_ NosaveFree pages in them (the other pages are overwritten by the loop in swsusp_arch_resume()). Additionally, we are in atomic context at that time, so we cannot use GFP_KERNEL. Moreover, if one of the allocations fails, we should free all of the allocated pages, so we need to trace them somehow. All of this is done in the appended patch, except that the functions populating the page tables are located in arch/x86_64/kernel/suspend.c rather than in init.c. It may be done in a more elegant way in the future, with the help of some swsusp patches that are in the works now. [AK: move some externs into headers, renamed a function] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10 03:19:40 +08:00
/* go back to the original page tables */
movq %r9, %cr3
/* Flush TLB, including "global" things (vmalloc) */
movq mmu_cr4_features(%rip), %rax
movq %rax, %rdx
andq $~(X86_CR4_PGE), %rdx
movq %rdx, %cr4; # turn off PGE
movq %cr3, %rcx; # flush TLB
movq %rcx, %cr3
movq %rax, %cr4; # turn PGE back on
/* We don't restore %rax; it must be 0 anyway */
movq $saved_context, %rax
movq pt_regs_sp(%rax), %rsp
movq pt_regs_bp(%rax), %rbp
movq pt_regs_si(%rax), %rsi
movq pt_regs_di(%rax), %rdi
movq pt_regs_bx(%rax), %rbx
movq pt_regs_cx(%rax), %rcx
movq pt_regs_dx(%rax), %rdx
movq pt_regs_r8(%rax), %r8
movq pt_regs_r9(%rax), %r9
movq pt_regs_r10(%rax), %r10
movq pt_regs_r11(%rax), %r11
movq pt_regs_r12(%rax), %r12
movq pt_regs_r13(%rax), %r13
movq pt_regs_r14(%rax), %r14
movq pt_regs_r15(%rax), %r15
pushq pt_regs_flags(%rax)
popfq
x86, gdt, hibernate: Store/load GDT for hibernate path. The git commit e7a5cd063c7b4c58417f674821d63f5eb6747e37 ("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.") assumes that for the hibernate path the booting kernel and the resuming kernel MUST be the same. That is certainly the case for a 32-bit kernel (see check_image_kernel and CONFIG_ARCH_HIBERNATION_HEADER config option). However for 64-bit kernels it is OK to have a different kernel version (and size of the image) of the booting and resuming kernels. Hence the above mentioned git commit introduces a regression. This patch fixes it by introducing a 'struct desc_ptr gdt_desc' back in the 'struct saved_context'. However instead of having in the 'save_processor_state' and 'restore_processor_state' the store/load_gdt calls, we are only saving the GDT in the save_processor_state. For the restore path the lgdt operation is done in hibernate_asm_[32|64].S in the 'restore_registers' path. The apt reader of this description will recognize that only 64-bit kernels need this treatment, not 32-bit. This patch adds the logic in the 32-bit path to be more similar to 64-bit so that in the future the unification process can take advantage of this. [ hpa: this also reverts an inadvertent on-disk format change ] Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-02 09:53:30 +08:00
/* Saved in save_processor_state. */
lgdt saved_context_gdt_desc(%rax)
xorq %rax, %rax
/* tell the hibernation core that we've just restored the memory */
movq %rax, in_suspend(%rip)
ret
ENDPROC(restore_registers)