2008-02-10 06:24:09 +08:00
|
|
|
/*
|
|
|
|
* Hibernation support for x86-64
|
|
|
|
*
|
|
|
|
* Distribute under GPLv2
|
|
|
|
*
|
|
|
|
* Copyright (c) 2007 Rafael J. Wysocki <rjw@sisk.pl>
|
2010-07-18 20:27:13 +08:00
|
|
|
* Copyright (c) 2002 Pavel Machek <pavel@ucw.cz>
|
2008-02-10 06:24:09 +08:00
|
|
|
* Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
|
|
|
|
*/
|
|
|
|
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
|
|
|
#include <linux/gfp.h>
|
2008-02-10 06:24:09 +08:00
|
|
|
#include <linux/smp.h>
|
|
|
|
#include <linux/suspend.h>
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
#include <linux/scatterlist.h>
|
|
|
|
#include <linux/kdebug.h>
|
|
|
|
|
|
|
|
#include <crypto/hash.h>
|
2013-01-25 04:20:14 +08:00
|
|
|
|
|
|
|
#include <asm/init.h>
|
2008-02-10 06:24:09 +08:00
|
|
|
#include <asm/proto.h>
|
|
|
|
#include <asm/page.h>
|
|
|
|
#include <asm/pgtable.h>
|
|
|
|
#include <asm/mtrr.h>
|
2014-10-10 06:30:30 +08:00
|
|
|
#include <asm/sections.h>
|
2009-04-01 06:23:37 +08:00
|
|
|
#include <asm/suspend.h>
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
#include <asm/tlbflush.h>
|
2008-02-10 06:24:09 +08:00
|
|
|
|
2008-02-10 06:24:09 +08:00
|
|
|
/* Defined in hibernate_asm_64.S */
|
2014-05-02 06:44:37 +08:00
|
|
|
extern asmlinkage __visible int restore_image(void);
|
2008-02-10 06:24:09 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Address to jump to in the last phase of restore in order to get to the image
|
|
|
|
* kernel's text (this value is passed in the image header).
|
|
|
|
*/
|
2013-08-06 06:02:49 +08:00
|
|
|
unsigned long restore_jump_address __visible;
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
unsigned long jump_address_phys;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Value of the cr3 register from before the hibernation (this value is passed
|
|
|
|
* in the image header).
|
|
|
|
*/
|
2013-08-06 06:02:49 +08:00
|
|
|
unsigned long restore_cr3 __visible;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
2016-08-03 07:19:26 +08:00
|
|
|
unsigned long temp_level4_pgt __visible;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
unsigned long relocated_restore_code __visible;
|
|
|
|
|
2016-08-03 07:19:26 +08:00
|
|
|
static int set_up_temporary_text_mapping(pgd_t *pgd)
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
{
|
|
|
|
pmd_t *pmd;
|
|
|
|
pud_t *pud;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The new mapping only has to cover the page containing the image
|
|
|
|
* kernel's entry point (jump_address_phys), because the switch over to
|
|
|
|
* it is carried out by relocated code running from a page allocated
|
|
|
|
* specifically for this purpose and covered by the identity mapping, so
|
|
|
|
* the temporary kernel text mapping is only needed for the final jump.
|
|
|
|
* Moreover, in that mapping the virtual address of the image kernel's
|
|
|
|
* entry point must be the same as its virtual address in the image
|
|
|
|
* kernel (restore_jump_address), so the image kernel's
|
|
|
|
* restore_registers() code doesn't find itself in a different area of
|
|
|
|
* the virtual address space after switching over to the original page
|
|
|
|
* tables used by the image kernel.
|
|
|
|
*/
|
|
|
|
pud = (pud_t *)get_safe_page(GFP_ATOMIC);
|
|
|
|
if (!pud)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
|
|
|
|
if (!pmd)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
set_pmd(pmd + pmd_index(restore_jump_address),
|
|
|
|
__pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC));
|
|
|
|
set_pud(pud + pud_index(restore_jump_address),
|
|
|
|
__pud(__pa(pmd) | _KERNPG_TABLE));
|
2016-08-03 07:19:26 +08:00
|
|
|
set_pgd(pgd + pgd_index(restore_jump_address),
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
__pgd(__pa(pud) | _KERNPG_TABLE));
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2008-02-10 06:24:09 +08:00
|
|
|
|
2013-01-25 04:20:14 +08:00
|
|
|
static void *alloc_pgt_page(void *context)
|
2008-02-10 06:24:09 +08:00
|
|
|
{
|
2013-01-25 04:20:14 +08:00
|
|
|
return (void *)get_safe_page(GFP_ATOMIC);
|
2008-02-10 06:24:09 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int set_up_temporary_mappings(void)
|
|
|
|
{
|
2013-01-25 04:20:14 +08:00
|
|
|
struct x86_mapping_info info = {
|
|
|
|
.alloc_pgt_page = alloc_pgt_page,
|
|
|
|
.pmd_flag = __PAGE_KERNEL_LARGE_EXEC,
|
2016-08-08 21:31:31 +08:00
|
|
|
.offset = __PAGE_OFFSET,
|
2013-01-25 04:20:14 +08:00
|
|
|
};
|
|
|
|
unsigned long mstart, mend;
|
2016-08-03 07:19:26 +08:00
|
|
|
pgd_t *pgd;
|
2013-01-25 04:20:14 +08:00
|
|
|
int result;
|
|
|
|
int i;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
2016-08-03 07:19:26 +08:00
|
|
|
pgd = (pgd_t *)get_safe_page(GFP_ATOMIC);
|
|
|
|
if (!pgd)
|
2008-02-10 06:24:09 +08:00
|
|
|
return -ENOMEM;
|
|
|
|
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
/* Prepare a temporary mapping for the kernel text */
|
2016-08-03 07:19:26 +08:00
|
|
|
result = set_up_temporary_text_mapping(pgd);
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
if (result)
|
|
|
|
return result;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
|
|
|
/* Set up the direct mapping from scratch */
|
2013-01-25 04:20:14 +08:00
|
|
|
for (i = 0; i < nr_pfn_mapped; i++) {
|
|
|
|
mstart = pfn_mapped[i].start << PAGE_SHIFT;
|
|
|
|
mend = pfn_mapped[i].end << PAGE_SHIFT;
|
|
|
|
|
2016-08-03 07:19:26 +08:00
|
|
|
result = kernel_ident_mapping_init(&info, pgd, mstart, mend);
|
2013-01-25 04:20:14 +08:00
|
|
|
if (result)
|
|
|
|
return result;
|
2008-02-10 06:24:09 +08:00
|
|
|
}
|
2013-01-25 04:20:14 +08:00
|
|
|
|
2016-08-14 10:07:32 +08:00
|
|
|
temp_level4_pgt = __pa(pgd);
|
2008-02-10 06:24:09 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
static int relocate_restore_code(void)
|
|
|
|
{
|
|
|
|
pgd_t *pgd;
|
|
|
|
pud_t *pud;
|
|
|
|
|
|
|
|
relocated_restore_code = get_safe_page(GFP_ATOMIC);
|
|
|
|
if (!relocated_restore_code)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
memcpy((void *)relocated_restore_code, &core_restore_code, PAGE_SIZE);
|
|
|
|
|
|
|
|
/* Make the page containing the relocated code executable */
|
|
|
|
pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
|
|
|
|
pud = pud_offset(pgd, relocated_restore_code);
|
|
|
|
if (pud_large(*pud)) {
|
|
|
|
set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
|
|
|
|
} else {
|
|
|
|
pmd_t *pmd = pmd_offset(pud, relocated_restore_code);
|
|
|
|
|
|
|
|
if (pmd_large(*pmd)) {
|
|
|
|
set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
|
|
|
|
} else {
|
|
|
|
pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code);
|
|
|
|
|
|
|
|
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
__flush_tlb_all();
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-02-10 06:24:09 +08:00
|
|
|
int swsusp_arch_resume(void)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/* We have got enough memory and from now on we cannot recover */
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
error = set_up_temporary_mappings();
|
|
|
|
if (error)
|
2008-02-10 06:24:09 +08:00
|
|
|
return error;
|
|
|
|
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
error = relocate_restore_code();
|
|
|
|
if (error)
|
|
|
|
return error;
|
2008-02-10 06:24:09 +08:00
|
|
|
|
|
|
|
restore_image();
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* pfn_is_nosave - check if given pfn is in the 'nosave' section
|
|
|
|
*/
|
|
|
|
|
|
|
|
int pfn_is_nosave(unsigned long pfn)
|
|
|
|
{
|
|
|
|
unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
|
|
|
|
unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
|
|
|
|
return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
|
|
|
|
}
|
|
|
|
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
#define MD5_DIGEST_SIZE 16
|
|
|
|
|
2008-02-10 06:24:09 +08:00
|
|
|
struct restore_data_record {
|
|
|
|
unsigned long jump_address;
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
unsigned long jump_address_phys;
|
2008-02-10 06:24:09 +08:00
|
|
|
unsigned long cr3;
|
|
|
|
unsigned long magic;
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
u8 e820_digest[MD5_DIGEST_SIZE];
|
2008-02-10 06:24:09 +08:00
|
|
|
};
|
|
|
|
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
#define RESTORE_MAGIC 0x23456789ABCDEF01UL
|
|
|
|
|
|
|
|
#if IS_BUILTIN(CONFIG_CRYPTO_MD5)
|
|
|
|
/**
|
|
|
|
* get_e820_md5 - calculate md5 according to given e820 map
|
|
|
|
*
|
|
|
|
* @map: the e820 map to be calculated
|
|
|
|
* @buf: the md5 result to be stored to
|
|
|
|
*/
|
|
|
|
static int get_e820_md5(struct e820map *map, void *buf)
|
|
|
|
{
|
|
|
|
struct scatterlist sg;
|
|
|
|
struct crypto_ahash *tfm;
|
|
|
|
int size;
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
|
|
|
|
if (IS_ERR(tfm))
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
{
|
|
|
|
AHASH_REQUEST_ON_STACK(req, tfm);
|
|
|
|
size = offsetof(struct e820map, map)
|
|
|
|
+ sizeof(struct e820entry) * map->nr_map;
|
|
|
|
ahash_request_set_tfm(req, tfm);
|
|
|
|
sg_init_one(&sg, (u8 *)map, size);
|
|
|
|
ahash_request_set_callback(req, 0, NULL, NULL);
|
|
|
|
ahash_request_set_crypt(req, &sg, buf, size);
|
|
|
|
|
|
|
|
if (crypto_ahash_digest(req))
|
|
|
|
ret = -EINVAL;
|
|
|
|
ahash_request_zero(req);
|
|
|
|
}
|
|
|
|
crypto_free_ahash(tfm);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void hibernation_e820_save(void *buf)
|
|
|
|
{
|
|
|
|
get_e820_md5(e820_saved, buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool hibernation_e820_mismatch(void *buf)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
u8 result[MD5_DIGEST_SIZE];
|
|
|
|
|
|
|
|
memset(result, 0, MD5_DIGEST_SIZE);
|
|
|
|
/* If there is no digest in suspend kernel, let it go. */
|
|
|
|
if (!memcmp(result, buf, MD5_DIGEST_SIZE))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
ret = get_e820_md5(e820_saved, result);
|
|
|
|
if (ret)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return memcmp(result, buf, MD5_DIGEST_SIZE) ? true : false;
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
static void hibernation_e820_save(void *buf)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool hibernation_e820_mismatch(void *buf)
|
|
|
|
{
|
|
|
|
/* If md5 is not builtin for restore kernel, let it go. */
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
#endif
|
2008-02-10 06:24:09 +08:00
|
|
|
|
|
|
|
/**
|
|
|
|
* arch_hibernation_header_save - populate the architecture specific part
|
|
|
|
* of a hibernation image header
|
|
|
|
* @addr: address to save the data at
|
|
|
|
*/
|
|
|
|
int arch_hibernation_header_save(void *addr, unsigned int max_size)
|
|
|
|
{
|
|
|
|
struct restore_data_record *rdr = addr;
|
|
|
|
|
|
|
|
if (max_size < sizeof(struct restore_data_record))
|
|
|
|
return -EOVERFLOW;
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
rdr->jump_address = (unsigned long)&restore_registers;
|
|
|
|
rdr->jump_address_phys = __pa_symbol(&restore_registers);
|
2008-02-10 06:24:09 +08:00
|
|
|
rdr->cr3 = restore_cr3;
|
|
|
|
rdr->magic = RESTORE_MAGIC;
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
|
|
|
|
hibernation_e820_save(rdr->e820_digest);
|
|
|
|
|
2008-02-10 06:24:09 +08:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* arch_hibernation_header_restore - read the architecture specific data
|
|
|
|
* from the hibernation image header
|
|
|
|
* @addr: address to read the data from
|
|
|
|
*/
|
|
|
|
int arch_hibernation_header_restore(void *addr)
|
|
|
|
{
|
|
|
|
struct restore_data_record *rdr = addr;
|
|
|
|
|
|
|
|
restore_jump_address = rdr->jump_address;
|
x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).
That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables. Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.
The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.
To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch. Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.
Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too. That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables. Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.
With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it). Update RESTORE_MAGIC
too to reflect the image header format change.
Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.
This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.
Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01 00:11:41 +08:00
|
|
|
jump_address_phys = rdr->jump_address_phys;
|
2008-02-10 06:24:09 +08:00
|
|
|
restore_cr3 = rdr->cr3;
|
PM / hibernate: Verify the consistent of e820 memory map by md5 digest
On some platforms, there is occasional panic triggered when
trying to resume from hibernation, a typical panic looks like:
"BUG: unable to handle kernel paging request at ffff880085894000
IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
Investigation carried out by Lee Chun-Yi shows that this is because
e820 map has been changed by BIOS across hibernation, and one
of the page frames from suspend kernel is right located in restore
kernel's unmapped region, so panic comes out when accessing unmapped
kernel address.
In order to expose this issue earlier, the md5 hash of e820 map
is passed from suspend kernel to restore kernel, and the restore
kernel will terminate the resume process once it finds the md5
hash are not the same.
As the format of image header has been modified, the magic number
should also be adjusted as kernels with the same RESTORE_MAGIC have
to use the same header format and interpret all of the fields in
it in the same way.
If the suspend kernel is built without md5 support, and the restore
kernel has md5 support, then the latter will bypass the check process.
Vice versa the restore kernel will bypass the check if it does not
support md5 operation.
Note:
1. Without this patch applied, it is possible that BIOS has
provided an inconsistent memory map, but the resume kernel is still
able to restore the image anyway(e.g, E820_RAM region is the superset
of the previous one), although the system might be unstable. So this
patch tries to treat any inconsistent e820 as illegal.
2. Another case is, this patch replies on comparing the e820_saved, but
currently the e820_save might not be strictly the same across
hibernation, even if BIOS has provided consistent e820 map - In
theory mptable might modify the BIOS-provided e820_saved dynamically
in early_reserve_e820_mpc_new, which would allocate a buffer from
E820_RAM, and marks it from E820_RAM to E820_RESERVED).
This is a potential and rare case we need to deal with in OS in
the future.
Suggested-by: Pavel Machek <pavel@ucw.cz>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:14:52 +08:00
|
|
|
|
|
|
|
if (rdr->magic != RESTORE_MAGIC) {
|
|
|
|
pr_crit("Unrecognized hibernate image header format!\n");
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (hibernation_e820_mismatch(rdr->e820_digest)) {
|
|
|
|
pr_crit("Hibernate inconsistent memory map detected!\n");
|
|
|
|
return -ENODEV;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2008-02-10 06:24:09 +08:00
|
|
|
}
|