linux/arch/x86/mm
Jacob Shin 66520ebc2d x86, mm: Only direct map addresses that are marked as E820_RAM
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for huge memory hole between
0x000000e038000000 to 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.

This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.

-v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range
	instead.  - Yinghai Lu
-v3: add calculate_all_table_space_size() to get correct needed page table
	size. - Yinghai Lu
-v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when
     mem map does have hole under 4g that is found by Konard on xen
     domU with 8g ram. - Yinghai

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:14 -08:00
..
kmemcheck bug.h: add include of it to various implicit C users 2012-02-29 17:15:08 -05:00
Makefile memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones 2011-07-14 11:47:53 -07:00
amdtopology.c x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too 2011-05-02 17:24:48 +02:00
dump_pagetables.c x86, mm: Create symbolic index into address_markers array 2010-07-20 16:56:19 -07:00
extable.c x86, extable: Switch to relative exception table entries 2012-04-20 17:22:34 -07:00
fault.c readahead: fault retry breaks mmap file read random detection 2012-10-09 16:22:47 +09:00
gup.c thp: add compound tail page _mapcount when mapped 2011-12-09 07:50:28 -08:00
highmem_32.c highmem: kill all __kmap_atomic() 2012-03-20 21:48:30 +08:00
hugetlbpage.c mm: replace vma prio_tree with an interval tree 2012-10-09 16:22:39 +09:00
init.c x86, mm: Only direct map addresses that are marked as E820_RAM 2012-11-17 11:59:14 -08:00
init_32.c Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 13:59:17 -07:00
init_64.c x86, mm: Only direct map addresses that are marked as E820_RAM 2012-11-17 11:59:14 -08:00
iomap_32.c mm: fix race in kunmap_atomic() 2010-10-27 18:03:05 -07:00
ioremap.c x86/mm: Fix some kernel-doc warnings 2012-06-11 10:54:45 +02:00
kmmio.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
memtest.c memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones 2011-07-14 11:47:53 -07:00
mmap.c x86: Fix mmap random address range 2011-12-05 17:07:23 +01:00
mmio-mod.c module_param: make bool parameters really bool (arch) 2012-01-13 09:32:18 +10:30
numa.c x86: print physical addresses consistently with other parts of kernel 2012-05-29 16:22:21 -07:00
numa_32.c memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones 2011-07-14 11:47:53 -07:00
numa_64.c memblock, x86: Make free_all_memory_core_early() explicitly free lowmem only 2011-07-14 11:47:49 -07:00
numa_emulation.c x86: print physical addresses consistently with other parts of kernel 2012-05-29 16:22:21 -07:00
numa_internal.h x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() 2011-05-02 14:18:54 +02:00
pageattr-test.c x86: Convert vmalloc()+memset() to vzalloc() 2011-05-28 19:53:57 +02:00
pageattr.c x86, mm: use pfn_range_is_mapped() with CPA 2012-11-17 11:59:09 -08:00
pat.c mm, x86, pat: rework linear pfn-mmap tracking 2012-10-09 16:22:16 +09:00
pat_internal.h x86, pat: Fix memory leak in free_memtype 2010-05-26 11:26:04 -07:00
pat_rbtree.c rbtree: move augmented rbtree functionality to rbtree_augmented.h 2012-10-09 16:22:40 +09:00
pf_in.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
pf_in.h
pgtable.c x86: Flush TLB if PGD entry is changed in i386 PAE mode 2011-03-18 11:44:01 +01:00
pgtable_32.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
physaddr.c x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86, cpu: Only CPU features determine NX capabilities 2010-11-10 15:43:15 -08:00
srat.c ACPI: Only count valid srat memory structures 2012-08-03 00:15:53 -04:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86: Distinguish TLB shootdown interrupts from other functions call interrupts 2012-09-27 22:52:34 -07:00