linux/arch/sparc/include/asm/tsb.h

381 lines
12 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _SPARC64_TSB_H
#define _SPARC64_TSB_H
/* The sparc64 TSB is similar to the powerpc hashtables. It's a
* power-of-2 sized table of TAG/PTE pairs. The cpu precomputes
* pointers into this table for 8K and 64K page sizes, and also a
* comparison TAG based upon the virtual address and context which
* faults.
*
* TLB miss trap handler software does the actual lookup via something
* of the form:
*
* ldxa [%g0] ASI_{D,I}MMU_TSB_8KB_PTR, %g1
* ldxa [%g0] ASI_{D,I}MMU, %g6
* sllx %g6, 22, %g6
* srlx %g6, 22, %g6
* ldda [%g1] ASI_NUCLEUS_QUAD_LDD, %g4
* cmp %g4, %g6
* bne,pn %xcc, tsb_miss_{d,i}tlb
* mov FAULT_CODE_{D,I}TLB, %g3
* stxa %g5, [%g0] ASI_{D,I}TLB_DATA_IN
* retry
*
*
* Each 16-byte slot of the TSB is the 8-byte tag and then the 8-byte
* PTE. The TAG is of the same layout as the TLB TAG TARGET mmu
* register which is:
*
* -------------------------------------------------
* | - | CONTEXT | - | VADDR bits 63:22 |
* -------------------------------------------------
* 63 61 60 48 47 42 41 0
*
* But actually, since we use per-mm TSB's, we zero out the CONTEXT
* field.
*
* Like the powerpc hashtables we need to use locking in order to
* synchronize while we update the entries. PTE updates need locking
* as well.
*
* We need to carefully choose a lock bits for the TSB entry. We
* choose to use bit 47 in the tag. Also, since we never map anything
* at page zero in context zero, we use zero as an invalid tag entry.
* When the lock bit is set, this forces a tag comparison failure.
*/
#define TSB_TAG_LOCK_BIT 47
#define TSB_TAG_LOCK_HIGH (1 << (TSB_TAG_LOCK_BIT - 32))
#define TSB_TAG_INVALID_BIT 46
#define TSB_TAG_INVALID_HIGH (1 << (TSB_TAG_INVALID_BIT - 32))
/* Some cpus support physical address quad loads. We want to use
* those if possible so we don't need to hard-lock the TSB mapping
* into the TLB. We encode some instruction patching in order to
* support this.
*
* The kernel TSB is locked into the TLB by virtue of being in the
* kernel image, so we don't play these games for swapper_tsb access.
*/
#ifndef __ASSEMBLY__
struct tsb_ldquad_phys_patch_entry {
unsigned int addr;
unsigned int sun4u_insn;
unsigned int sun4v_insn;
};
extern struct tsb_ldquad_phys_patch_entry __tsb_ldquad_phys_patch,
__tsb_ldquad_phys_patch_end;
struct tsb_phys_patch_entry {
unsigned int addr;
unsigned int insn;
};
extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
#endif
#define TSB_LOAD_QUAD(TSB, REG) \
661: ldda [TSB] ASI_NUCLEUS_QUAD_LDD, REG; \
.section .tsb_ldquad_phys_patch, "ax"; \
.word 661b; \
ldda [TSB] ASI_QUAD_LDD_PHYS, REG; \
ldda [TSB] ASI_QUAD_LDD_PHYS_4V, REG; \
.previous
#define TSB_LOAD_TAG_HIGH(TSB, REG) \
661: lduwa [TSB] ASI_N, REG; \
.section .tsb_phys_patch, "ax"; \
.word 661b; \
lduwa [TSB] ASI_PHYS_USE_EC, REG; \
.previous
#define TSB_LOAD_TAG(TSB, REG) \
661: ldxa [TSB] ASI_N, REG; \
.section .tsb_phys_patch, "ax"; \
.word 661b; \
ldxa [TSB] ASI_PHYS_USE_EC, REG; \
.previous
#define TSB_CAS_TAG_HIGH(TSB, REG1, REG2) \
661: casa [TSB] ASI_N, REG1, REG2; \
.section .tsb_phys_patch, "ax"; \
.word 661b; \
casa [TSB] ASI_PHYS_USE_EC, REG1, REG2; \
.previous
#define TSB_CAS_TAG(TSB, REG1, REG2) \
661: casxa [TSB] ASI_N, REG1, REG2; \
.section .tsb_phys_patch, "ax"; \
.word 661b; \
casxa [TSB] ASI_PHYS_USE_EC, REG1, REG2; \
.previous
#define TSB_STORE(ADDR, VAL) \
661: stxa VAL, [ADDR] ASI_N; \
.section .tsb_phys_patch, "ax"; \
.word 661b; \
stxa VAL, [ADDR] ASI_PHYS_USE_EC; \
.previous
#define TSB_LOCK_TAG(TSB, REG1, REG2) \
99: TSB_LOAD_TAG_HIGH(TSB, REG1); \
sethi %hi(TSB_TAG_LOCK_HIGH), REG2;\
andcc REG1, REG2, %g0; \
bne,pn %icc, 99b; \
nop; \
TSB_CAS_TAG_HIGH(TSB, REG1, REG2); \
cmp REG1, REG2; \
bne,pn %icc, 99b; \
nop; \
#define TSB_WRITE(TSB, TTE, TAG) \
add TSB, 0x8, TSB; \
TSB_STORE(TSB, TTE); \
sub TSB, 0x8, TSB; \
TSB_STORE(TSB, TAG);
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
/* Do a kernel page table walk. Leaves valid PTE value in
* REG1. Jumps to FAIL_LABEL on early page table walk
* termination. VADDR will not be clobbered, but REG2 will.
*
* There are two masks we must apply to propagate bits from
* the virtual address into the PTE physical address field
* when dealing with huge pages. This is because the page
* table boundaries do not match the huge page size(s) the
* hardware supports.
*
* In these cases we propagate the bits that are below the
* page table level where we saw the huge page mapping, but
* are still within the relevant physical bits for the huge
* page size in question. So for PMD mappings (which fall on
* bit 23, for 8MB per PMD) we must propagate bit 22 for a
* 4MB huge page. For huge PUDs (which fall on bit 33, for
* 8GB per PUD), we have to accommodate 256MB and 2GB huge
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
* pages. So for those we propagate bits 32 to 28.
*/
#define KERN_PGTABLE_WALK(VADDR, REG1, REG2, FAIL_LABEL) \
sethi %hi(swapper_pg_dir), REG1; \
or REG1, %lo(swapper_pg_dir), REG1; \
sllx VADDR, 64 - (PGDIR_SHIFT + PGDIR_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldx [REG1 + REG2], REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
sethi %uhi(_PAGE_PUD_HUGE), REG2; \
brz,pn REG1, FAIL_LABEL; \
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
sethi %hi(0xf8000000), REG2; \
bne,pt %xcc, 697f; \
sllx REG2, 1, REG2; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
sethi %uhi(_PAGE_PMD_HUGE), REG2; \
brz,pn REG1, FAIL_LABEL; \
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
be,pn %xcc, 698f; \
sethi %hi(0x400000), REG2; \
697: brgez,pn REG1, FAIL_LABEL; \
andn REG1, REG2, REG1; \
and VADDR, REG2, REG2; \
ba,pt %xcc, 699f; \
or REG1, REG2, REG1; \
698: sllx VADDR, 64 - PMD_SHIFT, REG2; \
sparc64: Move from 4MB to 8MB huge pages. The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26 04:48:49 +08:00
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
sparc64: Fix physical memory management regressions with large max_phys_bits. If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like DEBUG_PAGEALLOC stop working because the 3-level page tables only can cover up to 43 bits. Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to 47, several statically allocated tables became enormous. Compounding this is that we will need to support up to 49 bits of physical addressing for M7 chips. The two tables in question are sparc64_valid_addr_bitmap and kpte_linear_bitmap. The first holds a bitmap, with 1 bit for each 4MB chunk of physical memory, indicating whether that chunk actually exists in the machine and is valid. The second table is a set of 2-bit values which tell how large of a mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB chunk of ram in the system. These tables are huge and take up an enormous amount of the BSS section of the sparc64 kernel image. Specifically, the sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K. So let's solve the space wastage and the DEBUG_PAGEALLOC problem at the same time, by using the kernel page tables (as designed) to manage this information. We have to keep using large mappings when DEBUG_PAGEALLOC is disabled, and we do this by encoding huge PMDs and PUDs. On a T4-2 with 256GB of ram the kernel page table takes up 16K with DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this memory is dynamically allocated at run time rather than coded statically into the kernel image. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 11:56:11 +08:00
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brgez,pn REG1, FAIL_LABEL; \
nop; \
699:
/* PUD has been loaded into REG1, interpret the value, seeing
* if it is a HUGE PUD or a normal one. If it is not valid
* then jump to FAIL_LABEL. If it is a HUGE PUD, and it
* translates to a valid PTE, branch to PTE_LABEL.
*
* We have to propagate bits [32:22] from the virtual address
* to resolve at 4M granularity.
*/
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
700: ba 700f; \
nop; \
.section .pud_huge_patch, "ax"; \
.word 700b; \
nop; \
.previous; \
brz,pn REG1, FAIL_LABEL; \
sethi %uhi(_PAGE_PUD_HUGE), REG2; \
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
be,pt %xcc, 700f; \
sethi %hi(0x1ffc0000), REG2; \
sllx REG2, 1, REG2; \
brgez,pn REG1, FAIL_LABEL; \
andn REG1, REG2, REG1; \
and VADDR, REG2, REG2; \
brlz,pt REG1, PTE_LABEL; \
or REG1, REG2, REG1; \
700:
#else
#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
brz,pn REG1, FAIL_LABEL; \
nop;
#endif
/* PMD has been loaded into REG1, interpret the value, seeing
* if it is a HUGE PMD or a normal one. If it is not valid
* then jump to FAIL_LABEL. If it is a HUGE PMD, and it
* translates to a valid PTE, branch to PTE_LABEL.
*
* We have to propagate the 4MB bit of the virtual address
* because we are fabricating 8MB pages using 4MB hw pages.
*/
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
#define USER_PGTABLE_CHECK_PMD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
brz,pn REG1, FAIL_LABEL; \
sethi %uhi(_PAGE_PMD_HUGE), REG2; \
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
be,pt %xcc, 700f; \
sethi %hi(4 * 1024 * 1024), REG2; \
brgez,pn REG1, FAIL_LABEL; \
andn REG1, REG2, REG1; \
and VADDR, REG2, REG2; \
brlz,pt REG1, PTE_LABEL; \
or REG1, REG2, REG1; \
700:
#else
#define USER_PGTABLE_CHECK_PMD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
brz,pn REG1, FAIL_LABEL; \
nop;
#endif
/* Do a user page table walk in MMU globals. Leaves final,
* valid, PTE value in REG1. Jumps to FAIL_LABEL on early
* page table walk termination or if the PTE is not valid.
*
* Physical base of page tables is in PHYS_PGD which will not
* be modified.
*
* VADDR will not be clobbered, but REG1 and REG2 will.
*/
#define USER_PGTABLE_WALK_TL1(VADDR, PHYS_PGD, REG1, REG2, FAIL_LABEL) \
sllx VADDR, 64 - (PGDIR_SHIFT + PGDIR_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [PHYS_PGD + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
USER_PGTABLE_CHECK_PMD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
sllx VADDR, 64 - PMD_SHIFT, REG2; \
sparc64: Move from 4MB to 8MB huge pages. The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26 04:48:49 +08:00
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
add REG1, REG2, REG1; \
ldxa [REG1] ASI_PHYS_USE_EC, REG1; \
brgez,pn REG1, FAIL_LABEL; \
nop; \
800:
/* Lookup a OBP mapping on VADDR in the prom_trans[] table at TL>0.
* If no entry is found, FAIL_LABEL will be branched to. On success
* the resulting PTE value will be left in REG1. VADDR is preserved
* by this routine.
*/
#define OBP_TRANS_LOOKUP(VADDR, REG1, REG2, REG3, FAIL_LABEL) \
sethi %hi(prom_trans), REG1; \
or REG1, %lo(prom_trans), REG1; \
97: ldx [REG1 + 0x00], REG2; \
brz,pn REG2, FAIL_LABEL; \
nop; \
ldx [REG1 + 0x08], REG3; \
add REG2, REG3, REG3; \
cmp REG2, VADDR; \
bgu,pt %xcc, 98f; \
cmp VADDR, REG3; \
bgeu,pt %xcc, 98f; \
ldx [REG1 + 0x10], REG3; \
sub VADDR, REG2, REG2; \
ba,pt %xcc, 99f; \
add REG3, REG2, REG1; \
98: ba,pt %xcc, 97b; \
add REG1, (3 * 8), REG1; \
99:
/* We use a 32K TSB for the whole kernel, this allows to
* handle about 16MB of modules and vmalloc mappings without
* incurring many hash conflicts.
*/
#define KERNEL_TSB_SIZE_BYTES (32 * 1024)
#define KERNEL_TSB_NENTRIES \
(KERNEL_TSB_SIZE_BYTES / 16)
#define KERNEL_TSB4M_NENTRIES 4096
/* Do a kernel TSB lookup at tl>0 on VADDR+TAG, branch to OK_LABEL
* on TSB hit. REG1, REG2, REG3, and REG4 are used as temporaries
* and the found TTE will be left in REG1. REG3 and REG4 must
* be an even/odd pair of registers.
*
* VADDR and TAG will be preserved and not clobbered by this macro.
*/
#define KERN_TSB_LOOKUP_TL1(VADDR, TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
661: sethi %uhi(swapper_tsb), REG1; \
sethi %hi(swapper_tsb), REG2; \
or REG1, %ulo(swapper_tsb), REG1; \
or REG2, %lo(swapper_tsb), REG2; \
.section .swapper_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
sllx REG1, 32, REG1; \
or REG1, REG2, REG1; \
srlx VADDR, PAGE_SHIFT, REG2; \
and REG2, (KERNEL_TSB_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
add REG1, REG2, REG2; \
TSB_LOAD_QUAD(REG2, REG3); \
cmp REG3, TAG; \
be,a,pt %xcc, OK_LABEL; \
mov REG4, REG1;
#ifndef CONFIG_DEBUG_PAGEALLOC
/* This version uses a trick, the TAG is already (VADDR >> 22) so
* we can make use of that for the index computation.
*/
#define KERN_TSB4M_LOOKUP_TL1(TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
661: sethi %uhi(swapper_4m_tsb), REG1; \
sethi %hi(swapper_4m_tsb), REG2; \
or REG1, %ulo(swapper_4m_tsb), REG1; \
or REG2, %lo(swapper_4m_tsb), REG2; \
.section .swapper_4m_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
sllx REG1, 32, REG1; \
or REG1, REG2, REG1; \
and TAG, (KERNEL_TSB4M_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
add REG1, REG2, REG2; \
TSB_LOAD_QUAD(REG2, REG3); \
cmp REG3, TAG; \
be,a,pt %xcc, OK_LABEL; \
mov REG4, REG1;
#endif
#endif /* !(_SPARC64_TSB_H) */