linux_old1/Documentation/x86/protection-keys.txt

Memory Protection Keys for Userspace (PKU, aka PKEYs) is a feature
found on Intel's Skylake "Scalable Processor" server CPUs.  It will
be available in future non-server parts.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.
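
As a rough illustration of that layout, the sketch below decodes the
two bits belonging to one key from a PKRU value.  It assumes the
architectural layout in which bit 2*pkey is Access Disable and bit
2*pkey+1 is Write Disable.  The helper and macro names are
hypothetical; only the bit values mirror PKEY_DISABLE_ACCESS (0x1)
and PKEY_DISABLE_WRITE (0x2).

```c
#include <stdint.h>

/* Per-key permission bits; the values match PKEY_DISABLE_ACCESS and
 * PKEY_DISABLE_WRITE.  The macro names are illustrative only. */
#define PKRU_AD			0x1u	/* Access Disable: no data reads or writes */
#define PKRU_WD			0x2u	/* Write Disable: no data writes */
#define PKRU_BITS_PER_KEY	2

/* Hypothetical helper: extract the two bits for 'pkey' (0..15). */
unsigned int pkru_key_rights(uint32_t pkru, int pkey)
{
	return (pkru >> (pkey * PKRU_BITS_PER_KEY)) & (PKRU_AD | PKRU_WD);
}

/* Hypothetical helper: nonzero if PKRU permits data writes via 'pkey'.
 * Both bits must be clear, since Access Disable also blocks writes. */
int pkru_key_writable(uint32_t pkru, int pkey)
{
	return pkru_key_rights(pkru, pkey) == 0;
}
```

Because each thread has its own PKRU, the same key can decode to
different rights in different threads.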

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.
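
RDPKRU and WRPKRU fault when the feature has not been enabled by the
operating system, so a program may want to check for support before
executing them.  The following is a minimal sketch of such a check
using CPUID leaf 7 (ECX bit 3 for PKU, ECX bit 4 for OSPKE).  The
function name is hypothetical, and the code assumes a compiler that
provides __get_cpuid_count() in <cpuid.h> (recent GCC and Clang do).

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if the CPU implements PKU and the
 * OS has enabled it (reported as OSPKE), 0 otherwise. */
int cpu_pkeys_enabled(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 7, subleaf 0: ECX bit 3 = PKU, ECX bit 4 = OSPKE. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx & (1u << 3)) && (ecx & (1u << 4));
#else
	return 0;	/* not an x86 build */
#endif
}
```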

=========================== Syscalls ===========================

There are 3 system calls which directly interact with pkeys:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);
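
On systems whose C library does not wrap these calls, they can be
reached through syscall(2).  The sketch below assumes the kernel
headers define SYS_pkey_alloc and friends.  The raw_* names and
pkeys_usable() are hypothetical and are prefixed only to avoid
clashing with any libc-provided wrappers.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical raw wrappers around the three pkey system calls. */
int raw_pkey_alloc(unsigned long flags, unsigned long init_access_rights)
{
#ifdef SYS_pkey_alloc
	return (int)syscall(SYS_pkey_alloc, flags, init_access_rights);
#else
	(void)flags; (void)init_access_rights;
	errno = ENOSYS;
	return -1;
#endif
}

int raw_pkey_free(int pkey)
{
#ifdef SYS_pkey_free
	return (int)syscall(SYS_pkey_free, pkey);
#else
	(void)pkey;
	errno = ENOSYS;
	return -1;
#endif
}

int raw_pkey_mprotect(void *addr, size_t len, unsigned long prot, int pkey)
{
#ifdef SYS_pkey_mprotect
	return (int)syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
#else
	(void)addr; (void)len; (void)prot; (void)pkey;
	errno = ENOSYS;
	return -1;
#endif
}

/* Returns 1 if a key can be allocated (it is freed again), 0 if not. */
int pkeys_usable(void)
{
	int pkey = raw_pkey_alloc(0, 0);	/* flags must currently be 0 */

	if (pkey < 0)
		return 0;	/* e.g. ENOSYS, EINVAL or ENOSPC */
	raw_pkey_free(pkey);
	return 1;
}
```

Probing with a real allocation, rather than checking CPUID alone,
also covers kernels built without protection key support.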

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

When the application frees the memory, it should also free the pkey,
since the key is no longer in use:

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
 An example implementation can be found in
 tools/testing/selftests/x86/protection_keys.c)
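
For orientation, a wrapper of this kind might look roughly like the
sketch below.  It is not the selftest code.  The names
pkru_with_rights(), rdpkru(), wrpkru() and pkey_set_sketch() are
hypothetical, and the instructions are emitted as raw opcode bytes on
the assumption that the assembler may not know the mnemonics.

```c
#include <stdint.h>

#define PKRU_KEY_MASK	0x3u	/* two bits per key */

/* Return 'pkru' with the two bits for 'pkey' replaced by 'rights'
 * (0, PKEY_DISABLE_ACCESS = 0x1 and/or PKEY_DISABLE_WRITE = 0x2). */
uint32_t pkru_with_rights(uint32_t pkru, int pkey, unsigned int rights)
{
	int shift = pkey * 2;

	pkru &= ~(PKRU_KEY_MASK << shift);
	pkru |= (rights & PKRU_KEY_MASK) << shift;
	return pkru;
}

#if defined(__x86_64__)
static inline uint32_t rdpkru(void)
{
	uint32_t eax, edx;

	/* RDPKRU requires ECX = 0; PKRU is returned in EAX. */
	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a" (eax), "=d" (edx) : "c" (0));
	(void)edx;
	return eax;
}

static inline void wrpkru(uint32_t pkru)
{
	/* WRPKRU requires ECX = 0 and EDX = 0; EAX holds the new value. */
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : : "a" (pkru), "c" (0), "d" (0) : "memory");
}

/* Read-modify-write of the calling thread's PKRU for one key. */
void pkey_set_sketch(int pkey, unsigned int rights)
{
	wrpkru(pkru_with_rights(rdpkru(), pkey, rights));
}
#endif
```

The read-modify-write keeps the bits for all other keys unchanged, so
only the selected key's permissions are affected.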

=========================== Behavior ===========================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this:

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this:

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.