linux_old1/arch/x86/xen
Alex Shi e7b52ffd45 x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.

But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.

So, on a 512 4KB TLB entries CPU, the balance points is at:
	(512 - X) * 100ns(assumed TLB refill cost) =
		X(TLB flush entries) * 100ns(assumed invlpg cost)

Here, X is 256, that is 1/2 of 512 entries.

But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.

As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.

My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.

Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':

multi thread testing, '-t' paramter is thread number:
	       	        with patch   unpatched 3.4-rc4
./mprotect -t 1           14ns		24ns
./mprotect -t 2           13ns		22ns
./mprotect -t 4           12ns		19ns
./mprotect -t 8           14ns		16ns
./mprotect -t 16          28ns		26ns
./mprotect -t 32          54ns		51ns
./mprotect -t 128         200ns		199ns

Single process with sequencial flushing and memory accessing:

		       	with patch   unpatched 3.4-rc4
./mprotect		    7ns			11ns
./mprotect -p 4096  -l 8 -n 10240
			    21ns		21ns

[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
  has additional performance numbers. ]

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:07 -07:00
..
Kconfig xen: Make XEN_MAX_DOMAIN_MEMORY have more sensible defaults 2011-11-21 17:14:46 -05:00
Makefile xen/x86: Implement x86_apic_ops 2012-05-01 14:50:33 -04:00
apic.c x86/xen/apic: Add missing #include <xen/xen.h> 2012-05-18 09:34:45 +02:00
debugfs.c debugfs: Add support to print u32 array in debugfs 2012-04-17 00:18:36 -04:00
debugfs.h debugfs: Add support to print u32 array in debugfs 2012-04-17 00:18:36 -04:00
enlighten.c x86, amd, xen: Avoid NULL pointer paravirt references 2012-05-30 16:15:02 -07:00
grant-table.c Merge commit 'v3.2-rc3' into stable/for-linus-3.3 2011-12-20 17:01:18 -05:00
irq.c xen: use this_cpu_xxx replace percpu_xxx funcs 2012-01-24 12:20:24 -05:00
mmu.c x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00
mmu.h xen: make a pile of mmu pvop functions static 2011-05-20 14:25:24 -07:00
multicalls.c xen/multicall: move *idx fields to start of mc_buffer 2011-07-18 15:43:46 -07:00
multicalls.h xen: use this_cpu_xxx replace percpu_xxx funcs 2012-01-24 12:20:24 -05:00
p2m.c Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe" 2012-04-20 11:56:00 -04:00
pci-swiotlb-xen.c X86 & IA64: adapt for dma_map_ops changes 2012-03-28 16:36:31 +02:00
platform-pci-unplug.c xen:pvhvm: Modpost section mismatch fix 2011-07-11 13:37:04 -04:00
setup.c xen/setup: update VA mapping when releasing memory during setup 2012-05-07 15:32:24 -04:00
smp.c Features: 2012-05-24 16:02:08 -07:00
smp.h xen: implement apic ipi interface 2012-05-07 15:33:15 -04:00
spinlock.c debugfs: Add support to print u32 array in debugfs 2012-04-17 00:18:36 -04:00
suspend.c xen: suspend: add "arch" to pre/post suspend hooks 2011-02-25 16:43:12 +00:00
time.c Merge branch 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen 2011-11-06 20:15:05 -08:00
trace.c xen/trace: Fix compile error when CONFIG_XEN_PRIVILEGED_GUEST is not set 2011-08-05 09:43:02 -04:00
vdso.h
vga.c xen: allow enable use of VGA console on dom0 2011-06-06 11:46:00 -04:00
xen-asm.S xen: correctly check for pending events when restoring irq flags 2012-04-27 16:04:21 -04:00
xen-asm.h xen: make direct versions of irq_enable/disable/save/restore to common code 2009-02-04 16:59:04 -08:00
xen-asm_32.S x86, extable: Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S 2012-04-20 13:51:39 -07:00
xen-asm_64.S xen: use iret for return from 64b kernel to 32b usermode 2009-12-03 11:14:54 -08:00
xen-head.S x86, asm: Cleanup unnecssary macros in asm-offsets.c 2011-02-25 16:37:32 -08:00
xen-ops.h Features: 2012-05-24 16:02:08 -07:00