mirror of https://gitee.com/openkylin/linux.git
10 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | c136b84393 |
PPC:
- Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending. ARM: - VCPU request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups s390: - initial machine check forwarding - migration support for the CMMA page hinting information - cleanups and fixes x86: - nested VMX bugfixes and improvements - more reliable NMI window detection on AMD - APIC timer optimizations Generic: - VCPU request overhaul + documentation of common code patterns - kvm_stat improvements There is a small conflict in arch/s390 due to an arch-wide field rename. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8 A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA= =OD7H -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "PPC: - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending. ARM: - VCPU request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups s390: - initial machine check forwarding - migration support for the CMMA page hinting information - cleanups and fixes x86: - nested VMX bugfixes and improvements - more reliable NMI window detection on AMD - APIC timer optimizations Generic: - VCPU request overhaul + documentation of common code patterns - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) Update my email address kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 kvm: x86: mmu: allow A/D bits to be disabled in an mmu x86: kvm: mmu: make spte mmio mask more explicit x86: kvm: mmu: dead code thanks to access tracking KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code KVM: PPC: Book3S HV: Close race with testing for signals on guest entry KVM: PPC: Book3S HV: Simplify dynamic micro-threading code KVM: x86: remove ignored type attribute KVM: LAPIC: Fix lapic timer injection delay KVM: lapic: reorganize restart_apic_timer KVM: lapic: reorganize start_hv_timer kvm: nVMX: Check memory operand to INVVPID KVM: s390: Inject machine check into the nested guest KVM: s390: Inject machine check into the guest tools/kvm_stat: add new interactive command 'b' tools/kvm_stat: add new command line switch '-i' tools/kvm_stat: fix error on interactive command 'g' KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit ... |
|
Linus Torvalds | 55a7b2125c |
arm64 updates for 4.13:
- RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJZWiuVAAoJELescNyEwWM0g/gIAIRpVEzjE61zfm/KCsVuIu4O p6F/HrvF/ApvlFcth8LDpTDYUholzT1e9wmx/O0Ll37UvFUrReT03R5MMJ02WU8s hRg0N4izdg2BPa9zuaP/XE5i6WmFfRAwFsv6PzX77FjNGk0M4zhW8acNpWHYMBQT DwXT/xCvg6045Sj6CuwfcIqqVHrz6/kpBmvdbW7G3/WpIHpUGIWM9EO3mkuLGMj0 j0VSCxfAVJvWwmKEBdFExLNjqxvSlVAMOIEAw7yBNLjuheiL+afK+Y1BggB00oe8 14+6viOgW6L97VmPpYVn0YDseqeGg5DqlNF3NqjTqdmzWH/ApAvL4WXN7SL2jbU= =RNzb -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ... |
|
Tyler Baicar | 621f48e40e |
arm/arm64: KVM: add guest SEA support
Currently external aborts are unsupported by the guest abort handling. Add handling for SEAs so that the host kernel reports SEAs which occur in the guest kernel. When an SEA occurs in the guest kernel, the guest exits and is routed to kvm_handle_guest_abort(). Prior to this patch, a print message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> |
|
James Morse | 196f878a7a |
KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64, notifications for broken memory can call memory_failure() in mm/memory-failure.c to offline pages of memory, possibly signalling user space processes and notifying all the in-kernel users. memory_failure() has two modes, early and late. Early is used by machine-managers like Qemu to receive a notification when a memory error is notified to the host. These can then be relayed to the guest before the affected page is accessed. To enable this, the process must set PR_MCE_KILL_EARLY in PR_MCE_KILL_SET using the prctl() syscall. Once the early notification has been handled, nothing stops the machine-manager or guest from accessing the affected page. If the machine-manager does this the page will fail to be mapped and SIGBUS will be sent. This patch adds the equivalent path for when the guest accesses the page, sending SIGBUS to the machine-manager. These two signals can be distinguished by the machine-manager using their si_code: BUS_MCEERR_AO for 'action optional' early notifications, and BUS_MCEERR_AR for 'action required' synchronous/late notifications. Do as x86 does, and deliver the SIGBUS when we discover pfn == KVM_PFN_ERR_HWPOISON. Use the hugepage size as si_addr_lsb if this vma was allocated as a hugepage. Transparent hugepages will be split by memory_failure() before we see them here. Cc: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> |
|
Marc Zyngier | d6dbdd3c85 |
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
Under memory pressure, we start ageing pages, which amounts to parsing the page tables. Since we don't want to allocate any extra level, we pass NULL for our private allocation cache. Which means that stage2_get_pud() is allowed to fail. This results in the following splat: [ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008 [ 1520.417741] pgd = ffff810f52fef000 [ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000 [ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 1520.435156] Modules linked in: [ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G W 4.12.0-rc4-00027-g1885c397eaec #7205 [ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016 [ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000 [ 1520.469666] PC is at stage2_get_pmd+0x34/0x110 [ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0 [ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145 [ 1520.486325] sp : ffff800ce04e33d0 [ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064 [ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000 [ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000 [ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000 [ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000 [ 1520.516274] x19: 0000000058264000 x18: 0000000000000000 [ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70 [ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008 [ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002 [ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940 [ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200 [ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000 [ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000 [ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008 [ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c [ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000) [...] [ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110 [ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0 [ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8 [ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0 [ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0 [ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0 [ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188 [ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250 [ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0 [ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180 [ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8 [ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600 [ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328 [ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340 [ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240 [...] The trivial fix is to handle this NULL pud value early, rather than dereferencing it blindly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> |
|
Suzuki K Poulose | 0c428a6a92 |
kvm: arm/arm64: Fix use after free of stage2 page table
We yield the kvm->mmu_lock occassionaly while performing an operation (e.g, unmap or permission changes) on a large area of stage2 mappings. However this could possibly cause another thread to clear and free up the stage2 page tables while we were waiting for regaining the lock and thus the original thread could end up in accessing memory that was freed. This patch fixes the problem by making sure that the stage2 pagetable is still valid after we regain the lock. The fact that mmu_notifer->release() could be called twice (via __mmu_notifier_release and mmu_notifier_unregsister) enhances the possibility of hitting this race where there are two threads trying to unmap the entire guest shadow pages. While at it, cleanup the redudant checks around cond_resched_lock in stage2_wp_range(), as cond_resched_lock already does the same checks. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: andreyknvl@google.com Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> |
|
Suzuki K Poulose | 2952a6070e |
kvm: arm/arm64: Force reading uncached stage2 PGD
Make sure we don't use a cached value of the KVM stage2 PGD while resetting the PGD. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> |
|
Suzuki K Poulose | 6c0d706b56 |
kvm: arm/arm64: Fix race in resetting stage2 PGD
In kvm_free_stage2_pgd() we check the stage2 PGD before holding the lock and proceed to take the lock if it is valid. And we unmap the page tables, followed by releasing the lock. We reset the PGD only after dropping this lock, which could cause a race condition where another thread waiting on or even holding the lock, could potentially see that the PGD is still valid and proceed to perform a stage2 operation and later encounter a NULL PGD. [223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040 [223090.262330] PC is at unmap_stage2_range+0x8c/0x428 [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c [223090.262531] Call trace: [223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428 [223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c [223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104 [223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc [223090.262543] [<ffff0000080a2478>] kvm_mmu_notifier_invalidate_page+0x50/0x8c [223090.262547] [<ffff0000082274f8>] __mmu_notifier_invalidate_page+0x5c/0x84 [223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0 [223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0 [223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4 [223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac [223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac [223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0 [223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290 [223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac [223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4 [223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254 [223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538 [223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4 [223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c [223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8 [223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c [223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c [223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774 [223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac [223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648 [223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768 [223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4 [223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4 [223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c This patch moves the stage2 PGD manipulation under the lock. Reported-by: Alexander Graf <agraf@suse.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> |
|
Paolo Bonzini | 36c344f3f1 |
Second round of KVM/ARM Changes for v4.12.
Changes include: - A fix related to the 32-bit idmap stub - A fix to the bitmask used to deode the operands of an AArch32 CP instruction - We have moved the files shared between arch/arm/kvm and arch/arm64/kvm to virt/kvm/arm - We add support for saving/restoring the virtual ITS state to userspace -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJZEZihAAoJEEtpOizt6ddyGDYH/jmGjDMnryORn2P2o10dUQKJ RnHTQYnpOYqnprlkFtZFpmK+mjl/a8R1Btb7GK2EwmovTR95pMYPRqtrCTOL0aQA 4OToh7+vFGatwxsGCS6utazdhmx0UT/LhO/GEF4G1zOb7eVa4ZtS1NKLP2WjPD1E RU3Qn8wa0pESv3tJScv8qo2+PWVX4krbFllhY2Hk0AkVQcI66ExkdVq4ikm1eUXn rxzIayLG2bv3KEPNCzozdwoY9tDL+b40q6vN/RHGJmM05SZbbSx2/Bkw2RbslSpD 2hvhHWX7xeuEBcd5mZO7sP4WS3hM/BI8eX7q+uMeNJ9B+nM82yjGfOTtglVi2cc= =JfvQ -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of KVM/ARM Changes for v4.12. Changes include: - A fix related to the 32-bit idmap stub - A fix to the bitmask used to deode the operands of an AArch32 CP instruction - We have moved the files shared between arch/arm/kvm and arch/arm64/kvm to virt/kvm/arm - We add support for saving/restoring the virtual ITS state to userspace |
|
Christoffer Dall | 35d2d5d490 |
KVM: arm/arm64: Move shared files to virt/kvm/arm
For some time now we have been having a lot of shared functionality between the arm and arm64 KVM support in arch/arm, which not only required a horrible inter-arch reference from the Makefile in arch/arm64/kvm, but also created confusion for newcomers to the code base, as was recently seen on the mailing list. Further, it causes confusion for things like cscope, which needs special attention to index specific shared files for arm64 from the arm tree. Move the shared files into virt/kvm/arm and move the trace points along with it. When moving the tracepoints we have to modify the way the vgic creates definitions of the trace points, so we take the chance to include the VGIC tracepoints in its very own special vgic trace.h file. Signed-off-by: Christoffer Dall <cdall@linaro.org> |