KVM: x86/mmu: Consolidate comments about {Host,MMU}-writable
Consolidate the large comment above DEFAULT_SPTE_HOST_WRITABLE with the large comment above is_writable_pte() into one comment. This comment explains the different reasons why an SPTE may be non-writable and how KVM keeps track of that with the {Host,MMU}-writable bits.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220125230723.1701061-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
parent 1ca87e015d
commit 02844ac1eb
@@ -549,11 +549,9 @@ static u64 mmu_spte_update_no_track(u64 *sptep, u64 new_spte)
 /* Rules for using mmu_spte_update:
  * Update the state bits, it means the mapped pfn is not changed.
  *
- * Whenever we overwrite a writable spte with a read-only one we
- * should flush remote TLBs. Otherwise rmap_write_protect
- * will find a read-only spte, even though the writable spte
- * might be cached on a CPU's TLB, the return value indicates this
- * case.
+ * Whenever an MMU-writable SPTE is overwritten with a read-only SPTE, remote
+ * TLBs must be flushed. Otherwise rmap_write_protect will find a read-only
+ * spte, even though the writable spte might be cached on a CPU's TLB.
  *
  * Returns true if the TLB needs to be flushed
  */
@@ -5844,7 +5842,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	 * will clear a separate software-only bit (MMU-writable) and skip the
 	 * flush if-and-only-if this bit was already clear.
 	 *
-	 * See DEFAULT_SPTE_MMU_WRITABLE for more details.
+	 * See is_writable_pte() for more details.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
@@ -75,28 +75,8 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
 
 /*
- * *_SPTE_HOST_WRITABLE (aka Host-writable) indicates whether the host permits
- * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
- * that map guest pages in read-only memslots and read-only VMAs.
- *
- * Invariants:
- *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
- *
- *
- * *_SPTE_MMU_WRITABLE (aka MMU-writable) indicates whether the shadow MMU
- * allows writes to the guest page mapped by the SPTE. This bit is cleared when
- * the guest page mapped by the SPTE contains a page table that is being
- * monitored for shadow paging. In this case the SPTE can only be made writable
- * by unsyncing the shadow page under the mmu_lock.
- *
- * Invariants:
- *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
- *  - If MMU-writable is set, Host-writable must be set.
- *
- * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
- * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
- * PT_WRITABLE_MASK upon the next write from the guest and record the write in
- * the dirty log (see fast_page_fault()).
+ * {DEFAULT,EPT}_SPTE_{HOST,MMU}_WRITABLE are used to keep track of why a given
+ * SPTE is write-protected. See is_writable_pte() for details.
  */
 
 /* Bits 9 and 10 are ignored by all non-EPT PTEs. */
@@ -340,37 +320,64 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
 }
 
 /*
- * Currently, we have two sorts of write-protection, a) the first one
- * write-protects guest page to sync the guest modification, b) another one is
- * used to sync dirty bitmap when we do KVM_GET_DIRTY_LOG. The differences
- * between these two sorts are:
- * 1) the first case clears MMU-writable bit.
- * 2) the first case requires flushing tlb immediately avoiding corrupting
- *    shadow page table between all vcpus so it should be in the protection of
- *    mmu-lock. And the another case does not need to flush tlb until returning
- *    the dirty bitmap to userspace since it only write-protects the page
- *    logged in the bitmap, that means the page in the dirty bitmap is not
- *    missed, so it can flush tlb out of mmu-lock.
+ * A shadow-present leaf SPTE may be non-writable for 3 possible reasons:
  *
- * So, there is the problem: the first case can meet the corrupted tlb caused
- * by another case which write-protects pages but without flush tlb
- * immediately. In order to making the first case be aware this problem we let
- * it flush tlb if we try to write-protect a spte whose MMU-writable bit
- * is set, it works since another case never touches MMU-writable bit.
+ *  1. To intercept writes for dirty logging. KVM write-protects huge pages
+ *     so that they can be split down into the dirty logging
+ *     granularity (4KiB) whenever the guest writes to them. KVM also
+ *     write-protects 4KiB pages so that writes can be recorded in the dirty log
+ *     (e.g. if not using PML). SPTEs are write-protected for dirty logging
+ *     during the VM-ioctls that enable dirty logging.
  *
- * Anyway, whenever a spte is updated (only permission and status bits are
- * changed) we need to check whether the spte with MMU-writable becomes
- * readonly, if that happens, we need to flush tlb. Fortunately,
- * mmu_spte_update() has already handled it perfectly.
+ *  2. To intercept writes to guest page tables that KVM is shadowing. When a
+ *     guest writes to its page table the corresponding shadow page table will
+ *     be marked "unsync". That way KVM knows which shadow page tables need to
+ *     be updated on the next TLB flush, INVLPG, etc. and which do not.
  *
- * The rules to use MMU-writable and PT_WRITABLE_MASK:
- * - if we want to see if it has writable tlb entry or if the spte can be
- *   writable on the mmu mapping, check MMU-writable, this is the most
- *   case, otherwise
- * - if we fix page fault on the spte or do write-protection by dirty logging,
- *   check PT_WRITABLE_MASK.
+ *  3. To prevent guest writes to read-only memory, such as for memory in a
+ *     read-only memslot or guest memory backed by a read-only VMA. Writes to
+ *     such pages are disallowed entirely.
  *
- * TODO: introduce APIs to split these two cases.
+ * To keep track of why a given SPTE is write-protected, KVM uses 2
+ * software-only bits in the SPTE:
+ *
+ *  shadow_mmu_writable_mask, aka MMU-writable -
+ *    Cleared on SPTEs that KVM is currently write-protecting for shadow paging
+ *    purposes (case 2 above).
+ *
+ *  shadow_host_writable_mask, aka Host-writable -
+ *    Cleared on SPTEs that are not host-writable (case 3 above)
+ *
+ * Note, not all possible combinations of PT_WRITABLE_MASK,
+ * shadow_mmu_writable_mask, and shadow_host_writable_mask are valid. A given
+ * SPTE can be in only one of the following states, which map to the
+ * aforementioned 3 cases:
+ *
+ * shadow_host_writable_mask | shadow_mmu_writable_mask | PT_WRITABLE_MASK
+ * ------------------------- | ------------------------ | ----------------
+ * 1                         | 1                        | 1 (writable)
+ * 1                         | 1                        | 0 (case 1)
+ * 1                         | 0                        | 0 (case 2)
+ * 0                         | 0                        | 0 (case 3)
+ *
+ * The valid combinations of these bits are checked by
+ * check_spte_writable_invariants() whenever an SPTE is modified.
+ *
+ * Clearing the MMU-writable bit is always done under the MMU lock and always
+ * accompanied by a TLB flush before dropping the lock to avoid corrupting the
+ * shadow page tables between vCPUs. Write-protecting an SPTE for dirty logging
+ * (which does not clear the MMU-writable bit), does not flush TLBs before
+ * dropping the lock, as it only needs to synchronize guest writes with the
+ * dirty bitmap.
+ *
+ * So, there is the problem: clearing the MMU-writable bit can encounter a
+ * write-protected SPTE while CPUs still have writable mappings for that SPTE
+ * cached in their TLB. To address this, KVM always flushes TLBs when
+ * write-protecting SPTEs if the MMU-writable bit is set on the old SPTE.
+ *
+ * The Host-writable bit is not modified on present SPTEs, it is only set or
+ * cleared when an SPTE is first faulted in from non-present and then remains
+ * immutable.
  */
 static inline bool is_writable_pte(unsigned long pte)
 {
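
Reviewer note: the state table and the flush rule in the consolidated comment can be restated as a small standalone C sketch. This is not kernel code. The sk_* names and the bit positions are hypothetical and chosen only for illustration; the real masks (shadow_host_writable_mask, shadow_mmu_writable_mask) are set up at runtime, and the real checks live in check_spte_writable_invariants() and mmu_spte_update().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, picked only for this sketch. */
#define SK_PT_WRITABLE   (1ULL << 1)   /* hardware-writable bit */
#define SK_HOST_WRITABLE (1ULL << 9)   /* software bit: Host-writable */
#define SK_MMU_WRITABLE  (1ULL << 10)  /* software bit: MMU-writable */

/* One entry per row of the table in the comment, plus "invalid". */
enum sk_state {
	SK_WRITABLE,      /* 1 | 1 | 1 */
	SK_DIRTY_LOG_WP,  /* 1 | 1 | 0  (case 1) */
	SK_SHADOW_WP,     /* 1 | 0 | 0  (case 2) */
	SK_HOST_RO,       /* 0 | 0 | 0  (case 3) */
	SK_INVALID,       /* any other combination */
};

static enum sk_state sk_classify(uint64_t spte)
{
	bool host = (spte & SK_HOST_WRITABLE) != 0;
	bool mmu  = (spte & SK_MMU_WRITABLE) != 0;
	bool hw   = (spte & SK_PT_WRITABLE) != 0;

	if (host && mmu && hw)
		return SK_WRITABLE;
	if (host && mmu && !hw)
		return SK_DIRTY_LOG_WP;
	if (host && !mmu && !hw)
		return SK_SHADOW_WP;
	if (!host && !mmu && !hw)
		return SK_HOST_RO;
	return SK_INVALID;
}

/* Sketch of an invariant check: abort on a combination the table forbids. */
static void sk_check_invariants(uint64_t spte)
{
	assert(sk_classify(spte) != SK_INVALID);
}

/*
 * Flush rule as stated in the comments: a flush is needed when the old SPTE
 * was MMU-writable and the new SPTE is not hardware-writable, because a CPU
 * may still cache a writable translation for the old SPTE.
 */
static bool sk_needs_flush(uint64_t old_spte, uint64_t new_spte)
{
	return (old_spte & SK_MMU_WRITABLE) && !(new_spte & SK_PT_WRITABLE);
}
```

Keying the flush decision on the old MMU-writable bit rather than on the old hardware-writable bit is what covers the dirty-logging case: an SPTE write-protected for dirty logging (case 1) still has MMU-writable set, so a later shadow-paging write-protect of that SPTE still flushes.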