mm: document ZONE_DEVICE memory-model implications

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

[dan.j.williams@intel.com: update ZONE_DEVICE memory model documentation]
  Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64]
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent ba72b4c8cf
commit a0653406a3
@@ -181,3 +181,43 @@ that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
allocate memory map on the persistent memory device.

ZONE_DEVICE
===========
The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
`struct page` `mem_map` services for device-driver-identified physical
address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
that the page objects for these address ranges are never marked online,
and that a reference must be taken against the device, not just the
page, to keep the memory pinned for active use. `ZONE_DEVICE`, via
:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
:c:func:`get_user_pages` service for the given range of pfns. Since the
page reference count never drops below 1, the page is never tracked as
free memory, and the page's `struct list_head lru` space is repurposed
for back referencing to the host device / driver that mapped the memory.
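The paragraph above can be illustrated with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: every `model_*` name
is hypothetical, the page array stands in for the `mem_map`, and the union
merely imitates the idea of reusing the `lru` space for a back reference to
the owning driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; names and layout are assumptions. */
struct list_head { struct list_head *next, *prev; };
struct model_pagemap { const char *owner; };

struct model_page {
	int refcount;
	union {
		struct list_head lru;         /* used by ordinary, online pages */
		struct model_pagemap *pgmap;  /* device pages: back reference to the mapper */
	};
};

/* A contiguous array of page objects covering pfns starting at base_pfn. */
struct model_mem_map {
	struct model_page *pages;
	uint64_t base_pfn;
	uint64_t nr_pages;
};

/* Give each device page its baseline reference of 1 and its back reference. */
static void model_init_device_pages(struct model_mem_map *map,
				    struct model_pagemap *pgmap)
{
	for (uint64_t i = 0; i < map->nr_pages; i++) {
		map->pages[i].refcount = 1;
		map->pages[i].pgmap = pgmap;
	}
}

/* pfn -> page object; returns NULL for pfns outside the mapped range. */
static struct model_page *model_pfn_to_page(const struct model_mem_map *map,
					    uint64_t pfn)
{
	if (pfn < map->base_pfn || pfn - map->base_pfn >= map->nr_pages)
		return NULL;
	return map->pages + (pfn - map->base_pfn);
}

/* page object -> pfn; the page must belong to this map. */
static uint64_t model_page_to_pfn(const struct model_mem_map *map,
				  const struct model_page *page)
{
	assert(page >= map->pages && page < map->pages + map->nr_pages);
	return map->base_pfn + (uint64_t)(page - map->pages);
}
```

In this model the two translation helpers only work once the range has a
populated page array, which mirrors the role the text assigns to
:c:func:`devm_memremap_pages`.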

While `SPARSEMEM` presents memory as a collection of sections,
optionally collected into memory blocks, `ZONE_DEVICE` users need to
populate the `mem_map` at a smaller granularity. Because `ZONE_DEVICE`
memory is never marked online, its memory ranges are never exposed
through the sysfs memory hotplug API on memory block boundaries. The
implementation relies on this lack of user-API constraint to allow
sub-section sized memory ranges to be specified to
:c:func:`arch_add_memory`, the top-half of memory hotplug. Sub-section
support allows for 2MB as the cross-arch common alignment granularity
for :c:func:`devm_memremap_pages`.
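The 2MB granularity can be made concrete with a short userspace sketch. It
assumes a 128MB section size, as on x86_64, so that each section holds 64
sub-sections of 2MB; the `MODEL_*` macros and `model_*` helpers are
hypothetical names chosen for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SUBSECTION_SHIFT 21  /* 2MB sub-sections, per the text above */
#define MODEL_SECTION_SHIFT    27  /* 128MB sections: an x86_64 assumption */
#define MODEL_SUBSECTION_SIZE  (UINT64_C(1) << MODEL_SUBSECTION_SHIFT)
#define MODEL_SECTION_SIZE     (UINT64_C(1) << MODEL_SECTION_SHIFT)

/* True if both the start address and the size are non-zero multiples of 2MB. */
static bool model_range_is_subsection_aligned(uint64_t start, uint64_t size)
{
	return size != 0 && ((start | size) & (MODEL_SUBSECTION_SIZE - 1)) == 0;
}

/*
 * Bitmap of the 2MB sub-sections of section `section_nr` that the range
 * [start, start + size) covers. Bit 0 is the first sub-section.
 */
static uint64_t model_subsection_mask(uint64_t start, uint64_t size,
				      uint64_t section_nr)
{
	uint64_t sec_start = section_nr << MODEL_SECTION_SHIFT;
	uint64_t sec_end = sec_start + MODEL_SECTION_SIZE;
	uint64_t end = start + size;
	uint64_t lo = start > sec_start ? start : sec_start;
	uint64_t hi = end < sec_end ? end : sec_end;
	uint64_t mask = 0;

	assert(model_range_is_subsection_aligned(start, size));

	/* Empty when the range does not intersect this section. */
	for (uint64_t addr = lo; addr < hi; addr += MODEL_SUBSECTION_SIZE)
		mask |= UINT64_C(1) << ((addr - sec_start) >> MODEL_SUBSECTION_SHIFT);
	return mask;
}
```

Under these assumptions, a 4MB range starting 2MB into a section populates
only two of that section's 64 sub-sections, which is the kind of partial
section the sysfs memory block interface could not describe.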

The users of `ZONE_DEVICE` are:

* pmem: Map platform persistent memory to be used as a direct-I/O target
  via DAX mappings.

* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
  event callbacks to allow a device-driver to coordinate memory management
  events related to device-memory, typically GPU memory. See
  Documentation/vm/hmm.rst.

* p2pdma: Create `struct page` objects to allow peer devices in a
  PCI/PCIe topology to coordinate direct-DMA operations between themselves,
  i.e. bypass host memory.
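The callback idea in the hmm entry can be sketched in userspace as follows.
This is a hedged illustration, not the kernel's implementation: all `zd_*`
names are hypothetical, and the model assumes that a reference count
returning to its baseline of 1 is the moment a driver is told the page is
idle again.

```c
#include <assert.h>
#include <stddef.h>

struct zd_page;

/* Hypothetical per-driver descriptor; the callback is optional. */
struct zd_pagemap {
	void (*page_free)(struct zd_page *page);  /* NULL if the user needs no event */
	int nr_freed;                             /* bookkeeping for this sketch only */
};

struct zd_page {
	int refcount;               /* baseline is 1, never 0 */
	struct zd_pagemap *pgmap;   /* back reference to the owning driver */
};

static void zd_get_page(struct zd_page *page)
{
	page->refcount++;
}

/* Drop one reference; notify the driver when the page becomes idle. */
static void zd_put_page(struct zd_page *page)
{
	assert(page->refcount > 1);  /* the count must not drop below 1 */
	if (--page->refcount == 1 && page->pgmap->page_free)
		page->pgmap->page_free(page);
}

/* Example callback: count how many pages went idle. */
static void zd_count_free(struct zd_page *page)
{
	page->pgmap->nr_freed++;
}
```

A descriptor with a callback would receive one notification per get/put
pair, while a descriptor whose `page_free` is NULL would see none. That
optionality is the design point of this sketch: the same page objects can
serve users with and without driver-side events.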