linux_old1/drivers/base
Dave Hansen c221c0b030 device-dax: "Hotplug" persistent memory for use like normal RAM
This is intended for use with NVDIMMs that are physically persistent
(physically like flash) so that they can be used as a cost-effective
RAM replacement.  Intel Optane DC persistent memory is one
implementation of this kind of NVDIMM.

Currently, a persistent memory region is "owned" by a device driver,
either the "Direct DAX" or "Filesystem DAX" drivers.  These drivers
allow applications to explicitly use persistent memory, generally
by being modified to use special, new libraries. (DIMM-based
persistent memory hardware/software is described in great detail
here: Documentation/nvdimm/nvdimm.txt).

However, this limits persistent memory use to applications which
*have* been modified.  To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed and used just like
normal RAM would be.

To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:

	echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind

and then tell the new driver that it can bind to the device:

	echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id

After this, there will be a number of new memory sections visible
in sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.

This rebinding procedure is currently a one-way trip.  Once memory
is bound to "kmem", it's there permanently and can not be
unbound and assigned back to device_dax.

The kmem driver will never bind to a dax device unless the device
is *explicitly* bound to the driver.  There are two reasons for
this: One, since it is a one-way trip, it can not be undone if
bound incorrectly.  Two, the kmem driver destroys data on the
device.  Think of if you had good data on a pmem device.  It
would be catastrophic if you compile-in "kmem", but leave out
the "device_dax" driver.  kmem would take over the device and
write volatile data all over your good data.

This inherits any existing NUMA information for the newly-added
memory from the persistent memory device that came from the
firmware.  On Intel platforms, the firmware has guarantees that
require each socket's persistent memory to be in a separate
memory-only NUMA node.  That means that this patch is not expected
to create NUMA nodes, but will simply hotplug memory into existing
nodes.

Because NUMA nodes are created, the existing NUMA APIs and tools
are sufficient to create policies for applications or memory areas
to have affinity for or an aversion to using this memory.

There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches.  But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes
is a good tradeoff vs. the required code to fix this up precisely.
This calculation is also the reason we export
memory_block_size_bytes().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28 10:41:23 -08:00
..
firmware_loader firmware: Always initialize the fw_priv list object 2018-09-30 08:49:55 -07:00
power RTC for 4.21 2019-01-01 13:24:31 -08:00
regmap Merge remote-tracking branch 'regmap/topic/irq' into regmap-next 2018-12-19 18:38:33 +00:00
test driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
Kconfig firmware_loader: move kconfig FW_LOADER entries to its own file 2018-05-14 16:43:10 +02:00
Makefile drivers: base: Introducing software nodes to the firmware node framework 2018-11-26 18:19:11 +01:00
arch_topology.c sched/topology, drivers/base/arch_topology: Rebuild the sched_domain hierarchy when capacities change 2018-09-10 11:05:47 +02:00
attribute_container.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
base.h driver core: remove unnecessary function extern declare 2018-07-16 13:32:20 +02:00
bus.c sysfs: Disable lockdep for driver bind/unbind files 2018-12-19 15:58:40 +01:00
cacheinfo.c drivers: base: cacheinfo: Do not populate sysfs for unknown cache types 2018-10-04 23:02:17 +02:00
class.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
component.c component: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-20 16:33:18 +01:00
container.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
core.c Driver core patches for 4.21-rc1 2018-12-28 20:44:29 -08:00
cpu.c x86/speculation/l1tf: Add sysfs reporting for l1tf 2018-06-20 19:10:00 +02:00
dd.c driver core: Add missing dev->bus->need_parent_lock checks 2018-12-19 10:08:34 +01:00
devcon.c drivers: base: Unified device connection lookup 2018-03-22 13:10:29 +01:00
devcoredump.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
devres.c devres: Align data[] to ARCH_KMALLOC_MINALIGN 2018-11-11 11:40:04 -08:00
devtmpfs.c vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled 2018-12-20 16:32:56 +00:00
driver.c driver-core: return EINVAL error instead of BUG_ON() 2018-05-25 18:18:45 +02:00
firmware.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
hypervisor.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
init.c base: fix order of OF initialization 2018-07-07 17:54:29 +02:00
isa.c Merge 4.15-rc3 into driver-core-next 2017-12-11 08:50:05 +01:00
map.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
memory.c device-dax: "Hotplug" persistent memory for use like normal RAM 2019-02-28 10:41:23 -08:00
module.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
node.c mm, proc: add KReclaimable to /proc/meminfo 2018-10-26 16:26:32 -07:00
pinctrl.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
platform-msi.c platform-msi: Free descriptors in platform_msi_domain_free() 2018-12-13 09:35:31 +00:00
platform.c drivers/base/platform.c: kmemleak ignore a known leak 2019-01-04 13:13:48 -08:00
property.c device property: fix fwnode_graph_get_next_endpoint() documentation 2018-12-18 23:39:45 +01:00
soc.c base: soc: use put_device() instead of kfree() 2018-03-15 14:37:03 +01:00
swnode.c drivers: base: swnode: check if swnode is NULL before dereferencing it 2018-12-26 10:50:36 +01:00
syscore.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
topology.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
transport_class.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00