Commit Graph

53 Commits

Author SHA1 Message Date
Dan Williams 7b60422cb7 libnvdimm/security: Consolidate 'security' operations
The security operations are exported from libnvdimm/security.c to
libnvdimm/dimm_devs.c, and libnvdimm/security.c is optionally compiled
based on the CONFIG_NVDIMM_KEYS config symbol.

Rather than export the operations across compile objects, just move the
__security_store() entry point to live with the helpers.

Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/156686730515.184120.10522747907309996674.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-08-29 13:51:57 -07:00
Dan Williams bc4f2199ca libnvdimm/security: Tighten scope of nvdimm->busy vs security operations
An attempt to freeze DIMMs currently runs afoul of default blocking of
all security operations in the entry to the 'store' routine for the
'security' sysfs attribute.

The blanket blocking of all security operations while the DIMM is in
active use in a region is too restrictive. The only security operations
that need to be aware of the ->busy state are those that mutate the
state of data, i.e. erase and overwrite.

Refactor the ->busy checks to be applied at the entry common entry point
in __security_store() rather than each of the helper routines to enable
freeze to be run regardless of busy state.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/156686729996.184120.3458026302402493937.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-08-29 13:51:57 -07:00
Dan Williams d78c620a2e libnvdimm/security: Introduce a 'frozen' attribute
In the process of debugging a system with an NVDIMM that was failing to
unlock it was found that the kernel is reporting 'locked' while the DIMM
security interface is 'frozen'. Unfortunately the security state is
tracked internally as an enum which prevents it from communicating the
difference between 'locked' and 'locked + frozen'. It follows that the
enum also prevents the kernel from communicating 'unlocked + frozen'
which would be useful for debugging why security operations like 'change
passphrase' are disabled.

Ditch the security state enum for a set of flags and introduce a new
sysfs attribute explicitly for the 'frozen' state. The regression risk
is low because the 'frozen' state was already blocked behind the
'locked' state, but will need to revisit if there were cases where
applications need 'frozen' to show up in the primary 'security'
attribute. The expectation is that communicating 'frozen' is mostly a
helper for debug and status monitoring.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/156686729474.184120.5835135644278860826.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-08-29 13:49:13 -07:00
Dan Williams 87a30e1f05 driver-core, libnvdimm: Let device subsystems add local lockdep coverage
For good reason, the standard device_lock() is marked
lockdep_set_novalidate_class() because there is simply no sane way to
describe the myriad ways the device_lock() ordered with other locks.
However, that leaves subsystems that know their own local device_lock()
ordering rules to find lock ordering mistakes manually. Instead,
introduce an optional / additional lockdep-enabled lock that a subsystem
can acquire in all the same paths that the device_lock() is acquired.

A conversion of the NFIT driver and NVDIMM subsystem to a
lockdep-validate device_lock() scheme is included. The
debug_nvdimm_lock() implementation implements the correct lock-class and
stacking order for the libnvdimm device topology hierarchy.

Yes, this is a hack, but hopefully it is a useful hack for other
subsystems device_lock() debug sessions. Quoting Greg:

    "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
     using it as much as anything else, so user beware :)

     I don't object to it if it makes things easier for you to debug."

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
2019-07-18 16:23:27 -07:00
Thomas Gleixner 5b497af42f treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:36:38 +02:00
Linus Torvalds 0968621917 Printk changes for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
 lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
 EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
 r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
 FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
 uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
 lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
 waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
 wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
 =WZfC
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Sakari Ailus d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Dan Williams 451fed24e9 Merge branch 'for-5.1/libnvdimm' into libnvdimm-for-next
Merge miscellaneous libnvdimm sub-system updates for v5.1. Highlights
include:

* Support for the Hyper-V family of device-specific-methods (DSMs)
* Several fixes and workarounds for Hyper-V compatibility.
* Fix for the support to cache the dirty-shutdown-count at init.
2019-03-11 12:13:42 -07:00
Dan Williams d5d30d5a5c libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM family
As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with
the existing Linux namespace implementation because it uses
NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an
platform / DIMM that does not provide BLK-aperture access. Allow the
libnvdimm core to assume no potential for aliasing. In case other
implementations make the same mistake, provide a "noblk" module
parameter to force-enable the quirk.

Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-02 16:35:26 -08:00
Dan Williams 1cd7386549 libnvdimm/security: Require nvdimm_security_setup_events() to succeed
The following warning:

    ACPI0012:00: security event setup failed: -19

...is meant to capture exceptional failures of sysfs_get_dirent(),
however it will also fail in the common case when security support is
disabled. A few issues:

1/ A dev_warn() report for a common case is too chatty
2/ The setup of this notifier is generic, no need for it to be driven
   from the nfit driver, it can exist completely in the core.
3/ If it fails for any reason besides security support being disabled,
   that's fatal and should abort DIMM activation. Userspace may hang if
   it never gets overwrite notifications.
4/ The dirent needs to be released.

Move the call to the core 'dimm' driver, make it conditional on security
support being active, make it fatal for the exceptional case, add the
missing sysfs_put() at device disable time.

Fixes: 7d988097c5 ("...Add security DSM overwrite support")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-21 09:57:43 -08:00
Dave Jiang 3c13e2ac74 tools/testing/nvdimm: Add test support for Intel nvdimm security DSMs
Add nfit_test support for DSM functions "Get Security State",
"Set Passphrase", "Disable Passphrase", "Unlock Unit", "Freeze Lock",
and "Secure Erase" for the fake DIMMs.

Also adding a sysfs knob in order to put the DIMMs in "locked" state. The
order of testing DIMM unlocking would be.
1a. Disable DIMM X.
1b. Set Passphrase to DIMM X.
2. Write to
/sys/devices/platform/nfit_test.0/nfit_test_dimm/test_dimmX/lock_dimm
3. Renable DIMM X
4. Check DIMM X state via sysfs "security" attribute for nmemX.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang 89fa9d8ea7 acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase support
With Intel DSM 1.8 [1] two new security DSMs are introduced. Enable/update
master passphrase and master secure erase. The master passphrase allows
a secure erase to be performed without the user passphrase that is set on
the NVDIMM. The commands of master_update and master_erase are added to
the sysfs knob in order to initiate the DSMs. They are similar in opeartion
mechanism compare to update and erase.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang 7d988097c5 acpi/nfit, libnvdimm/security: Add security DSM overwrite support
Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as
described by the Intel DSM spec v1.7. This will allow triggering of
overwrite on Intel NVDIMMs. The overwrite operation can take tens of
minutes. When the overwrite DSM is issued successfully, the NVDIMMs will
be unaccessible. The kernel will do backoff polling to detect when the
overwrite process is completed. According to the DSM spec v1.7, the 128G
NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will
take longer.

Given that overwrite puts the DIMM in an indeterminate state until it
completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other
operations from executing when overwrite is happening. The
NDD_WORK_PENDING flag is added to denote that there is a device reference
on the nvdimm device for an async workqueue thread context.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang 64e77c8c04 acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimm
Add support to issue a secure erase DSM to the Intel nvdimm. The
required passphrase is acquired from an encrypted key in the kernel user
keyring. To trigger the action, "erase <keyid>" is written to the
"security" sysfs attribute.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang d2a4ac73f5 acpi/nfit, libnvdimm: Add enable/update passphrase support for Intel nvdimms
Add support for enabling and updating passphrase on the Intel nvdimms.
The passphrase is the an encrypted key in the kernel user keyring.
We trigger the update via writing "update <old_keyid> <new_keyid>" to the
sysfs attribute "security". If no <old_keyid> exists (for enabling
security) then a 0 should be used.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang 03b65b22ad acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm.
Add support to disable passphrase (security) for the Intel nvdimm. The
passphrase used for disabling is pulled from an encrypted-key in the kernel
user keyring. The action is triggered by writing "disable <keyid>" to the
sysfs attribute "security".

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21 12:44:41 -08:00
Dave Jiang 37833fb798 acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimm
Add support for freeze security on Intel nvdimm. This locks out any
changes to security for the DIMM until a hard reset of the DIMM is
performed. This is triggered by writing "freeze" to the generic
nvdimm/nmemX "security" sysfs attribute.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13 17:54:13 -08:00
Dave Jiang f298939655 acpi/nfit, libnvdimm: Introduce nvdimm_security_ops
Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command
set, expose a security capability to lock the DIMMs at poweroff and
require a passphrase to unlock them. The security model is derived from
ATA security. In anticipation of other DIMMs implementing a similar
scheme, and to abstract the core security implementation away from the
device-specific details, introduce nvdimm_security_ops.

Initially only a status retrieval operation, ->state(), is defined,
along with the base infrastructure and definitions for future
operations.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13 17:54:13 -08:00
Dave Jiang d6548ae4d1 acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm
The generated dimm id is needed for the sysfs attribute as well as being
used as the identifier/description for the security key. Since it's
constant and should never change, store it as a member of struct nvdimm.

As nvdimm_create() continues to grow parameters relative to NFIT driver
requirements, do not require other implementations to keep pace.
Introduce __nvdimm_create() to carry the new parameters and keep
nvdimm_create() with the long standing default api.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13 17:54:12 -08:00
Alexander Duyck 2d657d17f7 nvdimm: Split label init out from the logic for getting config data
This patch splits the initialization of the label data into two functions.
One for doing the init, and another for reading the actual configuration
data. The idea behind this is that by doing this we create a symmetry
between the getting and setting of config data in that we have a function
for both. In addition it will make it easier for us to identify the bits
that are related to init versus the pieces that are a wrapper for reading
data from the ACPI interface.

So for example by splitting things out like this it becomes much more
obvious that we were performing checks that weren't necessarily related to
the set/get operations such as relying on ndd->data being present when the
set and get ops should not care about a locally cached copy of the label
area.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12 08:39:24 -07:00
Dan Williams d11cf4a732 libnvdimm, dimm: Maximize label transfer size
Use kvzalloc() to bypass the arbitrary PAGE_SIZE limit of label transfer
operations. Given the expense of calling into firmware, maximize the
amount of label data we transfer per call to be up to the total label
space if allowed by the firmware.

Instead of limiting based on PAGE_SIZE we can instead simply limit the
maximum size based on either the config_size int he case of the get
operation, or the length of the write based on the set operation.

On a system with 24 NVDIMM modules each with a config_size of 128K and a
maximum transfer size of 64K - 4, this patch reduces the init time for the
label data from around 24 seconds down to between 4-5 seconds.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-10 20:54:16 -07:00
Keith Busch 12e3129e29 libnvdimm: Use max contiguous area for namespace size
This patch will find the max contiguous area to determine the largest
pmem namespace size that can be created. If the requested size exceeds
the largest available, ENOSPC error will be returned.

This fixes the allocation underrun error and wrong error return code
that have otherwise been observed as the following kernel warning:

  WARNING: CPU: <CPU> PID: <PID> at drivers/nvdimm/namespace_devs.c:913 size_store

Fixes: a1f3e4d6a0 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-07-25 14:11:09 -07:00
Dan Williams e7c5a571a8 libnvdimm, dimm: handle EACCES failures from label reads
The new support for the standard _LSR and _LSW methods neglected to also
update the nvdimm_init_config_data() and nvdimm_set_config_data() to
return the translated error code from failed commands. This precision is
necessary because the locked status that was previously returned on
ND_CMD_GET_CONFIG_SIZE commands is now returned on
ND_CMD_{GET,SET}_CONFIG_DATA commands.

If the kernel misses this indication it can inadvertently fall back to
label-less mode when it should otherwise avoid all access to locked
regions.

Cc: <stable@vger.kernel.org>
Fixes: 4b27db7e26 ("acpi, nfit: add support for the _LSI, _LSR, and...")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-16 08:18:51 -07:00
Dan Williams 426824d63b libnvdimm: remove redundant __func__ in dev_dbg
Dynamic debug can be instructed to add the function name to the debug
output using the +f switch, so there is no need for the libnvdimm
modules to do it again. If a user decides to add the +f switch for
libnvdimm's dynamic debug this results in double prints of the function
name.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-06 08:44:17 -08:00
Dan Williams efbf6f50ad libnvdimm: introduce 'flags' attribute for DIMM 'lock' and 'alias' status
Given that we now how have two mechanisms for a DIMM to indicate that it
is locked:

    * NVDIMM_FAMILY_INTEL 'get_config_size' _DSM command

    * ACPI 6.2 Label Storage Read / Write commands

...export the generic libnvdimm DIMM status in a new 'flags' attribute.

This attribute can also reflect the 'alias' state which indicates
whether the nvdimm core is enforcing labels for aliased-region-capacity
that the given dimm is an interleave-set member.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-10-07 10:04:06 -07:00
Dan Williams d34cb80840 libnvdimm, dimm: clear 'locked' status on successful DIMM enable
If we successfully enable a DIMM then it must not be locked and we can
clear the label-read failure condition. Otherwise, we need to reload the
entire bus provider driver to achieve the same effect, and that can
disrupt unrelated DIMMs and namespaces.

Fixes: 9d62ed9651 ("libnvdimm: handle locked label storage areas")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-28 09:10:14 -07:00
Dan Williams c9e582aa68 libnvdimm, nfit: enable support for volatile ranges
Allow volatile nfit ranges to participate in all the same infrastructure
provided for persistent memory regions. A resulting resulting namespace
device will still be called "pmem", but the parent region type will be
"nd_volatile". This is in preparation for disabling the dax ->flush()
operation in the pmem driver when it is hosted on a volatile range.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-27 16:44:13 -07:00
Dan Williams ca6a4657e5 x86, libnvdimm, pmem: remove global pmem api
Now that all callers of the pmem api have been converted to dax helpers that
call back to the pmem driver, we can remove include/linux/pmem.h and
asm/pmem.h.

Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-27 16:29:54 -07:00
Michal Hocko 752ade68cb treewide: use kv[mz]alloc* rather than opencoded variants
There are many code paths opencoding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Dan Williams 9d62ed9651 libnvdimm: handle locked label storage areas
Per the latest version of the "NVDIMM DSM Interface Example" [1], the
label data retrieval routine can report a "locked" status. In this case
all regions associated with that DIMM are disabled until the label area
is unlocked. Provide generic libnvdimm enabling for NVDIMMs with label
data area locking capabilities.

[1]: http://pmem.io/documents/

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-04 15:41:39 -07:00
Dan Williams 8f078b38dd libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
This is a preparation patch for handling locked nvdimm label regions, a
new concept as introduced by the latest DSM document on pmem.io [1]. A
future patch will leverage nvdimm_set_locked() at DIMM probe time to
flag regions that can not be enabled. There should be no functional
difference resulting from this change.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-04 14:01:24 -07:00
Dan Williams fe514739d8 libnvdimm: fix blk free space accounting
Commit a1f3e4d6a0 "libnvdimm, region: update nd_region_available_dpa()
for multi-pmem support" reworked blk dpa (DIMM Physical Address)
accounting to comprehend multiple pmem namespace allocations aliasing
with a given blk-dpa range.

The following call trace is a result of failing to account for allocated
blk capacity.

 WARNING: CPU: 1 PID: 2433 at tools/testing/nvdimm/../../../drivers/nvdimm/names
4 size_store+0x6f3/0x930 [libnvdimm]
 nd_region region5: allocation underrun: 0x0 of 0x1000000 bytes
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  size_store+0x6f3/0x930 [libnvdimm]
  dev_attr_store+0x18/0x30

If a given blk-dpa allocation does not alias with any pmem ranges then
the full allocation should be accounted as busy space, not the size of
the current pmem contribution to the region.

The thinkos that led to this confusion was not realizing that the struct
resource management is already guaranteeing no collisions between pmem
allocations and blk allocations on the same dimm. Also, we do not try to
support blk allocations in aliased pmem holes.

This patch also fixes a case where the available blk goes negative.

Cc: <stable@vger.kernel.org>
Fixes: a1f3e4d6a0 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support").
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-04 15:08:36 -07:00
Dan Williams 42237e393f libnvdimm: allow a platform to force enable label support
Platforms like QEMU-KVM implement an NFIT table and label DSMs.
However, since that environment does not define an aliased
configuration, the labels are currently ignored and the kernel registers
a single full-sized pmem-namespace per region. Now that the kernel
supports sub-divisions of pmem regions the labels have a purpose.
Arrange for the labels to be honored when we find an existing / valid
namespace index block.

Cc: <qemu-devel@nongnu.org>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-19 08:57:33 -07:00
Dan Williams 762d067dba libnvdimm, namespace: enable allocation of multiple pmem namespaces
Now that we have nd_region_available_dpa() able to handle the presence
of multiple PMEM allocations in aliased PMEM regions, reuse that same
infrastructure to track allocations from free space.  In particular
handle allocating from an aliased PMEM region in the case where there
are dis-contiguous holes.  The allocation for BLK and PMEM are
documented in the space_valid() helper:

    BLK-space is valid as long as it does not precede a PMEM
    allocation in a given region. PMEM-space must be contiguous
    and adjacent to an existing existing allocation (if one
    exists).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams a1f3e4d6a0 libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
The free dpa (dimm-physical-address) space calculation reports how much
free space is available with consideration for aliased BLK + PMEM
regions.  Recall that BLK capacity is allocated from high addresses and
PMEM is allocated from low addresses in their respective regions.

nd_region_available_dpa() accounts for the fact that the largest
encroachment (lowest starting address) into PMEM capacity by a BLK
allocation limits the available capacity to that point, regardless if
there is BLK allocation hole at a higher address.  Similarly, for the
multi-pmem case we need to track the largest encroachment (highest
 ending address) of a PMEM allocation in BLK capacity regardless of
whether there is an allocation hole that a BLK allocation could fill at
a lower address.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:20:53 -07:00
Toshi Kani aee6598748 libnvdimm: Fix nvdimm_probe error on NVDIMM-N
'ndctl list --buses --dimms' does not list any NVDIMM-Ns since
they are considered as idle.  ndctl checks if any driver is
attached to nmem device.  nvdimm_probe() always fails in
nvdimm_init_nsarea() since NVDIMM-Ns do not implement optinal
ND_CMD_GET_CONFIG_DATA command.

Change nvdimm_probe() to accept the case that the CONFIG_DATA
command is not implemented for NVDIMM-Ns.  The driver attaches
without ndd, which keeps it no-op to the device.

Reported-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-01 18:20:39 -07:00
Dan Williams ba9c8dd3c2 acpi, nfit: add dimm device notification support
Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012
NVDIMM Root device, can receive health event notifications.

Given that these devices are precluded from registering a notification
handler via acpi_driver.acpi_device_ops (due to no _HID), we use
acpi_install_notify_handler() directly.  The registered handler,
acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags
sysfs attribute when a health event notification is received.

Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-08-29 14:55:17 -07:00
Dan Williams e5ae3b252c libnvdimm, nfit: move flush hint mapping to region-device driver-data
In preparation for triggering flushes of a DIMM's writes-posted-queue
(WPQ) via the pmem driver move mapping of flush hint addresses to the
region driver.  Since this uses devm_nvdimm_memremap() the flush
addresses will remain mapped while any region to which the dimm belongs
is active.

We need to communicate more information to the nvdimm core to facilitate
this mapping, namely each dimm object now carries an array of flush hint
address resources.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-11 15:09:26 -07:00
Dan Williams 36092ee8ba Merge branch 'for-4.7/dax' into libnvdimm-for-next 2016-05-21 12:33:04 -07:00
Dan Williams b354aba016 libnvdimm: release ida resources
ida instances allocate some internal memory for ->free_bitmap in
addition to the base 'struct ida'.  Use ida_destroy() to release that
memory at module_exit().

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-05-20 22:02:56 -07:00
Dan Williams e3654eca70 nfit, libnvdimm: clarify "commands" vs "_DSMs"
Clarify the distinction between "commands", the ioctls userspace calls
to request the kernel take some action on a given dimm device, and
"_DSMs", the actual function numbers used in the firmware interface to
the DIMM.  _DSMs are ACPI specific whereas commands are Linux kernel
generic.

This is in preparation for breaking the 1:1 implicit relationship
between the kernel ioctl number space and the firmware specific function
numbers.

Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-04-28 16:23:16 -07:00
Dan Williams aef2533822 libnvdimm, nfit: centralize command status translation
The return value from an 'ndctl_fn' reports the command execution
status, i.e. was the command properly formatted and was it successfully
submitted to the bus provider.  The new 'cmd_rc' parameter allows the bus
provider to communicate command specific results, translated into
common error codes.

Convert the ARS commands to this scheme to:

1/ Consolidate status reporting

2/ Prepare for for expanding ars unit test cases

3/ Make the implementation more generic

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-05 12:24:06 -08:00
yalin wang a06a757652 nvdimm: change to use generic kvfree()
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:35:48 -04:00
Ross Zwisler 047fc8a1f9 libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
The libnvdimm implementation handles allocating dimm address space (DPA)
between PMEM and BLK mode interfaces.  After DPA has been allocated from
a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
as a struct bio based block device. Unlike PMEM, BLK is required to
handle platform specific details like mmio register formats and memory
controller interleave.  For this reason the libnvdimm generic nd_blk
driver calls back into the bus provider to carry out the I/O.

This initial implementation handles the BLK interface defined by the
ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
DCR (dimm control region), BDW (block data window), IDT (interleave
descriptor) NFIT structures and the hardware register format.
[1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
[2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-26 11:23:38 -04:00
Dan Williams 0ba1c63489 libnvdimm: write blk label set
After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values the labels on the dimm can be updated.  The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.

Also, after allocating label slots, it is useful for userspace to know
how many slots are left.  Export this information in sysfs.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams f524bf271a libnvdimm: write pmem label set
After 'uuid', 'size', and optionally 'alt_name' have been set to valid
values the labels on the dimms can be updated.

Write procedure is:
1/ Allocate and write new labels in the "next" index
2/ Free the old labels in the working copy
3/ Write the bitmap and the label space on the dimm
4/ Write the index to make the update valid

Label ranges directly mirror the dpa resource values for the given
label_id of the namespace.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams 1b40e09a12 libnvdimm: blk labels and namespace instantiation
A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm.  They may alias with one or
more pmem interleave sets that include the given dimm.

This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'.  A later
patch will make these settings persistent by writing back the label(s).

Unlike pmem namespaces, multiple blk namespaces can be created per
region.  Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated.  As long as
a region has 'available_size' != 0 new child namespaces may be created.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams bf9bccc14c libnvdimm: pmem label sets and namespace instantiation.
A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.

Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes.  A later patch will make
these settings persistent by writing back the label.

Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams 4a826c83db libnvdimm: namespace indices: read and validate
This on media label format [1] consists of two index blocks followed by
an array of labels.  None of these structures are ever updated in place.
A sequence number tracks the current active index and the next one to
write, while labels are written to free slots.

    +------------+
    |            |
    |  nsindex0  |
    |            |
    +------------+
    |            |
    |  nsindex1  |
    |            |
    +------------+
    |   label0   |
    +------------+
    |   label1   |
    +------------+
    |            |
     ....nslot...
    |            |
    +------------+
    |   labelN   |
    +------------+

After reading valid labels, store the dpa ranges they claim into
per-dimm resource trees.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams eaf961536e libnvdimm, nfit: add interleave-set state-tracking infrastructure
On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface.  A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.

Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active.  Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command.  Adding/deleting namespaces from an active
interleave set is always possible via sysfs.

Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered.  For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration.  It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.

Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00