The NVMe spec recommends using the shutdown normal sequence when safely
taking the controller offline instead of hitting CC.EN on the next
start-up to reset the controller. The spec recommends a minimum of 1
second for the shutdown complete. This patch waits 2 seconds to be on
the safe side.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The 'Number of Namespaces' read from the device was being treated as
signed, which would cause us to not scan any namespaces for a device
with more than 2 billion namespaces. That led to noticing that the
namespace ID was also being treated as signed, which could lead to the
result from NVME_IOCTL_ID being treated as an error code.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
To build user programs that call the NVMe ioctls, we need to have a
user header file. Catch up to the new way of doing that by splitting
the header file into kernel and uapi portions.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Add io stats accounting for bio requests so nvme block devices show
useful disk stats.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Add definitions for the three Firmware Activate actions, and change the
SCSI translation code to construct the command into a temporary variable
instead of translating the endianness back-and-forth.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
This adds support for namespaces with separate meta-data formats in the
submit io ioctl. The meta-data buffer has to be a contiguous, so such
a buffer is allocated and the mapped user pages are copied to/from this
buffer for write/read commands.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
We have an nvme device that has a concept of a stripe size. IO requests
that do not transfer data crossing a stripe boundary has greater
performance compared to IO that does cross it. This patch sets the
stripe size for the device if the device and vendor ids match one with
this feature and splits IO requests that cross the stripe boundary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Registers a miscellaneous device for each nvme controller probed. This
creates character device files as /dev/nvmeN, where N is the device
instance, and supports nvme admin ioctl commands so devices without
namespaces can be managed.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
When constructing the command, dsmgmt needs to be treated as a 32-bit
value, not a 16-bit value. reftag, apptag and appmask all need to be
converted from native-endian to little-endian. Again, sparse's bitwise
warnings caught this problem. Thanks to Keith for pointing out the
correct way to fix the reftag.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Introduce nvme_block_nr() to help convert sectors to block numbers.
This fixes an integer overflow in the SCSI conversion layer, and it's
slightly less typing than opencoding it.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Translates SCSI commands in SG_IO ioctl to NVMe commands.
Uses the scsi-nvme translation spec from nvmexpress.org as reference.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The SCSI emulation has the ability to send format commands, so we need
to add the definition of the command. Also add a missing error code.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
nvme-scsi.c uses several data structures and definitions that were
previously private to nvme-core.c. Move the definitions to nvme.h,
protected by __KERNEL__.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
This adds discard support to block queues if the nvme device is capable of
deallocating blocks as indicated by the controller's optional command support.
A discard flagged bio request will submit an NVMe deallocate Data Set
Management command for the requested blocks.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
This data structure is defined in the NVMe specification. It's not used
by the kernel, but is available for use by userspace software.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Set the depth for IO queues to the device's maximum supported queue
entries if the requested depth exceeds the device's capabilities.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Set the max hw sectors in a namespace's request queue if the nvme device
has a max data transfer size.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The driver was still using an old definition of Identify Controller
which only came to light once we started using the 'number of namespaces'
field properly.
Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Reported-by: Khosrow Panah <Khosrow.Panah@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The doorbell stride allows devices to spread out their doorbells instead
of packing them tightly. This feature was added as part of ECN 003.
This patch also enables support for more than 512 queues :-)
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE
and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD
ioctl that can submit any admin command.
Add a new ID ioctl that returns the namespace ID of the queried device.
It corresponds to the SCSI Idlun ioctl.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
THe device reports (in its capability register) how long it will take
to initialise. If that time elapses before the ready bit becomes set,
conclude the device is broken and refuse to initialise it. Log a nice
error message so the user knows why we did nothing.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The arbitration field was extended by one bit, shifting the shutdown
notification bits by one. Also, the SQ/CQ entry size was made
configurable for future extensions.
Reported-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The read and write commands don't define a 'result', so there's no need
to copy it back to userspace.
Remove the ability of the ioctl to submit commands to a different
namespace; it's just asking for trouble, and the use case I have in mind
will be addressed througha different ioctl in the future. That removes
the need for both the block_shift and nsid arguments.
Check that the opcode is one of 'read' or 'write'. Future opcodes may
be added in the future, but we will need a different structure definition
for them.
The nblocks field is redefined to be 0-based. This allows the user to
request the full 65536 blocks.
Don't byteswap the reftag, apptag and appmask. Martin Petersen tells
me these are calculated in big-endian and are transmitted to the device
in big-endian.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command
as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The head can never overrun the tail since we won't allocate enough command
IDs to let that happen. The status codes are in sync with the spec.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Add two reserved registers in the middle of the BAR to match the 1.0
spec plus ECN 0002.
Also rename IMC and ISC to INTMC and INTSC to conform with the spec.
We still don't need to use them :-)
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Add prp1, prp2 and the metadata prp to the common command, since the
fields are generally used this way.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>