From b63de8400a6e1001b5732286cf6f5ec27799b7b4 Mon Sep 17 00:00:00 2001
From: James Smart
Date: Fri, 28 Aug 2020 12:01:50 -0700
Subject: [PATCH 1/5] nvme: Revert: Fix controller creation races with teardown flow

The indicated patch introduced a barrier in the sysfs_delete attribute
for the controller that rejects the request if the controller isn't
created. "Created" is defined as at least one call to nvme_start_ctrl().

This is problematic in error-injection testing. If an error occurs on
the initial attempt to create an association and the controller enters
reconnect attempts, the admin cannot delete the controller until either
a successful association is created or ctrl_loss_tmo times out.

This issue is particularly hurtful when the "admin" is nvme-cli
connecting to a discovery controller via auto-connect scripts. With the
FC transport, if the first connection attempt fails, the controller
enters a normal reconnect state but returns control to the cli thread
that created the controller. In this scenario, the cli attempts to read
the discovery log via ioctl, which fails, causing the cli to treat it
as an empty log and then proceed to delete the discovery controller.
The delete is rejected and the controller is left live. If the
discovery controller reconnect then succeeds, there is no action to
delete it, and it sits live doing nothing.

Cc: # v5.7+
Fixes: ce1518139e69 ("nvme: Fix controller creation races with teardown flow")
Signed-off-by: James Smart
CC: Israel Rukshin
CC: Max Gurtovoy
CC: Christoph Hellwig
CC: Keith Busch
CC: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/core.c | 5 -----
 drivers/nvme/host/nvme.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5702a3843746..8b75f6ca0b61 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3525,10 +3525,6 @@ static ssize_t nvme_sysfs_delete(struct device *dev,
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	/* Can't delete non-created controllers */
-	if (!ctrl->created)
-		return -EBUSY;
-
 	if (device_remove_file_self(dev, attr))
 		nvme_delete_ctrl_sync(ctrl);
 	return count;
@@ -4403,7 +4399,6 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 		nvme_queue_scan(ctrl);
 		nvme_start_queues(ctrl);
 	}
-	ctrl->created = true;
 }
 EXPORT_SYMBOL_GPL(nvme_start_ctrl);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2910f6caab7d..9fd45ff656da 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -307,7 +307,6 @@ struct nvme_ctrl {
 	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 	unsigned long events;
-	bool created;
 
 #ifdef CONFIG_NVME_MULTIPATH
 	/* asymmetric namespace access: */
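For reference, the revert leaves the store handler with no state gate at
all. A condensed view of nvme_sysfs_delete() after the revert,
reconstructed from the context lines of the hunk above (driver-internal
types and helpers assumed, not copied verbatim from the tree):

static ssize_t nvme_sysfs_delete(struct device *dev,
				 struct device_attribute *attr,
				 const char *buf, size_t count)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

	/*
	 * device_remove_file_self() removes the attribute from within
	 * its own store method without deadlocking, so the delete can
	 * only be triggered once; the controller is then torn down
	 * synchronously. With the ->created gate gone, a controller
	 * still stuck in its initial reconnect loop can be deleted.
	 */
	if (device_remove_file_self(dev, attr))
		nvme_delete_ctrl_sync(ctrl);
	return count;
}

From e126e8210e950bb83414c4f57b3120ddb8450742 Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:54 -0500
Subject: [PATCH 2/5] nvme-fc: cancel async events before freeing event struct

Cancel async event work in case an async event has been queued up and
nvme_fc_submit_async_event() runs after the event struct has been
freed.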
Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/fc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a7f474ddfff7..e8ef42b9d50c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2160,6 +2160,7 @@ nvme_fc_term_aen_ops(struct nvme_fc_ctrl *ctrl)
 	struct nvme_fc_fcp_op *aen_op;
 	int i;
 
+	cancel_work_sync(&ctrl->ctrl.async_event_work);
 	aen_op = ctrl->aen_ops;
 	for (i = 0; i < NVME_NR_AEN_COMMANDS; i++, aen_op++) {
 		__nvme_fc_exit_request(ctrl, aen_op);

From 925dd04c1f9825194b9e444c12478084813b2b5d Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:52 -0500
Subject: [PATCH 3/5] nvme-rdma: cancel async events before freeing event struct

Cancel async event work in case an async event has been queued up and
nvme_rdma_submit_async_event() runs after the event struct has been
freed.

Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/rdma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 1d8ea5305d60..3247d4ee46b1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -835,6 +835,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		blk_mq_free_tag_set(ctrl->ctrl.admin_tagset);
 	}
 	if (ctrl->async_event_sqe.data) {
+		cancel_work_sync(&ctrl->ctrl.async_event_work);
 		nvme_rdma_free_qe(ctrl->device->dev, &ctrl->async_event_sqe,
 				sizeof(struct nvme_command), DMA_TO_DEVICE);
 		ctrl->async_event_sqe.data = NULL;

From ceb1e0874dba5cbfc4e0b4145796a4bfb3716e6a Mon Sep 17 00:00:00 2001
From: David Milburn
Date: Wed, 2 Sep 2020 17:42:53 -0500
Subject: [PATCH 4/5] nvme-tcp: cancel async events before freeing event struct

Cancel async event work in case an async event has been queued up and
nvme_tcp_submit_async_event() runs after the event struct has been
freed.

Signed-off-by: David Milburn
Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/tcp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 873d6afcbc9e..245a73d0a1ca 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1597,6 +1597,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 static void nvme_tcp_free_admin_queue(struct nvme_ctrl *ctrl)
 {
 	if (to_tcp_ctrl(ctrl)->async_req.pdu) {
+		cancel_work_sync(&ctrl->async_event_work);
 		nvme_tcp_free_async_req(to_tcp_ctrl(ctrl));
 		to_tcp_ctrl(ctrl)->async_req.pdu = NULL;
 	}
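All three transport fixes apply the same rule: synchronously cancel a
work item before freeing anything its handler may still dereference. A
minimal, generic sketch of the pattern (the names aen_state and
aen_state_free are illustrative, not taken from the drivers):

#include <linux/slab.h>
#include <linux/workqueue.h>

/* Stand-in for the per-transport AEN bookkeeping: a work item plus the
 * buffer its handler touches. */
struct aen_state {
	struct work_struct work;
	void *event;
};

static void aen_state_free(struct aen_state *st)
{
	/*
	 * cancel_work_sync() dequeues the work if it is still pending
	 * and waits for a running handler to return, so nothing can
	 * dereference st->event once it comes back.
	 */
	cancel_work_sync(&st->work);
	kfree(st->event);
	st->event = NULL;
}

Without the cancel, a handler queued just before teardown could run
after kfree() and touch freed memory, which is exactly the window these
three patches close.

From 73a5379937ec89b91e907bb315e2434ee9696a2c Mon Sep 17 00:00:00 2001
From: Sagi Grimberg
Date: Tue, 8 Sep 2020 12:56:08 -0700
Subject: [PATCH 5/5] nvme-fabrics: allow to queue requests for live queues

Right now we are failing requests based on the controller state (which
is checked inline in nvmf_check_ready), however we should definitely
accept requests if the queue is live.

When entering controller reset, we transition the controller into
NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
requests (which have blk_noretry_request set). This is also the case
for NVME_REQ_USERCMD requests, for the wrong reason: there is no reason
for us to reject this I/O in a controller reset. We do want to prevent
user passthru commands on the admin queue, because we need the
controller to fully initialize before user passthru admin commands may
be issued.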
In a non-mpath setup, this means that the requests will simply be
requeued over and over forever, never allowing the q_usage_counter to
drop its final reference, causing controller reset to hang if running
concurrently with heavy I/O.

Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Reviewed-by: James Smart
Signed-off-by: Sagi Grimberg
Signed-off-by: Christoph Hellwig
---
 drivers/nvme/host/fabrics.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 32f61fc5f4c5..8575724734e0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -565,10 +565,14 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	struct nvme_request *req = nvme_req(rq);
 
 	/*
-	 * If we are in some state of setup or teardown only allow
-	 * internally generated commands.
+	 * currently we have a problem sending passthru commands
+	 * on the admin_q if the controller is not LIVE because we can't
+	 * make sure that they are going out after the admin connect,
+	 * controller enable and/or other commands in the initialization
+	 * sequence. until the controller will be LIVE, fail with
+	 * BLK_STS_RESOURCE so that they will be rescheduled.
 	 */
-	if (!blk_rq_is_passthrough(rq) || (req->flags & NVME_REQ_USERCMD))
+	if (rq->q == ctrl->admin_q && (req->flags & NVME_REQ_USERCMD))
 		return false;
 
 	/*
@@ -577,7 +581,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	 */
 	switch (ctrl->state) {
 	case NVME_CTRL_CONNECTING:
-		if (nvme_is_fabrics(req->cmd) &&
+		if (blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
 		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
 			return true;
 		break;
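Condensed, the gating logic after this patch reads roughly as below.
This is a sketch reconstructed from the hunks above under the
hypothetical name check_ready_sketch(); the live-queue fast path and
the failure tail of the real __nvmf_check_ready()/nvmf_check_ready()
pair are reduced to comments:

static bool check_ready_sketch(struct nvme_ctrl *ctrl, struct request *rq)
{
	struct nvme_request *req = nvme_req(rq);

	/* Defer user passthru on the admin queue until the controller
	 * is LIVE, so it cannot overtake connect/enable during init. */
	if (rq->q == ctrl->admin_q && (req->flags & NVME_REQ_USERCMD))
		return false;

	switch (ctrl->state) {
	case NVME_CTRL_CONNECTING:
		/* While an association is being established, only the
		 * fabrics connect command itself may pass. */
		if (blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
		    req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
			return true;
		break;
	default:
		break;
	}

	/* The caller requeues (BLK_STS_RESOURCE) or fails the request
	 * depending on queue and controller state; live queues accept
	 * I/O before this check is ever consulted. */
	return false;
}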