linux/drivers/s390/scsi/zfcp_aux.c

540 lines
14 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0
/*
* zfcp device driver
*
* Module interface and handling of zfcp data structures.
*
* Copyright IBM Corp. 2002, 2017
*/
/*
* Driver authors:
* Martin Peschke (originator of the driver)
* Raimund Schroeder
* Aron Zeh
* Wolfgang Taphorn
* Stefan Bader
* Heiko Carstens (kernel 2.6 port of the driver)
* Andreas Herrmann
* Maxim Shchetynin
* Volker Sameske
* Ralph Wuerthner
* Michael Loehr
* Swen Schillig
* Christof Schmitt
* Martin Petermann
* Sven Schuetz
[SCSI] zfcp: status read buffers on first adapter open with link down Commit 64deb6efdc5504ce97b5c1c6f281fffbc150bd93 "[SCSI] zfcp: Use status_read_buf_num provided by FCP channel" started using a value returned by the channel but only evaluated the value if the fabric link is up. Commit 8d88cf3f3b9af4713642caeb221b6d6a42019001 "[SCSI] zfcp: Update status read mempool" introduced mempool resizings based on the above value. On setting an FCP device online for the very first time since boot, a new zeroed adapter object is allocated. If the link is down, the number of status read requests remains zero. Since just the config data exchange is incomplete, we proceed with adapter open recovery. However, we unconditionally call mempool_resize with adapter->stat_read_buf_num == 0 in this case. This causes a kernel message "kernel BUG at mm/mempool.c:131!" in process "zfcperp<FCP-device-bus-ID>" with last function mempool_resize in Krnl PSW and zfcp_erp_thread in the Call Trace. Don't evaluate channel values which are invalid on link down. The number of status read requests is always valid, evaluated, and set to a positive minimum greater than zero. The adapter open recovery can proceed and the channel has status read buffers to inform us on a future link up event. While we are not aware of any other code path that could result in mempool resize attempts of size zero, we still also initialize the number of status read buffers to be posted to a static minimum number on adapter object allocation. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> #2.6.35+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-04-26 23:34:54 +08:00
* Steffen Maier
*/
#define KMSG_COMPONENT "zfcp"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
#include <linux/seq_file.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
#include <linux/module.h>
#include "zfcp_ext.h"
#include "zfcp_fc.h"
#include "zfcp_reqlist.h"
#define ZFCP_BUS_ID_SIZE 20
MODULE_AUTHOR("IBM Deutschland Entwicklung GmbH - linux390@de.ibm.com");
MODULE_DESCRIPTION("FCP HBA driver");
MODULE_LICENSE("GPL");
static char *init_device;
module_param_named(device, init_device, charp, 0400);
MODULE_PARM_DESC(device, "specify initial device");
static struct kmem_cache * __init zfcp_cache_hw_align(const char *name,
unsigned long size)
{
return kmem_cache_create(name, size, roundup_pow_of_two(size), 0, NULL);
}
static void __init zfcp_init_device_configure(char *busid, u64 wwpn, u64 lun)
{
struct ccw_device *cdev;
struct zfcp_adapter *adapter;
struct zfcp_port *port;
cdev = get_ccwdev_by_busid(&zfcp_ccw_driver, busid);
if (!cdev)
return;
if (ccw_device_set_online(cdev))
goto out_ccw_device;
adapter = zfcp_ccw_adapter_by_cdev(cdev);
if (!adapter)
goto out_ccw_device;
port = zfcp_get_port_by_wwpn(adapter, wwpn);
if (!port)
goto out_port;
flush_work(&port->rport_work);
zfcp_unit_add(port, lun);
put_device(&port->dev);
out_port:
zfcp_ccw_adapter_put(adapter);
out_ccw_device:
put_device(&cdev->dev);
return;
}
static void __init zfcp_init_device_setup(char *devstr)
{
char *token;
char *str, *str_saved;
char busid[ZFCP_BUS_ID_SIZE];
u64 wwpn, lun;
/* duplicate devstr and keep the original for sysfs presentation*/
str_saved = kstrdup(devstr, GFP_KERNEL);
str = str_saved;
if (!str)
return;
token = strsep(&str, ",");
if (!token || strlen(token) >= ZFCP_BUS_ID_SIZE)
goto err_out;
strlcpy(busid, token, ZFCP_BUS_ID_SIZE);
token = strsep(&str, ",");
if (!token || kstrtoull(token, 0, (unsigned long long *) &wwpn))
goto err_out;
token = strsep(&str, ",");
if (!token || kstrtoull(token, 0, (unsigned long long *) &lun))
goto err_out;
kfree(str_saved);
zfcp_init_device_configure(busid, wwpn, lun);
return;
err_out:
kfree(str_saved);
pr_err("%s is not a valid SCSI device\n", devstr);
}
static int __init zfcp_module_init(void)
{
int retval = -ENOMEM;
if (zfcp_experimental_dix)
pr_warn("DIX is enabled. It is experimental and might cause problems\n");
zfcp_fsf_qtcb_cache = zfcp_cache_hw_align("zfcp_fsf_qtcb",
sizeof(struct fsf_qtcb));
if (!zfcp_fsf_qtcb_cache)
goto out_qtcb_cache;
zfcp_fc_req_cache = zfcp_cache_hw_align("zfcp_fc_req",
sizeof(struct zfcp_fc_req));
if (!zfcp_fc_req_cache)
goto out_fc_cache;
zfcp_scsi_transport_template =
fc_attach_transport(&zfcp_transport_functions);
if (!zfcp_scsi_transport_template)
goto out_transport;
scsi_transport_reserve_device(zfcp_scsi_transport_template,
sizeof(struct zfcp_scsi_dev));
retval = ccw_driver_register(&zfcp_ccw_driver);
if (retval) {
pr_err("The zfcp device driver could not register with "
"the common I/O layer\n");
goto out_ccw_register;
}
if (init_device)
zfcp_init_device_setup(init_device);
return 0;
out_ccw_register:
fc_release_transport(zfcp_scsi_transport_template);
out_transport:
kmem_cache_destroy(zfcp_fc_req_cache);
out_fc_cache:
kmem_cache_destroy(zfcp_fsf_qtcb_cache);
out_qtcb_cache:
return retval;
}
module_init(zfcp_module_init);
static void __exit zfcp_module_exit(void)
{
ccw_driver_unregister(&zfcp_ccw_driver);
fc_release_transport(zfcp_scsi_transport_template);
kmem_cache_destroy(zfcp_fc_req_cache);
kmem_cache_destroy(zfcp_fsf_qtcb_cache);
}
module_exit(zfcp_module_exit);
/**
* zfcp_get_port_by_wwpn - find port in port list of adapter by wwpn
* @adapter: pointer to adapter to search for port
* @wwpn: wwpn to search for
*
* Returns: pointer to zfcp_port or NULL
*/
struct zfcp_port *zfcp_get_port_by_wwpn(struct zfcp_adapter *adapter,
u64 wwpn)
{
unsigned long flags;
struct zfcp_port *port;
read_lock_irqsave(&adapter->port_list_lock, flags);
list_for_each_entry(port, &adapter->port_list, list)
if (port->wwpn == wwpn) {
if (!get_device(&port->dev))
port = NULL;
read_unlock_irqrestore(&adapter->port_list_lock, flags);
return port;
}
read_unlock_irqrestore(&adapter->port_list_lock, flags);
return NULL;
}
static int zfcp_allocate_low_mem_buffers(struct zfcp_adapter *adapter)
{
adapter->pool.erp_req =
mempool_create_kmalloc_pool(1, sizeof(struct zfcp_fsf_req));
if (!adapter->pool.erp_req)
return -ENOMEM;
adapter->pool.gid_pn_req =
mempool_create_kmalloc_pool(1, sizeof(struct zfcp_fsf_req));
if (!adapter->pool.gid_pn_req)
return -ENOMEM;
adapter->pool.scsi_req =
mempool_create_kmalloc_pool(1, sizeof(struct zfcp_fsf_req));
if (!adapter->pool.scsi_req)
return -ENOMEM;
adapter->pool.scsi_abort =
mempool_create_kmalloc_pool(1, sizeof(struct zfcp_fsf_req));
if (!adapter->pool.scsi_abort)
return -ENOMEM;
adapter->pool.status_read_req =
mempool_create_kmalloc_pool(FSF_STATUS_READS_RECOM,
sizeof(struct zfcp_fsf_req));
if (!adapter->pool.status_read_req)
return -ENOMEM;
adapter->pool.qtcb_pool =
mempool_create_slab_pool(4, zfcp_fsf_qtcb_cache);
if (!adapter->pool.qtcb_pool)
return -ENOMEM;
BUILD_BUG_ON(sizeof(struct fsf_status_read_buffer) > PAGE_SIZE);
adapter->pool.sr_data =
mempool_create_page_pool(FSF_STATUS_READS_RECOM, 0);
if (!adapter->pool.sr_data)
return -ENOMEM;
adapter->pool.gid_pn =
mempool_create_slab_pool(1, zfcp_fc_req_cache);
if (!adapter->pool.gid_pn)
return -ENOMEM;
return 0;
}
static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
{
mempool_destroy(adapter->pool.erp_req);
mempool_destroy(adapter->pool.scsi_req);
mempool_destroy(adapter->pool.scsi_abort);
mempool_destroy(adapter->pool.qtcb_pool);
mempool_destroy(adapter->pool.status_read_req);
mempool_destroy(adapter->pool.sr_data);
mempool_destroy(adapter->pool.gid_pn);
}
/**
* zfcp_status_read_refill - refill the long running status_read_requests
* @adapter: ptr to struct zfcp_adapter for which the buffers should be refilled
*
* Return:
* * 0 on success meaning at least one status read is pending
* * 1 if posting failed and not a single status read buffer is pending,
* also triggers adapter reopen recovery
*/
int zfcp_status_read_refill(struct zfcp_adapter *adapter)
{
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING causing by design looping with exponential increase sleeps in the function performing exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up. Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel. As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification. Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item. Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe due to separate atomic_read() and atomic_dec(), in order to depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error(). An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency"); v2.6.37 commit d3e1088d6873 ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit fca55b6fb587 ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit c57a39a45a76 ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")] Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: <stable@vger.kernel.org> #2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 00:31:20 +08:00
while (atomic_add_unless(&adapter->stat_miss, -1, 0))
if (zfcp_fsf_status_read(adapter->qdio)) {
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING causing by design looping with exponential increase sleeps in the function performing exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up. Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel. As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification. Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item. Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe due to separate atomic_read() and atomic_dec(), in order to depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error(). An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency"); v2.6.37 commit d3e1088d6873 ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit fca55b6fb587 ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit c57a39a45a76 ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")] Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: <stable@vger.kernel.org> #2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 00:31:20 +08:00
atomic_inc(&adapter->stat_miss); /* undo add -1 */
if (atomic_read(&adapter->stat_miss) >=
adapter->stat_read_buf_num) {
zfcp_erp_adapter_reopen(adapter, 0, "axsref1");
return 1;
}
break;
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING causing by design looping with exponential increase sleeps in the function performing exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up. Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel. As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification. Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item. Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe due to separate atomic_read() and atomic_dec(), in order to depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error(). An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency"); v2.6.37 commit d3e1088d6873 ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit fca55b6fb587 ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit c57a39a45a76 ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")] Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: <stable@vger.kernel.org> #2.6.27+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-07 00:31:20 +08:00
}
return 0;
}
static void _zfcp_status_read_scheduler(struct work_struct *work)
{
zfcp_status_read_refill(container_of(work, struct zfcp_adapter,
stat_work));
}
static void zfcp_print_sl(struct seq_file *m, struct service_level *sl)
{
struct zfcp_adapter *adapter =
container_of(sl, struct zfcp_adapter, service_level);
seq_printf(m, "zfcp: %s microcode level %x\n",
dev_name(&adapter->ccw_device->dev),
adapter->fsf_lic_version);
}
static int zfcp_setup_adapter_work_queue(struct zfcp_adapter *adapter)
{
char name[TASK_COMM_LEN];
snprintf(name, sizeof(name), "zfcp_q_%s",
dev_name(&adapter->ccw_device->dev));
adapter->work_queue = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
if (adapter->work_queue)
return 0;
return -ENOMEM;
}
static void zfcp_destroy_adapter_work_queue(struct zfcp_adapter *adapter)
{
if (adapter->work_queue)
destroy_workqueue(adapter->work_queue);
adapter->work_queue = NULL;
}
/**
* zfcp_adapter_enqueue - enqueue a new adapter to the list
* @ccw_device: pointer to the struct cc_device
*
* Returns: struct zfcp_adapter*
* Enqueues an adapter at the end of the adapter list in the driver data.
* All adapter internal structures are set up.
* Proc-fs entries are also created.
*/
struct zfcp_adapter *zfcp_adapter_enqueue(struct ccw_device *ccw_device)
{
struct zfcp_adapter *adapter;
if (!get_device(&ccw_device->dev))
return ERR_PTR(-ENODEV);
adapter = kzalloc(sizeof(struct zfcp_adapter), GFP_KERNEL);
if (!adapter) {
put_device(&ccw_device->dev);
return ERR_PTR(-ENOMEM);
}
kref_init(&adapter->ref);
ccw_device->handler = NULL;
adapter->ccw_device = ccw_device;
INIT_WORK(&adapter->stat_work, _zfcp_status_read_scheduler);
zfcp: auto port scan resiliency This patch improves the Fibre Channel port scan behaviour of the zfcp lldd. Without it the zfcp device driver may churn up the storage area network by excessive scanning and scan bursts, particularly in big virtual server environments, potentially resulting in interference of virtual servers and reduced availability of storage connectivity. The two main issues as to the zfcp device drivers automatic port scan in virtual server environments are frequency and simultaneity. On the one hand, there is no point in allowing lots of ports scans in a row. It makes sense, though, to make sure that a scan is conducted eventually if there has been any indication for potential SAN changes. On the other hand, lots of virtual servers receiving the same indication for a SAN change had better not attempt to conduct a scan instantly, that is, at the same time. Hence this patch has a two-fold approach for better port scanning: the introduction of a rate limit to amend frequency issues, and the introduction of a short random backoff to amend simultaneity issues. Both approaches boil down to deferred port scans, with delays comprising parts for both approaches. The new port scan behaviour is summarised best by: NEW: NEW: no_auto_port_rescan random rate flush backoff limit =wait adapter resume/thaw yes yes no yes* adapter online (user) no yes no yes* port rescan (user) no no no yes adapter recovery (user) yes yes yes no adapter recovery (other) yes yes yes no incoming ELS yes yes yes no incoming ELS lost yes yes yes no Implementation is straight-forward by converting an existing worker to a delayed worker. But care is needed whenever that worker is going to be flushed (in order to make sure work has been completed), since a flush operation cancels the timer set up for deferred execution (see * above). There is a small race window whenever a port scan work starts running up to the point in time of storing the time stamp for that port scan. The impact is negligible. Closing that gap isn't trivial, though, and would the destroy the beauty of a simple work-to-delayed-work conversion. Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-13 21:59:48 +08:00
INIT_DELAYED_WORK(&adapter->scan_work, zfcp_fc_scan_ports);
INIT_WORK(&adapter->ns_up_work, zfcp_fc_sym_name_update);
zfcp: auto port scan resiliency This patch improves the Fibre Channel port scan behaviour of the zfcp lldd. Without it the zfcp device driver may churn up the storage area network by excessive scanning and scan bursts, particularly in big virtual server environments, potentially resulting in interference of virtual servers and reduced availability of storage connectivity. The two main issues as to the zfcp device drivers automatic port scan in virtual server environments are frequency and simultaneity. On the one hand, there is no point in allowing lots of ports scans in a row. It makes sense, though, to make sure that a scan is conducted eventually if there has been any indication for potential SAN changes. On the other hand, lots of virtual servers receiving the same indication for a SAN change had better not attempt to conduct a scan instantly, that is, at the same time. Hence this patch has a two-fold approach for better port scanning: the introduction of a rate limit to amend frequency issues, and the introduction of a short random backoff to amend simultaneity issues. Both approaches boil down to deferred port scans, with delays comprising parts for both approaches. The new port scan behaviour is summarised best by: NEW: NEW: no_auto_port_rescan random rate flush backoff limit =wait adapter resume/thaw yes yes no yes* adapter online (user) no yes no yes* port rescan (user) no no no yes adapter recovery (user) yes yes yes no adapter recovery (other) yes yes yes no incoming ELS yes yes yes no incoming ELS lost yes yes yes no Implementation is straight-forward by converting an existing worker to a delayed worker. But care is needed whenever that worker is going to be flushed (in order to make sure work has been completed), since a flush operation cancels the timer set up for deferred execution (see * above). There is a small race window whenever a port scan work starts running up to the point in time of storing the time stamp for that port scan. The impact is negligible. Closing that gap isn't trivial, though, and would the destroy the beauty of a simple work-to-delayed-work conversion. Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-13 21:59:48 +08:00
adapter->next_port_scan = jiffies;
scsi: zfcp: fix erp_action use-before-initialize in REC action trace v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-13 21:40:07 +08:00
adapter->erp_action.adapter = adapter;
if (zfcp_qdio_setup(adapter))
goto failed;
if (zfcp_allocate_low_mem_buffers(adapter))
goto failed;
adapter->req_list = zfcp_reqlist_alloc();
if (!adapter->req_list)
goto failed;
if (zfcp_dbf_adapter_register(adapter))
goto failed;
if (zfcp_setup_adapter_work_queue(adapter))
goto failed;
if (zfcp_fc_gs_setup(adapter))
goto failed;
rwlock_init(&adapter->port_list_lock);
INIT_LIST_HEAD(&adapter->port_list);
INIT_LIST_HEAD(&adapter->events.list);
INIT_WORK(&adapter->events.work, zfcp_fc_post_event);
spin_lock_init(&adapter->events.list_lock);
init_waitqueue_head(&adapter->erp_ready_wq);
init_waitqueue_head(&adapter->erp_done_wqh);
INIT_LIST_HEAD(&adapter->erp_ready_head);
INIT_LIST_HEAD(&adapter->erp_running_head);
rwlock_init(&adapter->erp_lock);
rwlock_init(&adapter->abort_lock);
if (zfcp_erp_thread_setup(adapter))
goto failed;
adapter->service_level.seq_print = zfcp_print_sl;
dev_set_drvdata(&ccw_device->dev, adapter);
if (sysfs_create_group(&ccw_device->dev.kobj,
&zfcp_sysfs_adapter_attrs))
goto failed;
/* report size limit per scatter-gather segment */
adapter->ccw_device->dev.dma_parms = &adapter->dma_parms;
[SCSI] zfcp: status read buffers on first adapter open with link down Commit 64deb6efdc5504ce97b5c1c6f281fffbc150bd93 "[SCSI] zfcp: Use status_read_buf_num provided by FCP channel" started using a value returned by the channel but only evaluated the value if the fabric link is up. Commit 8d88cf3f3b9af4713642caeb221b6d6a42019001 "[SCSI] zfcp: Update status read mempool" introduced mempool resizings based on the above value. On setting an FCP device online for the very first time since boot, a new zeroed adapter object is allocated. If the link is down, the number of status read requests remains zero. Since just the config data exchange is incomplete, we proceed with adapter open recovery. However, we unconditionally call mempool_resize with adapter->stat_read_buf_num == 0 in this case. This causes a kernel message "kernel BUG at mm/mempool.c:131!" in process "zfcperp<FCP-device-bus-ID>" with last function mempool_resize in Krnl PSW and zfcp_erp_thread in the Call Trace. Don't evaluate channel values which are invalid on link down. The number of status read requests is always valid, evaluated, and set to a positive minimum greater than zero. The adapter open recovery can proceed and the channel has status read buffers to inform us on a future link up event. While we are not aware of any other code path that could result in mempool resize attempts of size zero, we still also initialize the number of status read buffers to be posted to a static minimum number on adapter object allocation. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> #2.6.35+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-04-26 23:34:54 +08:00
adapter->stat_read_buf_num = FSF_STATUS_READS_RECOM;
if (!zfcp_scsi_adapter_register(adapter))
return adapter;
failed:
zfcp_adapter_unregister(adapter);
return ERR_PTR(-ENOMEM);
}
void zfcp_adapter_unregister(struct zfcp_adapter *adapter)
{
struct ccw_device *cdev = adapter->ccw_device;
zfcp: auto port scan resiliency This patch improves the Fibre Channel port scan behaviour of the zfcp lldd. Without it the zfcp device driver may churn up the storage area network by excessive scanning and scan bursts, particularly in big virtual server environments, potentially resulting in interference of virtual servers and reduced availability of storage connectivity. The two main issues as to the zfcp device drivers automatic port scan in virtual server environments are frequency and simultaneity. On the one hand, there is no point in allowing lots of ports scans in a row. It makes sense, though, to make sure that a scan is conducted eventually if there has been any indication for potential SAN changes. On the other hand, lots of virtual servers receiving the same indication for a SAN change had better not attempt to conduct a scan instantly, that is, at the same time. Hence this patch has a two-fold approach for better port scanning: the introduction of a rate limit to amend frequency issues, and the introduction of a short random backoff to amend simultaneity issues. Both approaches boil down to deferred port scans, with delays comprising parts for both approaches. The new port scan behaviour is summarised best by: NEW: NEW: no_auto_port_rescan random rate flush backoff limit =wait adapter resume/thaw yes yes no yes* adapter online (user) no yes no yes* port rescan (user) no no no yes adapter recovery (user) yes yes yes no adapter recovery (other) yes yes yes no incoming ELS yes yes yes no incoming ELS lost yes yes yes no Implementation is straight-forward by converting an existing worker to a delayed worker. But care is needed whenever that worker is going to be flushed (in order to make sure work has been completed), since a flush operation cancels the timer set up for deferred execution (see * above). There is a small race window whenever a port scan work starts running up to the point in time of storing the time stamp for that port scan. The impact is negligible. Closing that gap isn't trivial, though, and would the destroy the beauty of a simple work-to-delayed-work conversion. Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-13 21:59:48 +08:00
cancel_delayed_work_sync(&adapter->scan_work);
cancel_work_sync(&adapter->stat_work);
cancel_work_sync(&adapter->ns_up_work);
zfcp_destroy_adapter_work_queue(adapter);
zfcp_fc_wka_ports_force_offline(adapter->gs);
zfcp_scsi_adapter_unregister(adapter);
sysfs_remove_group(&cdev->dev.kobj, &zfcp_sysfs_adapter_attrs);
zfcp_erp_thread_kill(adapter);
zfcp_dbf_adapter_unregister(adapter);
zfcp_qdio_destroy(adapter->qdio);
zfcp_ccw_adapter_put(adapter); /* final put to release */
}
/**
* zfcp_adapter_release - remove the adapter from the resource list
* @ref: pointer to struct kref
* locks: adapter list write lock is assumed to be held by caller
*/
void zfcp_adapter_release(struct kref *ref)
{
struct zfcp_adapter *adapter = container_of(ref, struct zfcp_adapter,
ref);
struct ccw_device *cdev = adapter->ccw_device;
dev_set_drvdata(&adapter->ccw_device->dev, NULL);
zfcp_fc_gs_destroy(adapter);
zfcp_free_low_mem_buffers(adapter);
kfree(adapter->req_list);
kfree(adapter->fc_stats);
kfree(adapter->stats_reset_data);
kfree(adapter);
put_device(&cdev->dev);
}
static void zfcp_port_release(struct device *dev)
{
struct zfcp_port *port = container_of(dev, struct zfcp_port, dev);
zfcp_ccw_adapter_put(port->adapter);
kfree(port);
}
/**
* zfcp_port_enqueue - enqueue port to port list of adapter
* @adapter: adapter where remote port is added
* @wwpn: WWPN of the remote port to be enqueued
* @status: initial status for the port
* @d_id: destination id of the remote port to be enqueued
* Returns: pointer to enqueued port on success, ERR_PTR on error
*
* All port internal structures are set up and the sysfs entry is generated.
* d_id is used to enqueue ports with a well known address like the Directory
* Service for nameserver lookup.
*/
struct zfcp_port *zfcp_port_enqueue(struct zfcp_adapter *adapter, u64 wwpn,
u32 status, u32 d_id)
{
struct zfcp_port *port;
int retval = -ENOMEM;
kref_get(&adapter->ref);
port = zfcp_get_port_by_wwpn(adapter, wwpn);
if (port) {
put_device(&port->dev);
retval = -EEXIST;
goto err_out;
}
port = kzalloc(sizeof(struct zfcp_port), GFP_KERNEL);
if (!port)
goto err_out;
rwlock_init(&port->unit_list_lock);
INIT_LIST_HEAD(&port->unit_list);
atomic_set(&port->units, 0);
INIT_WORK(&port->gid_pn_work, zfcp_fc_port_did_lookup);
INIT_WORK(&port->test_link_work, zfcp_fc_link_test_work);
INIT_WORK(&port->rport_work, zfcp_scsi_rport_work);
port->adapter = adapter;
port->d_id = d_id;
port->wwpn = wwpn;
port->rport_task = RPORT_NONE;
port->dev.parent = &adapter->ccw_device->dev;
port->dev.groups = zfcp_port_attr_groups;
port->dev.release = zfcp_port_release;
scsi: zfcp: fix erp_action use-before-initialize in REC action trace v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") extended accessing parent pointer fields of struct zfcp_erp_action for tracing. If an erp_action has never been enqueued before, these parent pointer fields are uninitialized and NULL. Examples are zfcp objects freshly added to the parent object's children list, before enqueueing their first recovery subsequently. In zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action fields can cause a NULL pointer dereference. Since the kernel can read from lowcore on s390, it does not immediately cause a kernel page fault. Instead it can cause hangs on trying to acquire the wrong erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl() ^bogus^ while holding already other locks with IRQs disabled. Real life example from attaching lots of LUNs in parallel on many CPUs: crash> bt 17723 PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800" LOWCORE INFO: -psw : 0x0404300180000000 0x000000000038e424 -function : _raw_spin_lock_wait_flags at 38e424 ... #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp] #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp] #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp] #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp] #4 [fdde8fe60] kthread at 173550 #5 [fdde8feb8] kernel_thread_starter at 10add2 zfcp_adapter zfcp_port zfcp_unit <address>, 0x404040d600000000 scsi_device NULL, returning early! zfcp_scsi_dev.status = 0x40000000 0x40000000 ZFCP_STATUS_COMMON_RUNNING crash> zfcp_unit <address> struct zfcp_unit { erp_action = { adapter = 0x0, port = 0x0, unit = 0x0, }, } zfcp_erp_action is always fully embedded into its container object. Such container object is never moved in its object tree (only add or delete). Hence, erp_action parent pointers can never change. To fix the issue, initialize the erp_action parent pointers before adding the erp_action container to any list and thus before it becomes accessible from outside of its initializing function. In order to also close the time window between zfcp_erp_setup_act() memsetting the entire erp_action to zero and setting the parent pointers again, drop the memset and instead explicitly initialize individually all erp_action fields except for parent pointers. To be extra careful not to introduce any other unintended side effect, even keep zeroing the erp_action fields for list and timer. Also double-check with WARN_ON_ONCE that erp_action parent pointers never change, so we get to know when we would deviate from previous behavior. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-10-13 21:40:07 +08:00
port->erp_action.adapter = adapter;
port->erp_action.port = port;
if (dev_set_name(&port->dev, "0x%016llx", (unsigned long long)wwpn)) {
kfree(port);
goto err_out;
}
retval = -EINVAL;
if (device_register(&port->dev)) {
put_device(&port->dev);
goto err_out;
}
write_lock_irq(&adapter->port_list_lock);
list_add_tail(&port->list, &adapter->port_list);
write_unlock_irq(&adapter->port_list_lock);
atomic_or(status | ZFCP_STATUS_COMMON_RUNNING, &port->status);
return port;
err_out:
zfcp_ccw_adapter_put(adapter);
return ERR_PTR(retval);
}