To support scenarios which aren't bound to nvmetcli add port scenarios,
which is currently where the nvmet_fc transport invokes the discovery
event callbacks, a syfs attribute is added to lpfc which can be written
to cause an RSCN to be generated for the nport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch updates RSCN receive processing to check for the remote
port being an NVME port, and if so, invoke the nvme_fc callback to
rescan the remote port. The rescan will generate a discovery udev
event.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds support for the nvmet discovery op. When the callback
routine is called, the driver will call the routine to generate an RSCN
to the port on the other end of the link.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds general RSCN support:
- The ability to transmit an RSCN to the port on the other end of
the link (regular port if pt2pt, or fabric controller if fabric).
- And general recognition of an RSCN ELS when an ELS is received.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Update fcloop to support the discovery_event operation and
invoke a nvme rescan. In a real fc adapter, this would generate an
RSCN, which the host would receive and convert into a nvme rescan
on the remote port specified in the rscn payload.
Signed-off-by: James Smart <jsmart2021@gmail.com>
[kbuild-bot: fcloop_tgt_discovery_evt can be static]
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds support for the nvmet discovery_change transport op.
In turn, the transport adds it's own LLDD api callback discovery_event
op to request the LLDD to generate an RSCN for the discovery change.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Some transports, such as FC-NVME, support discovery controller change
events without the use of a persistent discovery controller. FC receives
events via RSCN from the FC Fabric Controller or subsystem FC port.
This patch adds a nvmet transport op that is called whenever a
discovery change event occurs in the nvmet layer.
To facilitate the callback without adding another layer to cross into
core.c to reference the transport ops, the port structure snapshots
the transport ops when the port is enabled and clears them when disabled.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The bfq schedule now uses css_next_descendant_pre directly after
the stats functionality depending on it has been from the core
blk-cgroup code to bfq. Export the symbol so that bfq can still
be build modular.
Fixes: d6258980da ("bfq-iosched: move bfq_stat_recursive_sum into the only caller")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull MD changes from Song.
* 'md-next' of https://github.com/liu-song-6/linux:
md: add bitmap_abort label in md_run
md-bitmap: create and destroy wb_info_pool with the change of bitmap
md-bitmap: create and destroy wb_info_pool with the change of backlog
md: introduce mddev_create/destroy_wb_pool for the change of member device
md/raid1: fix potential data inconsistency issue with write behind device
Now, there are two places need to consider about
the failure of destroy bitmap, so move the common
part between bitmap_abort and abort label.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
The write-behind attribute is part of bitmap, since bitmap
can be added/removed dynamically with the following.
1. mdadm --grow /dev/md0 --bitmap=none
2. mdadm --grow /dev/md0 --bitmap=internal --write-behind
So we need to destroy wb_info_pool in md_bitmap_destroy,
and create the pool before load bitmap.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Since we can enable write-behind mode by write backlog node,
so create wb_info_pool if the mode is just enabled, also call
call md_bitmap_update_sb to make user aware the write-behind
mode is enabled. Conversely, wb_info_pool should be destroyed
when write-behind mode is disabled.
Beside above, it is better to update bitmap sb if we change
the number of max_write_behind.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Previously, we called rdev_init_wb to avoid potential data
inconsistency when array is created.
Now, we need to call the function and create mempool if a
device is added or just be flaged as "writemostly". So
mddev_create_wb_pool is introduced and called accordingly.
And for safety reason, we mark implicit GFP_NOIO allocation
scope for create mempool during mddev_suspend/mddev_resume.
And mempool should be removed conversely after remove a
member device or its's "writemostly" flag, which is done
by call mddev_destroy_wb_pool.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
For write-behind mode, we think write IO is complete once it has
reached all the non-writemostly devices. It works fine for single
queue devices.
But for multiqueue device, if there are lots of IOs come from upper
layer, then the write-behind device could issue those IOs to different
queues, depends on the each queue's delay, so there is no guarantee
that those IOs can arrive in order.
To address the issue, we need to check the collision among write
behind IOs, we can only continue without collision, otherwise wait
for the completion of previous collisioned IO.
And WBCollision is introduced for multiqueue device which is worked
under write-behind mode.
But this patch doesn't handle below cases which could have the data
inconsistency issue as well, these cases will be handled in later
patches.
1. modify max_write_behind by write backlog node.
2. add or remove array's bitmap dynamically.
3. the change of member disk.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Adjust the f2fs tracing code to use newly introduced block layer
function blk_op_str() which converts the REQ_OP_XXX into the string
XXX.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Improve the print_req_error with additional request fields which are
helpful for debugging. Use newly introduced blk_op_str() to print the
REQ_OP_XXX in the string format.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that we've a helper function blk_op_str() to convert the
REQ_OP_XXX to string XXX, adjust the code to use that. Get rid of
the duplicate array op_name which is now present in the blk-core.c
which we renamed it to "blk_op_name" and open coding in the
blk-mq-debugfs.c.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In order to centralize the REQ_OP_XXX to string conversion which can be
used in the block layer and different places in the kernel like f2fs,
this patch adds a new helper function along with an array similar to the
one present in the blk-mq-debugfs.c.
We keep this helper functionality centralize under blk-core.c instead of
blk-mq-debugfs.c since blk-core.c is configured using CONFIG_BLOCK and
it will not be dependent on blk-mq-debugfs.c which is configured using
CONFIG_BLK_DEBUG_FS.
Next patch adjusts the code in the blk-mq-debugfs.c with newly
introduced helper.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Print the calling function instead of print_req_error as a prefix, and
print the operation and op_flags separately instead of the whole field.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This option is entirely bfq specific, give it an appropinquate name.
Also make it depend on CONFIG_BFQ_GROUP_IOSCHED in Kconfig, as all
the functionality already does so anyway.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This function was moved from core block code and is way to generic.
Fold it into the only caller and simplify it based on the actually
passed arguments.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This structure and assorted infrastructure is only used by the bfq I/O
scheduler. Move it there instead of bloating the common code.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When sampling the blkcg counts we don't need atomics or per-cpu
variables. Introduce a new structure just containing plain u64
counters.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Returning a structure generates rather bad code, so switch to passing
by reference. Also don't require the structure to be zeroed and add
to the 0-initialized counters, but actually set the counters to the
calculated value.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Trying to break up the crazy statements to something readable.
Also switch to an unsigned counter as it can't ever turn negative.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This function just has a few trivial assignments, has two callers with
one of them being in the fastpath.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that we don't need to assign the front/back segment sizes, we can
duplicating the segs assignment for the split vs no-split case and
remove a whole chunk of boilerplate code.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return the segement and let the callers assign them, which makes the code
a littler more obvious. Also pass the request instead of q plus bio
chain, allowing for the use of rq_for_each_bvec.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We only need the number of segments in the blk-mq submission path.
Remove the field from struct bio, and return it from a variant of
blk_queue_split instead of that it can passed as an argument to
those functions that need the value.
This also means we stop recounting segments except for cloning
and partial segments.
To keep the number of arguments in this how path down remove
pointless struct request_queue arguments from any of the functions
that had it and grew a nr_segs argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
lightnvm should have never used this function, as it is sending
passthrough requests, so switch it to blk_rq_append_bio like all the
other passthrough request users. Inline blk_init_request_from_bio into
the only remaining caller.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The priority field also makes sense for passthrough requests, so
initialize it in blk_rq_bio_prep.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As is, iolatency recognizes done_bio and cleanup as ending paths. If a
request is marked REQ_NOWAIT and fails to get a request, the bio is
cleaned up via rq_qos_cleanup() and ended in bio_wouldblock_error().
This results in underflowing the inflight counter. Fix this by only
accounting bios that were actually submitted.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit a1ce35fa49 ("block: remove dead elevator code")
deleted blk_end_request() and friends, but some declaration are still
left. Purge them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is a pure code cleanup patch and doesn't change any functionality.
Having multiple coding styles in the code creates confusion when
someone tries to add a new code.
Make queue_poll_stat_show() consistent by adding spaces around binary
operators with the rest of the code.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In function __blk_mq_debugfs_rq_show variable op has unsigned int type.
Since op can never be negative use %u format specifier to match the
variable type.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is a pure code cleanup patch and doesn't change any functionality.
This removes the redundant else in the code which is not needed since
we are returning from function anyway.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In function null_add_dev() struct nullb *nullb member is allocated
using kzalloc_node() which returns 0red memory.
In function setup_queues() which is called from the null_add_dev(), on
successful queue allocation we set the nullb->nr_queues = 0 which is not
needed due to earlier use of kzalloc_node().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__data_len and __sector are internal fields which should not be accessed
directly in driver-level like the comment above it. But, tag field can
be accessed by driver level directly so that we need to make the comment
right by moving it to some other place.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
clang warns about unusual code in floppy.c that looks like it
was intended to be a bit mask operation, checking for a specific
bit in the UDP->cmos variable (FLOPPY1_TYPE expands to '4' on
ARM):
drivers/block/floppy.c:3902:17: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
if (!UDP->cmos && FLOPPY1_TYPE)
^ ~~~~~~~~~~~~
drivers/block/floppy.c:3902:17: note: use '&' for a bitwise operation
if (!UDP->cmos && FLOPPY1_TYPE)
The check here is redundant anyway, if FLOPPY1_TYPE is zero, then
assigning it to a zero UDP->cmos field does not change anything,
so removing the extra check here has no effect other than shutting
up the warning.
On x86, this will no longer read a hardware register, as the
FLOPPY1_TYPE macro is not expanded if UDP->cmos is already
zero, but the result is the same.
Cc: Robert Elliott <elliott@hpe.com>
Cc: Keith Busch <kbusch@kernel.org>
Link: https://patchwork.kernel.org/patch/10851841/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is a pure code cleanup patch and doesn't change any functionality.
In block layer to identify the request operation req_op() macro is
used, so change the open coding the req_op() in the blk-mq-debugfs.c.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
wbc_account_io() collects information on cgroup ownership of writeback
pages to determine which cgroup should own the inode. Pages can stay
associated with dead memcgs but we want to avoid attributing IOs to
dead blkcgs as much as possible as the association is likely to be
stale. However, currently, pages associated with dead memcgs
contribute to the accounting delaying and/or confusing the
arbitration.
Fix it by ignoring pages associated with dead memcgs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When blkcg_activate_policy() is creating blkg_policy_data for existing
blkgs, it did in the wrong order - descendants first. Fix it. None
of the existing controllers seem affected by this.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkg alloc is performed as a separate step from the rest of blkg
creation so that GFP_KERNEL allocations can be used when creating
blkgs from configuration file writes because otherwise user actions
may fail due to failures of opportunistic GFP_NOWAIT allocations.
While making blkgs use percpu_ref, 7fcf2b033b ("blkcg: change blkg
reference counting to use percpu_ref") incorrectly added unconditional
opportunistic percpu_ref_init() to blkg_create() breaking this
guarantee.
This patch moves percpu_ref_init() to blkg_alloc() so makes it use
@gfp_mask that blkg_alloc() is called with. Also, percpu_ref_exit()
is moved to blkg_free() for consistency.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7fcf2b033b ("blkcg: change blkg reference counting to use percpu_ref")
Cc: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Depending on the number of devices, blkcg stats can go over the
default seqfile buf size. seqfile normally retries with a larger
buffer but since the ->pd_stat() addition, blkcg_print_stat() doesn't
tell seqfile that overflow has happened and the output gets printed
truncated. Fix it by calling seq_commit() w/ -1 on possible
overflows.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 903d23f0a3 ("blk-cgroup: allow controllers to output their own stats")
Cc: stable@vger.kernel.org # v4.19+
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If use_delay was non-zero when the latency target of a cgroup was set
to zero, it will stay stuck until io.latency is enabled on the cgroup
again. This keeps readahead disabled for the cgroup impacting
performance negatively.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: d706751215 ("block: introduce blk-iolatency io controller")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct bio_map_data {
...
struct iovec iov[];
};
instance = kmalloc(sizeof(sizeof(struct bio_map_data) + sizeof(struct iovec) *
count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, iov, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.
So, replace the following form:
sizeof(*new_ptbl) + target * sizeof(new_ptbl->part[0])
with:
struct_size(new_ptbl, part, target)
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct blk_rq_stat::mean is a u64 value, so use %llu
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>