Requests that triggers flushing volatile writeback cache to disk (barriers)
have significant effect to overall performance.
Block layer has sophisticated engine for combining several flush requests
into one. But there is no statistics for actual flushes executed by disk.
Requests which trigger flushes usually are barriers - zero-size writes.
This patch adds two iostat counters into /sys/class/block/$dev/stat and
/proc/diskstats - count of completed flush requests and their total time.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In function 'activate_lsp', rather than hard-coding the short atom
header(0x83), we need to let the function 'add_short_atom_header' append
the header based on the parameter being appended.
The parameter has been defined in Section 3.1.2.1 of
https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, cgroup rstat is supported only on cgroup2 hierarchy and
rstat functions shouldn't be called on cgroup1 cgroups. While
converting blk-cgroup core statistics to rstat, f733164829
("blk-cgroup: reimplement basic IO stats using cgroup rstat")
accidentally ended up calling cgroup_rstat_updated() on cgroup1
cgroups causing crashes.
Longer term, we probably should add cgroup1 support to rstat but for
now let's mask the call directly.
Fixes: f733164829 ("blk-cgroup: reimplement basic IO stats using cgroup rstat")
Tested-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
trigger_softirq() is always invoked as a SMP-function call which is
always invoked with disables interrupts.
Don't disable interrupt in trigger_softirq() because interrupts are
already disabled.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since the only caller of this function has been deleted, delete this one
also.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
64K PAGE_SIZE is popular on ARM64 or other ARCHs, and 64K has been big
enough to break some devices probably, so change the logic to split bio
if the only bvec's length is > SZ_4K instead of PAGE_SIZE.
Fixes: fa53228721 (block: avoid blk_bio_segment_split for small I/O operations)
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some device may set segment boundary as PAGE_SIZE - 1. If the bvec
crosses pages, and meantime its length is <= PAGE_SIZE, we still need
to split the bvec into 2 segments.
Fixes this issue by still splitting bio if the single bvec crosses
pages.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: fa53228721 (block: avoid blk_bio_segment_split for small I/O operations)
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on
cgroup1. Let's move it into its own files and gate it behind a config
option.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-cgroup has been using blkg_rwstat to track basic IO stats.
Unfortunately, reading recursive stats scales badly as itinvolves
walking all descendants. On systems with a huge number of cgroups
(dead or alive), this can lead to substantial CPU cost when reading IO
stats.
This patch reimplements basic IO stats using cgroup rstat which uses
more memory but makes recursive stat reading O(# descendants which
have been active since last reading) instead of O(# descendants).
* blk-cgroup core no longer uses sync/async stats. Introduce new stat
enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
* Add blkg_iostat[_set] which encapsulates byte and io stats, last
values for propagation delta calculation and u64_stats_sync for
correctness on 32bit archs.
* Update the new percpu stat counters directly and implement
blkcg_rstat_flush() to implement propagation.
* blkg_print_stat() can now bring the stats up to date by calling
cgroup_rstat_flush() and print them instead of directly summing up
all descendants.
* It now allocates 96 bytes per cpu. It used to be 40 bytes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dan Schatzberg <dschatzberg@fb.com>
Cc: Daniel Xu <dlxu@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When used on cgroup1, blk-throtl uses the blkg->stat_bytes and
->stat_ios from blk-cgroup core to populate four stat knobs.
blk-cgroup core is moving away from blkg_rwstat to improve scalability
and won't be able to support this usage.
It isn't like the sharing gains all that much. Let's break them out
to dedicated rwstat counters which are updated when on cgroup1.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios
from blk-cgroup core to populate six stat knobs. blk-cgroup core is
moving away from blkg_rwstat to improve scalability and won't be able
to support this usage.
It isn't like the sharing gains all that much. Let's break it out to
dedicated rwstat counters which are updated when on cgroup1. This
makes use of bfqg_*rwstat*() helpers outside of
CONFIG_BFQ_CGROUP_DEBUG. Move them out.
v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Collect them right under #ifdef CONFIG_BFQ_CGROUP_DEBUG. The next
patch will use them from !DEBUG path and this makes it easy to move
them out of the ifdef block.
This is pure code reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull on for-linus to resolve what otherwise would have been a conflict
with the cgroups rstat patchset from Tejun.
* for-linus: (942 commits)
blkcg: make blkcg_print_stat() print stats only for online blkgs
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
iocost: don't nest spin_lock_irq in ioc_weight_write()
io_uring: ensure we clear io_kiocb->result before each issue
um-ubd: Entrust re-queue to the upper layers
nvme-multipath: remove unused groups_only mode in ana log
nvme-multipath: fix possible io hang after ctrl reconnect
io_uring: don't touch ctx in setup after ring fd install
io_uring: Fix leaked shadow_req
Linux 5.4-rc5
riscv: cleanup do_trap_break
nbd: verify socket is supported during setup
ata: libahci_platform: Fix regulator_get_optional() misuse
nbd: handle racing with error'ed out commands
nbd: protect cmd->status with cmd->lock
io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
io_uring: used cached copies of sq->dropped and cq->overflow
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
...
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and
BLKFINISHZONE to allow applications to control the condition of zones
on a zoned block device through the execution of the REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zoned block devices (ZBC and ZAC devices) allow an explicit control
over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will
actively be written
* Close a zone: Transition to closed condition to release the drive
resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full
condition to prevent write operations
To enable this control for in-kernel zoned block device users, define
the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
and REQ_OP_ZONE_FINISH as well as the generic function
blkdev_zone_mgmt() for submitting these operations on a range of zones.
This results in blkdev_reset_zones() removal and replacement with this
new zone magement function. Users of blkdev_reset_zones() (f2fs and
dm-zoned) are updated accordingly.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need for the function __blkdev_reset_all_zones() as
REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones()
bio loop with an early break from the loop. This patch removes this
function and modifies blkdev_reset_zones(), simplifying the code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests
do not have a size and are never sequential due to the zone start sector
position required for their execution. As a result, there is no point in
using a plug around blkdev_reset_zones() bio issuing loop. This patch
removes this unnecessary plugging.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkcg_print_stat() iterates blkgs under RCU and doesn't test whether
the blkg is online. This can call into pd_stat_fn() on a pd which is
still being initialized leading to an oops.
The heaviest operation - recursively summing up rwstat counters - is
already done while holding the queue_lock. Expand queue_lock to cover
the other operations and skip the blkg if it isn't online yet. The
online state is protected by both blkcg and queue locks, so this
guarantees that only online blkgs are processed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Roman Gushchin <guro@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: 903d23f0a3 ("blk-cgroup: allow controllers to output their own stats")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With transition to blk-mq, the elevator= kernel argument was removed as
it makes less and less sense with the current variety of devices. Since
this may surprise some users and there are advices on the Internet that
still suggest to use it, let's at least warn if the parameter is used.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull NVMe fixes from Keith:
"We have a few late nvme fixes for a couple device removal kernel
crashes, and a compat fix for a new ioctl introduced during this merge
window."
* 'nvme-5.4-rc7' of git://git.infradead.org/nvme:
nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
nvme-rdma: fix a segmentation fault during module unload
Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit
marker for the padding space added on certain platforms as a result of the
enlargement of the result field from 32 bit to 64 bits in size, and
fixes differences in struct size when using compat ioctl for 32-bit
binaries on 64-bit architecture.
Fixes: 65e68edce0 ("nvme: allow 64-bit results in passthru commands")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Charles Machalow <csm10495@gmail.com>
[changelog]
Signed-off-by: Keith Busch <kbusch@kernel.org>
nvme_mpath_clear_ctrl_paths() iterates through
the ctrl->namespaces list while holding ctrl->scan_lock.
This does not seem to be the correct way of protecting
from concurrent list modification.
Specifically, nvme_scan_work() sorts ctrl->namespaces
AFTER unlocking scan_lock.
This may result in the following (rare) crash in ctrl disconnect
during scan_work:
BUG: kernel NULL pointer dereference, address: 0000000000000050
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
...
Call Trace:
nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
nvme_remove_namespaces+0x35/0xe0 [nvme_core]
nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
nvme_sysfs_delete+0x49/0x60 [nvme_core]
dev_attr_store+0x17/0x30
sysfs_kf_write+0x3e/0x50
kernfs_fop_write+0x11e/0x1a0
__vfs_write+0x1b/0x40
vfs_write+0xb9/0x1a0
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x5a/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f8d02bfb154
Fix:
After taking scan_lock in nvme_mpath_clear_ctrl_paths()
down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
This will not cause deadlocks because taking scan_lock never happens
while holding the namespaces_rwsem.
Moreover, scan work downs namespaces_rwsem in the same order.
Alternative: sort ctrl->namespaces in nvme_scan_work()
while still holding the scan_lock.
This would leave nvme_mpath_clear_ctrl_paths() without correct protection
against ctrl->namespaces modification by anyone other than scan_work.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
In case there are controllers that are not associated with any RDMA
device (e.g. during unsuccessful reconnection) and the user will unload
the module, these controllers will not be freed and will access already
freed memory. The same logic appears in other fabric drivers as well.
Fixes: 87fd125344 ("nvme-rdma: remove redundant reference between ib_device and tagset")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
__blk_queue_split() adds significant overhead for small I/O operations.
Add a shortcut to avoid it for cases where we know we never need to
split.
Based on a patch from Ming Lei.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
8962842ca5 ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.
Fixes: 8962842ca5 ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch introduces Opal Datastore UID.
The generic read/write table ioctl can use this UID
to access the Opal Datastore.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This feature gives the user RW access to any opal table with admin1
authority. The flags described in the new structure determines if the user
wants to read/write the data. Flags are checked for valid values in
order to allow future features to be added to the ioctl.
The user can provide the desired table's UID. Also, the ioctl provides a
size and offset field and internally will loop data accesses to return
the full data block. Read overrun is prevented by the initiator's
sec_send_recv() backend. The ioctl provides a private field with the
intention to accommodate any future expansions to the ioctl.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch refactors the existing "write_shadowmbr" func and
creates a new generalized function "generic_table_write_data",
to write data to any opal table. Also, a few cleanups are included
in this patch.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, block device size in not updated on second and further open
for block devices where partition scan is disabled. This is particularly
annoying for example for DVD drives as that means block device size does
not get updated once the media is inserted into a drive if the device is
already open when inserting the media. This is actually always the case
for example when pktcdvd is in use.
Fix the problem by revalidating block device size on every open even for
devices with partition scan disabled.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Factor out code handling revalidation of bdev on disk change into a
common helper.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.
Use snprintf to avoid the potential buffer overflow.
This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.
Cc: stable@vger.kernel.org
Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 97889f9ac2 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()"), the return value of blk_mq_run_hw_queue()
is never checked, so make it return void, which very marginally simplifies
the code.
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This code causes a static analysis warning:
block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'
We disable IRQs in blkg_conf_prep() and re-enable them in
blkg_conf_finish(). IRQ disable/enable should not be nested because
that means the IRQs will be enabled at the first unlock instead of the
second one.
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We use io_kiocb->result == -EAGAIN as a way to know if we need to
re-submit a polled request, as -EAGAIN reporting happens out-of-line
for IO submission failures. This field is cleared when we originally
allocate the request, but it isn't reset when we retry the submission
from async context. This can cause issues where we think something
needs a re-issue, but we're really just reading stale data.
Reset ->result whenever we re-prep a request for polled submission.
Cc: stable@vger.kernel.org
Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
groups_only mode in nvme_read_ana_log() is no longer used: remove it.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
This fails because ctrl disconnects.
Therefore nvme_update_ns_ana_state() is not called
and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
nvme_read_ana_log(ctrl, groups_only=true).
However, nvme_update_ana_state() does not update namespaces
because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.
Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.
More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.
Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.
Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
syzkaller reported an issue where it looks like a malicious app can
trigger a use-after-free of reading the ctx ->sq_array and ->rings
value right after having installed the ring fd in the process file
table.
Defer ring fd installation until after we're done reading those
values.
Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_queue_link_head() owns shadow_req after taking it as an argument.
By not freeing it in case of an error, it can leak the request along
with taken ctx->refs.
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull x86 fixes from Thomas Gleixner:
"Two fixes for the VMWare guest support:
- Unbreak VMWare platform detection which got wreckaged by converting
an integer constant to a string constant.
- Fix the clang build of the VMWAre hypercall by explicitely
specifying the ouput register for INL instead of using the short
form"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for time(keeping):
- Add a missing include to prevent compiler warnings.
- Make the VDSO implementation of clock_getres() POSIX compliant
again. A recent change dropped the NULL pointer guard which is
required as NULL is a valid pointer value for this function.
- Fix two function documentation typos"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Fix two trivial comments
timers/sched_clock: Include local timekeeping.h for missing declarations
lib/vdso: Make clock_getres() POSIX compliant again
Pull perf fixes from Thomas Gleixner:
"A set of perf fixes:
kernel:
- Unbreak the tracking of auxiliary buffer allocations which got
imbalanced causing recource limit failures.
- Fix the fallout of splitting of ToPA entries which missed to shift
the base entry PA correctly.
- Use the correct context to lookup the AUX event when unmapping the
associated AUX buffer so the event can be stopped and the buffer
reference dropped.
tools:
- Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
/proc/kcore
- Fix freeing id arrays in the event list so the correct event is
closed.
- Sync sched.h anc kvm.h headers with the kernel sources.
- Link jvmti against tools/lib/ctype.o to have weak strlcpy().
- Fix multiple memory and file descriptor leaks, found by coverity in
perf annotate.
- Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
by a static analysis tool"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Fix AUX output stopping
perf/aux: Fix tracking of auxiliary trace buffer allocation
perf/x86/intel/pt: Fix base for single entry topa
perf kmem: Fix memory leak in compact_gfp_flags()
tools headers UAPI: Sync sched.h with the kernel
tools headers kvm: Sync kvm.h headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
perf c2c: Fix memory leak in build_cl_output()
perf tools: Fix mode setting in copyfile_mode_ns()
perf annotate: Fix multiple memory and file descriptor leaks
perf tools: Fix resource leak of closedir() on the error paths
perf evlist: Fix fix for freed id arrays
perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
Pull irq fixes from Thomas Gleixner:
"Two fixes for interrupt controller drivers:
- Skip IRQ_M_EXT entries in the device tree when initializing the
RISCV PLIC controller to avoid a double init attempt.
- Use the correct ITS list when issuing the VMOVP synchronization
command so the operation works only on the ITS instances which are
associated to a VM"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Skip contexts except supervisor in plic_init()
irqchip/gic-v3-its: Use the exact ITSList for VMOVP
-----BEGIN PGP SIGNATURE-----
iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl208asACgkQiiy9cAdy
T1EVJAv41lwLL01FQtWsoMB8yn46q7iboc0eqVJ4CEnZj25Fe3VgokKm1+2N6/15
g8tQeLnE8kzjQyAxEgI260PHRF6Dg0RSL7zwxnJOFpspWLb9+U4UEryCDao/v4Ao
LYr88F6XeYDGojyuGKokT/jlsxWfdPwaYhpOPabjO76f6Jv7cNfIa9EKzbEAtq3e
a9to2AVnJ02FLGTP3QM1ThBtCu3JFFMI07u2Rp8ANiJYtM5Mg2piOkO1PQ3hGLIM
/3+1E7B12KgG7D3GA7ZuP1HLjyo+A91BhZVTzlBDq1VHTd8ccJDJklg28oXoMrMh
dgW5XRcS/s60LsRzagpNMC0cPlAdBAd9N9uZaoXmQhNAg0I7CJ24H3tglwKejoUR
m/duI1C89Fbo1kgjwfAztxhfvair2SYLu8DGijvYH/C5syNUR7V99uy1th+nmOgl
vpBM9CbkM6mtSxFHtgcFuv4rwIf61EQaGoXiL/e3RLH4zc/HaUpBo2rDh0LPmwFB
koXIw6E=
=fTOk
-----END PGP SIGNATURE-----
Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Seven cifs/smb3 fixes, including three for stable"
* tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
CIFS: Fix use after free of file info structures
CIFS: Fix retry mid list corruption on reconnects
cifs: Fix missed free operations
CIFS: avoid using MID 0xFFFF
cifs: clarify comment about timestamp granularity for old servers
cifs: Handle -EINPROGRESS only when noblockcnt is set
Several minor fixes and cleanups for v5.4-rc5:
- Three build fixes for various SPARSEMEM-related kernel configurations
- Two cleanup patches for the kernel bug and breakpoint trap handler code
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl203dwACgkQx4+xDQu9
Kks4cxAAsun5+QPqwbexWMZXSPbGkfxFAzs9Ma0dSsAH46+g1cb+kkBpeJwX0oPS
NO/oz0bxKDqCRRCEUM5UBg+JsBCUBucPHoxMfNnJKjtjIMkIjadR+6N6Zz8gyXqD
qK3NLzUpadfuiphxXo+frHTlPpRZdo7MIT2FLa6dSq4uutqJZ7CmAhvAWkYQQDfS
vAlxBU4lqsk4PW2IuX0Q5n90Jptyon4EJo3Cnu5IgEEH9MSGb4JsWZFBOAsFFmEH
1Q6Q2a4xiuRs0L14bZ44kem7Fg3ietN/oWMiGf58ZIc+h18S3gh0cHO64X3rZFRF
9ad5jFCqMvibApxJqsf6/wI4/LODRUPJPcj+veliH40lkIm3sNnqNdVhESaPICg6
5j0hmUMPYazj2NnADhFi1B9gEuChl9nsUxEbMxClhZxdC6+4AHeqC1KmLsQS7vyY
Zs8vxU4eONsKbTP4qC1aGeUjlu7yBe35E9GUUZ6llm6uRuRAmXGG1IRdUIgrSnQt
vrePKyWjRekW+F5nZeEvUhhQUWwCsanaa6pKK0ybXHkULg5v8Sf50DlIognxehTt
IvtjLsD8gQKnGNY1crqYuFy95kZ0UOgtDCJW13cy3EwVIUWRdPcTZkpWQMFFILL5
EaTLERIK0DUbiKwpAlfmO2WL7tOJh+wxfZCnzXrdZMhFbmfJ5Vw=
=l7GM
-----END PGP SIGNATURE-----
Merge tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Several minor fixes and cleanups for v5.4-rc5:
- Three build fixes for various SPARSEMEM-related kernel
configurations
- Two cleanup patches for the kernel bug and breakpoint trap handler
code"
* tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: cleanup do_trap_break
riscv: cleanup <asm/bug.h>
riscv: Fix undefined reference to vmemmap_populate_basepages
riscv: Fix implicit declaration of 'page_to_section'
riscv: fix fs/proc/kcore.c compilation with sparsemem enabled
- Fix VDSO time-related function behavior for systems where we need to
fall back to syscalls, but were instead returning bogus results.
- A fix to TLB exception handlers for Cavium Octeon systems where they
would inadvertently clobber the $1/$at register.
- A build fix for bcm63xx configurations.
- Switch to using my @kernel.org email address.
-----BEGIN PGP SIGNATURE-----
iIwEABYIADQWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXbTEbhYccGF1bGJ1cnRv
bkBrZXJuZWwub3JnAAoJED6nn6y1dQDd+HsBAJ2Zvzlm+CftfNTPbG1SihhyH3s4
edn8VuexsPJp+TjJAP9UZHPQj35tvS5MWYRg0YsNz9HYPTVclYdEsLS9KbSMCw==
=YNU+
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.4_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few MIPS fixes:
- Fix VDSO time-related function behavior for systems where we need
to fall back to syscalls, but were instead returning bogus results.
- A fix to TLB exception handlers for Cavium Octeon systems where
they would inadvertently clobber the $1/$at register.
- A build fix for bcm63xx configurations.
- Switch to using my @kernel.org email address"
* tag 'mips_fixes_5.4_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: tlbex: Fix build_restore_pagemask KScratch restore
MIPS: bmips: mark exception vectors as char arrays
mips: vdso: Fix __arch_get_hw_counter()
MAINTAINERS: Use @kernel.org address for Paul Burton