Commit 1fd7c3b438 ("kobject: Improve doc clarity kobject_init_and_add()")
tried to provide more clarity, but the reference to kobject_del() was
incorrect. Fix that up by removing that line, and hopefully be more explicit
as to exactly what needs to happen here once you register a kobject with the
kobject core.
Acked-by: Tobin C. Harding <tobin@kernel.org>
Fixes: 1fd7c3b438 ("kobject: Improve doc clarity kobject_init_and_add()")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernel-doc comments have a prescribed format. This includes parenthesis
on the function name. To be _particularly_ correct we should also
capitalise the brief description and terminate it with a period.
In preparation for adding/updating kernel-doc function comments clean up
the ones currently present.
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the docstring for kobject_get_path() mentions 'kset'. The
kset is not used in the function callchain starting from this function.
Remove docstring reference to kset from the function kobject_get_path().
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
"sysfs" was misspelled in a comment and a log message.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj pointer is being null-checked so potentially it could be null,
however, the ktype declaration before the null check is dereferencing kobj
hence we have a potential null pointer deference. Fix this by moving the
assignment of ktype after kobj has been null checked.
Addresses-Coverity: ("Dereference before null check")
Fixes: aa30f47cf6 ("kobject: Add support for default attribute groups to kobj_type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit edb16da34b as it
breaks existing systems as reported by Krzysztof.
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 13610aa908 ("kernel/configs: use .incbin directive to
embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
it from being selected in Kconfig.
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce in-kernel headers which are made available as an archive
through proc (/proc/kheaders.tar.xz file). This archive makes it
possible to run eBPF and other tracing programs that need to extend the
kernel for tracing purposes without any dependency on the file system
having headers.
A github PR is sent for the corresponding BCC patch at:
https://github.com/iovisor/bcc/pull/2312
On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Further once a
different kernel is booted, any headers stored on the file system will
no longer be useful. This is an issue even well known to distros.
By storing the headers as a compressed archive within the kernel, we can
avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users
have a need for this, when they switch debug kernels, they do not want to
update the filesystem or worry about it where to store the headers on
it. However, the feature is also buildable as a module in case the user
desires it not being part of the kernel image. This makes it possible to
load and unload the headers from memory on demand. A tracing program can
load the module, do its operations, and then unload the module to save
kernel memory. The total memory needed is 3.3MB.
By having the archive available at a fixed location independent of
filesystem dependencies and conventions, all debugging tools can
directly refer to the fixed location for the archive, without concerning
with where the headers on a typical filesystem which significantly
simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.
Other approaches were discussed such as having an in-memory mountable
filesystem, but that has drawbacks such as requiring an in-kernel xz
decompressor which we don't have today, and requiring usage of 42 MB of
kernel memory to host the decompressed headers at anytime. Also this
approach is simpler than such approaches.
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Function kobject_init_and_add() is currently misused in a number of
places in the kernel. On error return kobject_put() must be called but
is at times not.
Make the function documentation more explicit about calling
kobject_put() in the error path.
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is currently some confusion on how to wind back
kobject_init_and_add() during the error paths in code that uses this
function.
Add documentation to kobject_add() and kobject_del() to help clarify the
usage.
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Platform core is using pdev->name as the platform device name to do
the binding of the devices with the drivers. But, when the platform
driver overrides the platform device name with dev_set_name(),
the pdev->name is pointing to a location which is freed and becomes
an invalid parameter to do the binding match.
use-after-free instance:
[ 33.325013] BUG: KASAN: use-after-free in strcmp+0x8c/0xb0
[ 33.330646] Read of size 1 at addr ffffffc10beae600 by task modprobe
[ 33.339068] CPU: 5 PID: 518 Comm: modprobe Tainted:
G S W O 4.19.30+ #3
[ 33.346835] Hardware name: MTP (DT)
[ 33.350419] Call trace:
[ 33.352941] dump_backtrace+0x0/0x3b8
[ 33.356713] show_stack+0x24/0x30
[ 33.360119] dump_stack+0x160/0x1d8
[ 33.363709] print_address_description+0x84/0x2e0
[ 33.368549] kasan_report+0x26c/0x2d0
[ 33.372322] __asan_report_load1_noabort+0x2c/0x38
[ 33.377248] strcmp+0x8c/0xb0
[ 33.380306] platform_match+0x70/0x1f8
[ 33.384168] __driver_attach+0x78/0x3a0
[ 33.388111] bus_for_each_dev+0x13c/0x1b8
[ 33.392237] driver_attach+0x4c/0x58
[ 33.395910] bus_add_driver+0x350/0x560
[ 33.399854] driver_register+0x23c/0x328
[ 33.403886] __platform_driver_register+0xd0/0xe0
So, use dev_name(&pdev->dev), which fetches the platform device name from
the kobject(dev->kobj->name) of the device instead of the pdev->name.
Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace klp_ktype_patch's default_attrs field
with default_groups and use the ATTRIBUTE_GROUPS macro to create
klp_patch_groups.
This patch was tested by loading the livepatch-sample module and
verifying that the sysfs files for the attributes in the default groups
were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace sugov_tunables_ktype's default_attrs field
with default groups. Change "sugov_attributes" to "sugov_attrs" and use
the ATTRIBUTE_GROUPS macro to create sugov_groups.
This patch was tested by setting the scaling governor to schedutil and
verifying that the sysfs files for the attributes in the default groups
were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace padata_attr_type's default_attrs field
with default_groups and use the ATTRIBUTE_GROUPS macro to create
padata_default_groups.
This patch was tested by loading the pcrypt module and verifying that
the sysfs files for the attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace irq_kobj_type's default_attrs field with
default_groups and use the ATTRIBUTE_GROUPS macro to create irq_groups.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs fields in rx_queue_ktype
and netdev_queue_ktype with default_groups. Use the ATTRIBUTE_GROUPS
macro to create rx_queue_default_groups and netdev_queue_default_groups.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace all of the ktype default_attrs fields in
the block subsystem with default_groups and use the ATTRIBUTE_GROUPS
macro to create the default groups.
Remove default_ctx_attrs[] because it doesn't contain any attributes.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace foo_ktype's default_attrs field with
default_groups and use the ATTRIBUTE_GROUPS macro to create
foo_default_groups.
This patch was tested by loading the kset-example module and verifying
that the sysfs files for the attributes in the default group were
created.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kobj_type currently uses a list of individual attributes to store
default attributes. Attribute groups are more flexible than a list of
attributes because groups provide support for attribute visibility. So,
add support for default attribute groups to kobj_type.
In future patches, the existing uses of kobj_type’s attribute list will
be converted to attribute groups. When that is complete, kobj_type’s
attribute list, “default_attrs”, will be removed.
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 376991db4b ("driver core: Postpone DMA tear-down until after
devres release"), we changed the ordering of tearing down the device DMA
ops and releasing all the device's resources; this was because the DMA ops
should be maintained until we release the device's managed DMA memories.
However, we have seen another crash on an arm64 system when a
device driver probe fails:
hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0 pfn:313f5
page:ffff7e0000c4fd40 count:1 mapcount:0
mapping:0000000000000000 index:0x0
flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted
5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
RC0 - V1.12.01 01/29/2019
Call trace:
dump_backtrace+0x0/0x118
show_stack+0x14/0x1c
dump_stack+0xa4/0xc8
bad_page+0xe4/0x13c
free_pages_check_bad+0x4c/0xc0
__free_pages_ok+0x30c/0x340
__free_pages+0x30/0x44
__dma_direct_free_pages+0x30/0x38
dma_direct_free+0x24/0x38
dma_free_attrs+0x9c/0xd8
dmam_release+0x20/0x28
release_nodes+0x17c/0x220
devres_release_all+0x34/0x54
really_probe+0xc4/0x2c8
driver_probe_device+0x58/0xfc
device_driver_attach+0x68/0x70
__driver_attach+0x94/0xdc
bus_for_each_dev+0x5c/0xb4
driver_attach+0x20/0x28
bus_add_driver+0x14c/0x200
driver_register+0x6c/0x124
__pci_register_driver+0x48/0x50
sas_v3_pci_driver_init+0x20/0x28
do_one_initcall+0x40/0x25c
kernel_init_freeable+0x2b8/0x3c0
kernel_init+0x10/0x100
ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0 pfn:313f6
page:ffff7e0000c4fd80 count:1 mapcount:0
mapping:0000000000000000 index:0x0
[ 89.322983] flags: 0xfffe00000001000(reserved)
raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88
0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
The crash occurs for the same reason.
In this case, on the really_probe() failure path, we are still clearing
the DMA ops prior to releasing the device's managed memories.
This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since insert_resource() might return an error we don't need
to shadow its error code and would safely propagate to the user.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
smp_mb__before_atomic() can not be applied to atomic_set(). Remove the
barrier and rely on RELEASE synchronization.
Fixes: ba16b2846a ("kernfs: add an API to get kernfs node from inode number")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 665ac7e927 ("acpi/hmat: Register processor domain to its
memory") introduced an uninitialized "struct memory_target" that could
cause an incorrect branching.
drivers/acpi/hmat/hmat.c:385:6: warning: variable 'target' is used
uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/acpi/hmat/hmat.c:392:6: note: uninitialized use occurs here
if (target && p->flags & ACPI_HMAT_PROCESSOR_PD_VALID) {
^~~~~~
drivers/acpi/hmat/hmat.c:385:2: note: remove the 'if' if its condition
is always true
if (p->flags & ACPI_HMAT_MEMORY_PD_VALID) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/acpi/hmat/hmat.c:369:30: note: initialize the variable 'target'
to silence this warning
struct memory_target *target;
^
= NULL
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Fixes: 665ac7e927 ("acpi/hmat: Register processor domain to its memory")
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ACPI 6.3 changed the subtable "Memory Subsystem Address Range Structure"
to "Memory Proximity Domain Attributes Structure".
Updating and renaming of the structure was included in commit:
ACPICA: ACPI 6.3: HMAT updates (9a8d961f1e)
Rename the enum type to match the subtable and structure naming.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 665ac7e927 ("acpi/hmat: Register processor domain to its
memory") introduced some memory leaks below due to it fails to release
the heap memory in an error path, and then those statically-allocated
__initdata memory which reference them get freed during boot renders
those heap memory as leaks. Since it is valid to pass NULL to
acpi_put_table(), it is fine to call it even if acpi_get_table() returns
an error.
unreferenced object 0xc8ff8008349e9400 (size 128):
comm "swapper/0", pid 1, jiffies 4294709236 (age 48121.476s)
hex dump (first 32 bytes):
00 d0 9e 34 08 80 ff 84 d8 00 43 11 00 10 ff ff ...4......C.....
00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000869d4503>] __kmalloc+0x568/0x600
[<0000000070fd6afb>] alloc_memory_target+0x50/0xd8
[<00000000efa2081e>] srat_parse_mem_affinity+0x58/0x5c
[<000000008bfaef74>] acpi_parse_entries_array+0x1c8/0x2c0
[<0000000022804877>] acpi_table_parse_entries_array+0x11c/0x138
[<00000000ffe9cd34>] acpi_table_parse_entries+0x7c/0xac
[<00000000a7023afd>] hmat_init+0x90/0x174
[<00000000694a86c1>] do_one_initcall+0x2d8/0x5f8
[<0000000024889da9>] do_initcall_level+0x37c/0x3fc
[<000000009be02908>] do_basic_setup+0x38/0x50
[<0000000037b3ac0a>] kernel_init_freeable+0x194/0x258
[<00000000f5741184>] kernel_init+0x18/0x334
[<000000007b30f423>] ret_from_fork+0x10/0x18
[<000000006c7147a8>] 0xffffffffffffffff
Signed-off-by: Qian Cai <cai@lca.pw>
Fixes: 665ac7e927 ("acpi/hmat: Register processor domain to its memory")
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It should have been 'management' not 'managemend'.
Fixes: 7945f929f1 ("drivers: provide devm_platform_ioremap_resource()")
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When adding the memory by probing memory block in sysfs interface, there is an
obvious issue that we will unlock the device_hotplug_lock when fails to takes it.
That issue was introduced in Commit 8df1d0e4a2
("mm/memory_hotplug: make add_memory() take the device_hotplug_lock")
We should drop out in time when fails to take the device_hotplug_lock.
Fixes: 8df1d0e4a2 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock")
Reported-by: Yang yingliang <yangyingliang@huawei.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not absolutely clear from the docs how the cleanup path after
device_add() should look like so spell it out explicitly.
No functional changes, just documentation.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit ff9fb72bc0 ("debugfs: return error values, not NULL")
these helper functions do not return NULL anymore (with the exception
of debugfs_create_u32_array()).
Fixes: ff9fb72bc0 ("debugfs: return error values, not NULL")
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There were a few files in the driver core power code that did not have
SPDX identifiers on them, so fix that up. At the same time, remove the
"free form" text that specified the license of the file, as that is
impossible for any tool to properly parse.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There were two files in the firmware_loader code that did not have SPDX
identifiers on them, so fix that up.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Makefile in the drivers/base/test/ directory did not have a SPDX
identifier on it, so fix that up.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If user updates any cpu's cpu_capacity, then the new value is going to
be applied to all its online sibling cpus. But this need not to be correct
always, as sibling cpus (in ARM, same micro architecture cpus) would have
different cpu_capacity with different performance characteristics.
So, updating the user supplied cpu_capacity to all cpu siblings
is not correct.
And another problem is, current code assumes that 'all cpus in a cluster
or with same package_id (core_siblings), would have same cpu_capacity'.
But with commit '5bdd2b3f0f8 ("arm64: topology: add support to remove
cpu topology sibling masks")', when a cpu hotplugged out, the cpu
information gets cleared in its sibling cpus. So, user supplied
cpu_capacity would be applied to only online sibling cpus at the time.
After that, if any cpu hotplugged in, it would have different cpu_capacity
than its siblings, which breaks the above assumption.
So, instead of mucking around the core sibling mask for user supplied
value, use device-tree to set cpu capacity. And make the cpu_capacity
node as read-only to know the asymmetry between cpus in the system.
While at it, remove cpu_scale_mutex usage, which used for sysfs write
protection.
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Quentin Perret <quentin.perret@arm.com>
Reviewed-by: Quentin Perret <quentin.perret@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Platforms may provide system memory where some physical address ranges
perform differently than others, or is cached by the system on the
memory side.
Add documentation describing a high level overview of such systems and the
perforamnce and caching attributes the kernel provides for applications
wishing to query this information.
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Register memory side cache attributes with the memory's node if HMAT
provides the side cache iniformation table.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Save the best performance access attributes and register these with the
memory's node if HMAT provides the locality table. While HMAT does make
it possible to know performance for all possible initiator-target
pairings, we export only the local pairings at this time.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the HMAT Subsystem Address Range provides a valid processor proximity
domain for a memory domain, or a processor domain matches the performance
access of the valid processor proximity domain, register the memory
target with that initiator so this relationship will be visible under
the node's sysfs directory.
Since HMAT requires valid address ranges have an equivalent SRAT entry,
verify each memory target satisfies this requirement.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
System memory may have caches to help improve access speed to frequently
requested address ranges. While the system provided cache is transparent
to the software accessing these memory ranges, applications can optimize
their own access based on cache attributes.
Provide a new API for the kernel to register these memory-side caches
under the memory node that provides it.
The new sysfs representation is modeled from the existing cpu cacheinfo
attributes, as seen from /sys/devices/system/cpu/<cpu>/cache/. Unlike CPU
cacheinfo though, the node cache level is reported from the view of the
memory. A higher level number is nearer to the CPU, while lower levels
are closer to the last level memory.
The exported attributes are the cache size, the line size, associativity
indexing, and write back policy, and add the attributes for the system
memory caches to sysfs stable documentation.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Brice Goglin <Brice.Goglin@inria.fr>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Heterogeneous memory systems provide memory nodes with different latency
and bandwidth performance attributes. Provide a new kernel interface
for subsystems to register the attributes under the memory target
node's initiator access class. If the system provides this information,
applications may query these attributes when deciding which node to
request memory.
The following example shows the new sysfs hierarchy for a node exporting
performance attributes:
# tree -P "read*|write*"/sys/devices/system/node/nodeY/accessZ/initiators/
/sys/devices/system/node/nodeY/accessZ/initiators/
|-- read_bandwidth
|-- read_latency
|-- write_bandwidth
`-- write_latency
The bandwidth is exported as MB/s and latency is reported in
nanoseconds. The values are taken from the platform as reported by the
manufacturer.
Memory accesses from an initiator node that is not one of the memory's
access "Z" initiator nodes linked in the same directory may observe
different performance than reported here. When a subsystem makes use
of this interface, initiators of a different access number may not have
the same performance relative to initiators in other access numbers, or
omitted from the any access class' initiators.
Descriptions for memory access initiator performance access attributes
are added to sysfs stable documentation.
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Systems may be constructed with various specialized nodes. Some nodes
may provide memory, some provide compute devices that access and use
that memory, and others may provide both. Nodes that provide memory are
referred to as memory targets, and nodes that can initiate memory access
are referred to as memory initiators.
Memory targets will often have varying access characteristics from
different initiators, and platforms may have ways to express those
relationships. In preparation for these systems, provide interfaces for
the kernel to export the memory relationship among different nodes memory
targets and their initiators with symlinks to each other.
If a system provides access locality for each initiator-target pair, nodes
may be grouped into ranked access classes relative to other nodes. The
new interface allows a subsystem to register relationships of varying
classes if available and desired to be exported.
A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, do not necessarily perform the same as each other.
A memory target node may have multiple memory initiators. All linked
initiators in a target's class have the same access characteristics to
that target.
The following example show the nodes' new sysfs hierarchy for a memory
target node 'Y' with access class 0 from initiator node 'X':
# symlinks -v /sys/devices/system/node/nodeX/access0/
relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
# symlinks -v /sys/devices/system/node/nodeY/access0/
relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
The new attributes are added to the sysfs stable documentation.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Systems may provide different memory types and export this information
in the ACPI Heterogeneous Memory Attribute Table (HMAT). Parse these
tables provided by the platform and report the memory access and caching
attributes to the kernel messages.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Heterogeneous Memory Attribute Table (HMAT) header has different
field lengths than the existing parsing uses. Add the HMAT type to the
parsing rules so it may be generically parsed.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Parsing entries in an ACPI table had assumed a generic header
structure. There is no standard ACPI header, though, so less common
layouts with different field sizes required custom parsers to go through
their subtable entry list.
Create the infrastructure for adding different table types so parsing
the entries array may be more reused for all ACPI system tables and
the common code doesn't need to be duplicated.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
syzbot is hitting use-after-free bug in uinput module [1]. This is because
kobject_uevent(KOBJ_REMOVE) is called again due to commit 0f4dafc056
("Kobject: auto-cleanup on final unref") after memory allocation fault
injection made kobject_uevent(KOBJ_REMOVE) from device_del() from
input_unregister_device() fail, while uinput_destroy_device() is expecting
that kobject_uevent(KOBJ_REMOVE) is not called after device_del() from
input_unregister_device() completed.
That commit intended to catch cases where nobody even attempted to send
"remove" uevents. But there is no guarantee that an event will ultimately
be sent. We are at the point of no return as far as the rest of the kernel
is concerned; there are no repeats or do-overs.
Also, it is not clear whether some subsystem depends on that commit.
If no subsystem depends on that commit, it will be better to remove
the state_{add,remove}_uevent_sent logic. But we don't want to risk
a regression (in a patch which will be backported) by trying to remove
that logic. Therefore, as a first step, let's avoid the use-after-free bug
by making sure that kobject_uevent(KOBJ_REMOVE) won't be triggered twice.
[1] https://syzkaller.appspot.com/bug?id=8b17c134fe938bbddd75a45afaa9e68af43a362d
Reported-by: syzbot <syzbot+f648cfb7e0b52bf7ae32@syzkaller.appspotmail.com>
Analyzed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Fixes: 0f4dafc056 ("Kobject: auto-cleanup on final unref")
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 7934779a69 ("Driver-Core: disable /sbin/hotplug by
default"), the help text for the /sbin/hotplug fork-bomb says
"This should not be used today [...] creates a high system load, or
[...] out-of-memory situations during bootup". The rationale for this
was that no recent mainstream system used this anymore (in 2010!).
A few years later, the complete uevent helper support was made optional
in commit 86d56134f1 ("kobject: Make support for uevent_helper
optional."). However, if was still left enabled by default, to support
ancient userland.
Time passed by, and nothing should use this anymore, so it can be
disabled by default.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
struct device is big, around 760 bytes on x86_64. It's not a critical
structure, but it is embedded everywhere, so making it smaller is always
a good thing.
With a recent patch that moved a field from struct device to the private
structure, some benchmarks showed a very odd regression, despite this
structure having nothing to do with those benchmarks. That caused me to
look into the layout of the structure. Using 'pahole', it showed a
number of holes and ways that the structure could be reordered in order
to align some cachelines better, as well as reduce the size of the
overall structure.
Move 'struct kobj' to the start of the structure, to keep that access
in the first cacheline, and try to organize things a bit more compactly
where possible
By doing these few moves, the result removes at least 8 bytes from
'struct device' on a 64bit system. Given we know there are systems with
at least 30k devices in memory at once, every little byte counts, and
this change could be a savings of 240k of kernel memory for them. On
"normal" systems the overall memory savings would be much less.
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On top of this, a cleanup of kvm_para.h headers, which were exported by
some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcoM5VAAoJEL/70l94x66DU3EH/A8sYdsfeqALWElm2Sy9TYas
mntz+oTWsl3vDy8s8zp1ET2NpF7oBlBEMmCWhVEJaD+1qW3VpTRAseR3Zr9ML9xD
k+BQM8SKv47o86ZN+y4XALl30Ckb3DXh/X1xsrV5hF6J3ofC+Ce2tF560l8C9ygC
WyHDxwNHMWVA/6TyW3mhunzuVKgZ/JND9+0zlyY1LKmUQ0BQLle23gseIhhI0YDm
B4VGIYU2Mf8jCH5Ir3N/rQ8pLdo8U7f5P/MMfgXQafksvUHJBg6B6vOhLJh94dLh
J2wixYp1zlT0drBBkvJ0jPZ75skooWWj0o3otEA7GNk/hRj6MTllgfL5SajTHZg=
=/A7u
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"A collection of x86 and ARM bugfixes, and some improvements to
documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported
by some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
KVM: doc: Document the life cycle of a VM and its resources
KVM: selftests: complete IO before migrating guest state
KVM: selftests: disable stack protector for all KVM tests
KVM: selftests: explicitly disable PIE for tests
KVM: selftests: assert on exit reason in CR4/cpuid sync test
KVM: x86: update %rip after emulating IO
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
kvm: don't redefine flags as something else
kvm: mmu: Used range based flushing in slot_handle_level_range
KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
KVM: Reject device ioctls from processes other than the VM's creator
KVM: doc: Fix incorrect word ordering regarding supported use of APIs
KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
...
Pull x86 fixes from Thomas Gleixner:
"A pile of x86 updates:
- Prevent exceeding he valid physical address space in the /dev/mem
limit checks.
- Move all header content inside the header guard to prevent compile
failures.
- Fix the bogus __percpu annotation in this_cpu_has() which makes
sparse very noisy.
- Disable switch jump tables completely when retpolines are enabled.
- Prevent leaking the trampoline address"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/realmode: Make set_real_mode_mem() static inline
x86/cpufeature: Fix __percpu annotation in this_cpu_has()
x86/mm: Don't exceed the valid physical address space
x86/retpolines: Disable switch jump tables when retpolines are enabled
x86/realmode: Don't leak the trampoline kernel address
x86/boot: Fix incorrect ifdeffery scope
x86/resctrl: Remove unused variable