mirror of https://gitee.com/openkylin/linux.git
32833 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | 0785249f8b |
Time(keeping) updates:
- Fix the time_for_children symlink in /proc/$PID/ so it properly reflects that it part of the 'time' namespace - Add the missing userns limit for the allowed number of time namespaces, which was half defined but the actual array member was not added. This went unnoticed as the array has an exessive empty member at the end but introduced a user visible regression as the output was corrupted. - Prevent further silent ucount corruption by adding a BUILD_BUG_ON() to catch half updated data. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFe4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYob4PD/47Qwz2z2mEeO037VbbI2gY4yl/raFo 5KPWmnwonKrtVaYAldLutA3iaG7bBbUX5fRvbSRNTS6CJIHwwfLSx7/CeWMmIXEX 0zsBsn5QXjG89lJZXM+ot74yzjvkeoad2g0jEHv92v0WDSXFiAWhkBUwknfNFbpa csEjkdpyn2zTVBGBzKVHWHXddkY0o0Q0JOy0EiH09rHGpQktPoLJdYp73VCygoJd NRAXhTmBQq85RMcSB3eVTbSPpIuBUzZke9zoio7YZwEjl6bkvSqetPmTdIr57u4s ex3PX++64EXD7r8ZW36fPGDqu6v0CH2ILK7QVhwyHAYJo2LQKVd+v25muaFrzfpn dSG1SqabWqdIHUoW/76ORyecAFLTzGwDu07UH+6VJbXeLfmuhe/LI3hdDQFph9NQ BOBKhaHm8aXmAmvrkxbbAikSkJYVHrAIp5abI4PSYoPaqK1DWnSPaT1cqtaIUgYL Mk15z19V9np4lMCH2cucAlap8U9EvQEIfCRRdl+crDu17ZzGID1pwhY2DA8adqcT SUfwzzUaykd5TZtDeIe+6G9fsgf/wbSTSSbrNGKlLXDbxx+iNVXErkmx0JXLEHV4 47cmBwQZ255DzjMfuS4HzCck2MaaP8mDWgcbszgkP+GFnkf9EAP5XNp9st937mbG rzP+NkjNCldN9w== =wOiC -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time(keeping) updates from Thomas Gleixner: - Fix the time_for_children symlink in /proc/$PID/ so it properly reflects that it part of the 'time' namespace - Add the missing userns limit for the allowed number of time namespaces, which was half defined but the actual array member was not added. This went unnoticed as the array has an exessive empty member at the end but introduced a user visible regression as the output was corrupted. - Prevent further silent ucount corruption by adding a BUILD_BUG_ON() to catch half updated data. * tag 'timers-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ucount: Make sure ucounts in /proc/sys/user don't regress again time/namespace: Add max_time_namespaces ucount time/namespace: Fix time_for_children symlink |
|
Linus Torvalds | 590680d139 |
Scheduler fixes/updates:
- Deduplicate the average computations in the scheduler core and the fair class code. - Fix a raise between runtime distribution and assignement which can cause exceeding the quota by up to 70%. - Prevent negative results in the imbalanace calculation - Remove a stale warning in the workqueue code which can be triggered since the call site was moved out of preempt disabled code. It's a false positive. - Deduplicate the print macros for procfs - Add the ucmap values to the SCHED_DEBUG procfs output for completness -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TFEoTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRoPD/9cTERNZb/aT/SVD3TntBs5bMlnRR2U KsyoN7w2bmYGY6XbKdjWX+KorPyJ9YRcUpOfmH+6JmMioM5RrgCtAQ4cXP4q89k3 6+hdHu+WAhp2AyKr5LAYZK+NYdq0fy/9xx7fdXNri8/QakQCy1vu/u6iiKRMmrEf R7zB7v4ddTOTKamUuGBNnmM4GwfrD7td9NksfjDqV4yJ2ZkaQ+apWAPzJK8ixfEa TGjVnnUZA82tZRZ+O4RDN2AVgG0CuXYRYPWRfqi3XgQ3d0+ju5mM7q+6TQeEUuAq 19IoscLhmYGb9yYCY5p0j1AUkyr6FcSFr1SJt/8Jxpyw4JsKw+ANFBA3kkzEBZ+h 0VcWjUw4S6A6+0HdrqUQYjr5tl7RCY+hOE+QQ/Xvrm4PpWi0L+1tB+kgR7VKNPO8 Xqfp/CK7Rcgh2/nUHT4uxdwh4ZNsLo39QGTuahyPXzeRVJTEGUjbzktnEUVc9wmd OfjpC0Q2DBdXx+vnzH82flv/rtUUYor/Owhpq92E6d50Z7CjTa6xRNbmQUiVKls8 jqehpRBUg5cB3TI0BKeb5+kzgeGDGpm7MfZbvSwvoL0stdJdzNwwc4nhOLjPkJFK R5m7cg82Kx+r/00Vb7k6WgYZ10WTS2Il1OKbgwHl5uTlaKuY7TPEIRVuJQV/XUSG 88n4jIwpRbLMNw== =BVqY -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes/updates from Thomas Gleixner: - Deduplicate the average computations in the scheduler core and the fair class code. - Fix a raise between runtime distribution and assignement which can cause exceeding the quota by up to 70%. - Prevent negative results in the imbalanace calculation - Remove a stale warning in the workqueue code which can be triggered since the call site was moved out of preempt disabled code. It's a false positive. - Deduplicate the print macros for procfs - Add the ucmap values to the SCHED_DEBUG procfs output for completness * tag 'sched-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Add task uclamp values to SCHED_DEBUG procfs sched/debug: Factor out printing formats into common macros sched/debug: Remove redundant macro define sched/core: Remove unused rq::last_load_update_tick workqueue: Remove the warning in wq_worker_sleeping() sched/fair: Fix negative imbalance in imbalance calculation sched/fair: Fix race between runtime distribution and assignment sched/fair: Align rq->avg_idle and rq->avg_scan_cost |
|
Linus Torvalds | 20e2aa8126 |
Thre fixes/updates for perf:
- Fix the perf event cgroup tracking which tries to track the cgroup even for disabled events. - Add Ice Lake server support for uncore events - Disable pagefaults when retrieving the physical address in the sampling code. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TEbMTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYvgD/9Ufht3qrbE29oTS+q2x6PZRc01hvMW c3wh/7Anh5yBlTioUEl1tbNfSD1k9hKqMQJF6iPZQ4tzft6JAunstICngqJrJqSY si22mgY6QGKrA2+4UuxT7FZD5nrEhy1TrGNIOp/86Y9I59PAlrypWMxq71VUPjgB Yy9r6eSJqNX05r9tZy1WMloJaBBieaVvEefK9ZrO4s1XM/RU5pAl+/B1XauK8XN9 e4bzu7d6r+w23pFuEqk3r4KWozafxczJAqV+w6/Me1lFgNj6m2GKq429bL/Bgj8d Re8N4donynPe9yvuDWS4eHD1z0nhOMQhOv85seBwBcEet0PnEPtyw/eUu2zZTE2J h1R7l8I3RIs9EmXtdoOuzLFbc8a9sw/lGQJYdP70TMEtLKPUcGkfNhwP196CRbZK TBVub3jP0STmo8u7PrkcMb53ZBZ17P317ous52f0xgkFB6YVqv29cB+NEyabjKMI fP9ZAaoJ9Rn7T9nji/8KsUcjpCUBhP//YWnf/Vax5PW+hkJhfrLOpesKpX1uAwzd fc4QzEXYr/a4YI30IvqSJBsJJZfvFl9ikMaXrv5TbR8nj/eszJInho5+olQj54A7 SLzCOpVHc3jJ13d/C5Xjo0LhwSqjLv3JtvlFG34XHl+YlZUw0ua9u4Ld5vDIDYgw mwHHFp3P/yFAfQ== =5kqD -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "Three fixes/updates for perf: - Fix the perf event cgroup tracking which tries to track the cgroup even for disabled events. - Add Ice Lake server support for uncore events - Disable pagefaults when retrieving the physical address in the sampling code" * tag 'perf-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Disable page faults when getting phys address perf/x86/intel/uncore: Add Ice Lake server uncore support perf/cgroup: Correct indirection in perf_less_group_idx() perf/core: Fix event cgroup tracking |
|
Linus Torvalds | 652fa53caa |
Three small fixes/updates for the locking core code:
- Plug a task struct reference leak in the percpu rswem implementation. - Document the refcount interaction with PID_MAX_LIMIT - Improve the 'invalid wait context' data dump in lockdep so it contains all information which is required to decode the problem -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6TEJoTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoeY6D/42AKqIvpAPiPA+Aod9o8o7qDCoPJQN kwCx939+Mwrwm8YS3WJGJvR/jgoGeq33gerY93D0+jkMyhmvYeWbUIEdCenyy0+C tkA7pN1j61D7jMqmL8PgrObhQMM8tTdNNsiNxg5qZA8wI2oFBMbAStx5mCficCyJ 8t3hh/2aRVlNN1MBjo25+jEWEXUIchAzmxnmR+5t/pRYWythOhLuWW05D8b1JoN3 9zdHj9ZScONlrK+VmpXZl41SokT385arMXcmUuS+zPx6zl4P6yTgeHMLsm4Kt3+j 1q0hm4fUEEW2IrF1Uc9pxtJh/39UVwvX8Wfj/wJLm3b1akmO+3tQ2cpiee4NHk2H WogvBNGG4QeGMrRkg0seiqUE6SOaNPXdGRszPc6EkzwpOywDjHVYV5qbII1VOpbK z/TJrtv8g8ejgAvyG0dEtEDydm2SqPAIAZFXj9JQ0I/gUZKTW5rvGQWdMXW6HPif wdJJ6xb4DC3f5DrBrXgpkMSdwfea2boIoLDMf9tCcdlRs6vmvY3iRLHy2xBLcRFt 9yVDUxO40eJUxztbFKuZsQvgJyBxZubyK+Yo9CCvWQ95GfIr2JurIJEcD2fxVyRU 57/Avt3zt7iViZ9LHJnL4JnWRxftWOdWusF2dsufmnuveCVhtA5B5bl7nGntpIT2 mfJEDrJ7zzhgyA== =Agrh -----END PGP SIGNATURE----- Merge tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Three small fixes/updates for the locking core code: - Plug a task struct reference leak in the percpu rswem implementation. - Document the refcount interaction with PID_MAX_LIMIT - Improve the 'invalid wait context' data dump in lockdep so it contains all information which is required to decode the problem" * tag 'locking-urgent-2020-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Improve 'invalid wait context' splat locking/refcount: Document interaction with PID_MAX_LIMIT locking/percpu-rwsem: Fix a task_struct refcount |
|
Linus Torvalds | 75e7188397 |
dma-mapping fixes for 5.7
- fix an integer truncation in dma_direct_get_required_mask (Kishon Vijay Abraham) - fix the display of dma mapping types (Grygorii Strashko) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl6Rfy8LHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYPT3w/+MKNVGmyi6nSsruskr1CkLCah5GkHrDA4v+vwwkFZ zIeB7df/EOijzfYHlhEkiQWBcPvl9wmVBhFsuaY9XMQ/fJv06CpR5J2xtb+uZN9L HmfwCJPZwrHDqsonIiOtIWjyIOARLGC0CfBP+DCPJ/GDG9wys7eyfESh85PQEDvO QA4sPjJHuKeTTH2Q6QLM/aiqgr80Q4xrQLCVcgDAqAvqnCYi/E4HXty4pLNGpvJu 5O2xzmbQT2l3gRXjxt5Ct4pol+nFmiWJ0sBDNVNUqge7vebBiu3e0VjEaeXUPAMq hQSrof2x+EwHITV10L82/poIKNQVVXW3aYgItMNXfOSBNqaNx4wVLRsrcMZWeMeJ 0OHmCy6Sac2dK4QtY/NtZFLL1zW4ajsj8X975RjqBMcL6QAdCMxNSWGEY/j/fMQy FV1spy8FBfvXH+0O2POkYNBx/eICnMbGQ2ed5NZ8ONAl8wFLPM+2qIwmgb51a3TG 8jR613Zqyw0aMnky0MUwvj5TlwwUyyXbGKpizl0vkFw7/NYXroNMaYWJSHF/ddFp vUChNqYhgCFY8FOhanFadyPxqU1InEoOkUS8/ij7NAR6cCrxsLuAohMYSNQJes3R IvmU36FtjO53rBdHaN4Eizk4EHVaZDxoll05UPU/B5IvzXqR6zy0PphoOxfzvmmq Qfk= =Af93 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix an integer truncation in dma_direct_get_required_mask (Kishon Vijay Abraham) - fix the display of dma mapping types (Grygorii Strashko) * tag 'dma-mapping-5.7-1' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: fix displaying of dma allocation type dma-direct: fix data truncation in dma_direct_get_required_mask() |
|
Linus Torvalds | 5b8b9d0c6d |
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton: - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc, gup, hugetlb, pagemap, memremap) - Various other things (hfs, ocfs2, kmod, misc, seqfile) * akpm: (34 commits) ipc/util.c: sysvipc_find_ipc() should increase position index kernel/gcov/fs.c: gcov_seq_next() should increase position index fs/seq_file.c: seq_read(): add info message about buggy .next functions drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings change email address for Pali Rohár selftests: kmod: test disabling module autoloading selftests: kmod: fix handling test numbers above 9 docs: admin-guide: document the kernel.modprobe sysctl fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() kmod: make request_module() return an error when autoloading is disabled mm/memremap: set caching mode for PCI P2PDMA memory to WC mm/memory_hotplug: add pgprot_t to mhp_params powerpc/mm: thread pgprot_t through create_section_mapping() x86/mm: introduce __set_memory_prot() x86/mm: thread pgprot_t through init_memory_mapping() mm/memory_hotplug: rename mhp_restrictions to mhp_params mm/memory_hotplug: drop the flags field from struct mhp_restrictions mm/special: create generic fallbacks for pte_special() and pte_mkspecial() mm/vma: introduce VM_ACCESS_FLAGS mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS ... |
|
Vasily Averin | f4d74ef622 |
kernel/gcov/fs.c: gcov_seq_next() should increase position index
If seq_file .next function does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: NeilBrown <neilb@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <longman@redhat.com> Link: http://lkml.kernel.org/r/f65c6ee7-bd00-f910-2f8a-37cc67e4ff88@virtuozzo.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Eric Biggers | d7d27cfc5c |
kmod: make request_module() return an error when autoloading is disabled
Patch series "module autoloading fixes and cleanups", v5. This series fixes a bug where request_module() was reporting success to kernel code when module autoloading had been completely disabled via 'echo > /proc/sys/kernel/modprobe'. It also addresses the issues raised on the original thread (https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u) bydocumenting the modprobe sysctl, adding a self-test for the empty path case, and downgrading a user-reachable WARN_ONCE(). This patch (of 4): It's long been possible to disable kernel module autoloading completely (while still allowing manual module insertion) by setting /proc/sys/kernel/modprobe to the empty string. This can be preferable to setting it to a nonexistent file since it avoids the overhead of an attempted execve(), avoids potential deadlocks, and avoids the call to security_kernel_module_request() and thus on SELinux-based systems eliminates the need to write SELinux rules to dontaudit module_request. However, when module autoloading is disabled in this way, request_module() returns 0. This is broken because callers expect 0 to mean that the module was successfully loaded. Apparently this was never noticed because this method of disabling module autoloading isn't used much, and also most callers don't use the return value of request_module() since it's always necessary to check whether the module registered its functionality or not anyway. But improperly returning 0 can indeed confuse a few callers, for example get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit: if (!fs && (request_module("fs-%.*s", len, name) == 0)) { fs = __get_fs_type(name, len); WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name); } This is easily reproduced with: echo > /proc/sys/kernel/modprobe mount -t NONEXISTENT none / It causes: request_module fs-NONEXISTENT succeeded, but still no fs? WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0 [...] This should actually use pr_warn_once() rather than WARN_ONCE(), since it's also user-reachable if userspace immediately unloads the module. Regardless, request_module() should correctly return an error when it fails. So let's make it return -ENOENT, which matches the error when the modprobe binary doesn't exist. I've also sent patches to document and test this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jessica Yu <jeyu@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Ben Hutchings <benh@debian.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Sergey Senozhatsky | ab6f762f0f |
printk: queue wake_up_klogd irq_work only if per-CPU areas are ready
printk_deferred(), similarly to printk_safe/printk_nmi, does not
immediately attempt to print a new message on the consoles, avoiding
calls into non-reentrant kernel paths, e.g. scheduler or timekeeping,
which potentially can deadlock the system.
Those printk() flavors, instead, rely on per-CPU flush irq_work to print
messages from safer contexts. For same reasons (recursive scheduler or
timekeeping calls) printk() uses per-CPU irq_work in order to wake up
user space syslog/kmsg readers.
However, only printk_safe/printk_nmi do make sure that per-CPU areas
have been initialised and that it's safe to modify per-CPU irq_work.
This means that, for instance, should printk_deferred() be invoked "too
early", that is before per-CPU areas are initialised, printk_deferred()
will perform illegal per-CPU access.
Lech Perczak [0] reports that after commit
|
|
Linus Torvalds | 87ad46e601 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman: "A brown paper bag slipped through my proc changes, and syzcaller caught it when the code ended up in your tree. I have opted to fix it the simplest cleanest way I know how, so there is no reasonable chance for the bug to repeat" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Use a dedicated lock in struct pid |
|
Linus Torvalds | bbec2a2dc3 |
More power management updates for 5.7-rc1
Rework compat ioctl handling in the user space hibernation interface (Christoph Hellwig) and fix a typo in a function name in the cpuidle haltpoll driver (Yihao Wu). -----BEGIN PGP SIGNATURE----- iQJFBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6QPXkSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRx918P+NU9lM++pMi2Ng4wHmtRtHW9mH4M6L54 /LTjDNZ4IE/v6JlK6hzaFUAhrFViI23gmUat8a7TT/o1NUsPS0QHxatFK+nGbjEk blB/rBHWpC5vGo8SZbqmvI0hbRn0Q6Ah5I+iV+KAk9Z76mDEMNHZKpP2CfiqSVQE QYsUIJOSUo/CMT1SZRE/xDrvoU418Y1Ed6a6Kn9Ki5uXvqDTPHoAyETZ9M6tLphP kGBpbnbkHx3FyYZ3EyjVEd8O6cDsxS2gIWu6YUCB31N4G7v3bJ6rfgperTCN999J 8eQY9rNVlaAWIP9t0ObC3xOpoaNUcZC+V+yaHev3LU/6LP4Sued8cNBxQn26/RWN vyM5M6K0OphjPll9QfVfZFRUcuoBMyAtgVYw8mwT64GVJt3ukbd2QYDQHORXBQEC ziTtfLQEuAiUw/DRoTGo2XYDTlnd2nu9mFoTRU6juOawOwgwPbQldOtSJiMtCDoR cyaQH/t528w10jGCa+mIJZToTIet+gN6ui83M3kQdxMTa5ulgPClU8Kujn1wPgKX 6jFrf+NJ6SNm6A/EhPk9t/soe7hhzyGwqx351aMQpZGX6UH4hQQPXgC0tyZ71Uj5 UJ7Ys5RHcXg4kub/FJsfhv/3wdSPiekwe+U89UCQY5lxXC2x3iVjp50B4auCjW23 tQVHpAvegWg= =h4ud -----END PGP SIGNATURE----- Merge tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "Rework compat ioctl handling in the user space hibernation interface (Christoph Hellwig) and fix a typo in a function name in the cpuidle haltpoll driver (Yihao Wu)" * tag 'pm-5.7-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle-haltpoll: Fix small typo PM / sleep: handle the compat case in snapshot_set_swap_area() PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper |
|
Linus Torvalds | c0cc271173 |
Modules updates for v5.7
Summary of modules changes for the 5.7 merge window: - Trivial zero-length array to flexible-array cleanup Signed-off-by: Jessica Yu <jeyu@kernel.org> -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEVrp26glSWYuDNrCUwEV+OM47wXIFAl6O4MsQHGpleXVAa2Vy bmVsLm9yZwAKCRDARX44zjvBci4aEACsoeVA9rJGsYR7Z08xq9VF5/mPI9oFsXHQ HBnSr8QzOqpKBpAoE0Pp5WqhgbC0+5HrhcbMqE8yGHs+WJ7EDvKB8l8AF6OerRVq kmMlxYJwCJ7rJnnn6K8By4Q/13ZeCSgJfz8v8KJsqLADfm81L5/fGKvt4BThdyap fCuwv4fo0zq7JeEekCyvjaYfcqLHT6OpbOrkR9XQW6U2XTPC5nPFvmKwJvhTED/n 18HVGRGxrfsPLcTkd/njPEEZetngKICTK6iRJk0ePH2cdxiNAPEInI/BZFOXge8H wo9l6UEGSlX1DicdokCc27SQLzd4R78xykT+Z1CrFZo+6INpY5F/+J2LNnknqP25 BGqRpOu6HJZcpOUNTGaBk2c0BmXzKQzqCE3bq2ClI8ifVjwAqQrODbtie9eIqr/w 5Ocg1np0z9F2NFjIiJjD288zASmBpnfgpwgkOsiQAGW8j6Xd40mzqH/Vu6/iT856 +XC3FJvossm2XlB5D+koyWDYPtTQf1LbGt3DhCZ/5Xd7dFIRt51g8jxBJEWAyTJy EVxAohtEM9UTyqn9VymwUUrLPGV6JCG1dbfz+02KuqRklM59ZJQdKypv0vxhqdcv HS9N77xALC6AUdSupOIZiw+jSHNE2WgkDkdtJ6fS2JO2veqJcyLBEflMikT3+1l7 PXji6Pt+9Q== =HWeX -----END PGP SIGNATURE----- Merge tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Only a small cleanup this time around: a trivial conversion of zero-length arrays to flexible arrays" * tag 'modules-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kernel: module: Replace zero-length array with flexible-array member |
|
Eric W. Biederman | 63f818f46a |
proc: Use a dedicated lock in struct pid
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
> (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&pid->wait_pidfd);
> local_irq_disable();
> lock(tasklist_lock);
> lock(&pid->wait_pidfd);
> <Interrupt>
> lock(tasklist_lock);
>
> *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:
The problem is that because wait_pidfd.lock is taken under the tasklist
lock. It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.
Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock. That should be safe and I think the code will eventually
do that. It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.
It is also true that my pre-merge window testing was insufficient. So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things. I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.
It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals. That will require taking pid->lock with irqs disabled.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes:
|
|
Grygorii Strashko | 9bb50ed747 |
dma-debug: fix displaying of dma allocation type
The commit |
|
Kishon Vijay Abraham I | cdcda0d1f8 |
dma-direct: fix data truncation in dma_direct_get_required_mask()
The upper 32-bit physical address gets truncated inadvertently when dma_direct_get_required_mask() invokes phys_to_dma_direct(). This results in dma_addressing_limited() return incorrect value when used in platforms with LPAE enabled. Fix it here by explicitly type casting 'max_pfn' to phys_addr_t in order to prevent overflow of intermediate value while evaluating '(max_pfn - 1) << PAGE_SHIFT'. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
Peter Zijlstra | 9a019db0b6 |
locking/lockdep: Improve 'invalid wait context' splat
The 'invalid wait context' splat doesn't print all the information required to reconstruct / validate the error, specifically the irq-context state is missing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Qian Cai | d22cc7f67d |
locking/percpu-rwsem: Fix a task_struct refcount
The following commit: |
|
Valentin Schneider | 96e74ebf8d |
sched/debug: Add task uclamp values to SCHED_DEBUG procfs
Requested and effective uclamp values can be a bit tricky to decipher when playing with cgroup hierarchies. Add them to a task's procfs when SCHED_DEBUG is enabled. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-4-valentin.schneider@arm.com |
|
Valentin Schneider | 9e3bf9469c |
sched/debug: Factor out printing formats into common macros
The printing macros in debug.c keep redefining the same output format. Collect each output format in a single definition, and reuse that definition in the other macros. While at it, add a layer of parentheses and replace printf's with the newly introduced macros. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-3-valentin.schneider@arm.com |
|
Valentin Schneider | c745a6212c |
sched/debug: Remove redundant macro define
Most printing macros for procfs are defined globally in debug.c, and they are re-defined (to the exact same thing) within proc_sched_show_task(). Get rid of the duplicate defines. Reviewed-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200226124543.31986-2-valentin.schneider@arm.com |
|
Vincent Donnefort | 275b2f6723 |
sched/core: Remove unused rq::last_load_update_tick
The following commit:
|
|
Sebastian Andrzej Siewior | 62849a9612 |
workqueue: Remove the warning in wq_worker_sleeping()
The kernel test robot triggered a warning with the following race:
task-ctx A interrupt-ctx B
worker
-> process_one_work()
-> work_item()
-> schedule();
-> sched_submit_work()
-> wq_worker_sleeping()
-> ->sleeping = 1
atomic_dec_and_test(nr_running)
__schedule(); *interrupt*
async_page_fault()
-> local_irq_enable();
-> schedule();
-> sched_submit_work()
-> wq_worker_sleeping()
-> if (WARN_ON(->sleeping)) return
-> __schedule()
-> sched_update_worker()
-> wq_worker_running()
-> atomic_inc(nr_running);
-> ->sleeping = 0;
-> sched_update_worker()
-> wq_worker_running()
if (!->sleeping) return
In this context the warning is pointless everything is fine.
An interrupt before wq_worker_sleeping() will perform the ->sleeping
assignment (0 -> 1 > 0) twice.
An interrupt after wq_worker_sleeping() will trigger the warning and
nr_running will be decremented (by A) and incremented once (only by B, A
will skip it). This is the case until the ->sleeping is zeroed again in
wq_worker_running().
Remove the WARN statement because this condition may happen. Document
that preemption around wq_worker_sleeping() needs to be disabled to
protect ->sleeping and not just as an optimisation.
Fixes:
|
|
Aubrey Li | 111688ca1c |
sched/fair: Fix negative imbalance in imbalance calculation
A negative imbalance value was observed after imbalance calculation, this happens when the local sched group type is group_fully_busy, and the average load of local group is greater than the selected busiest group. Fix this problem by comparing the average load of the local and busiest group before imbalance calculation formula. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/1585201349-70192-1-git-send-email-aubrey.li@intel.com |
|
Huaixin Chang | 26a8b12747 |
sched/fair: Fix race between runtime distribution and assignment
Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). Race happens when cfs_b->runtime is read,
distributes without holding lock and finds out there is not enough
runtime to charge against after distribution. Because
assign_cfs_rq_runtime() might be called during distribution, and use
cfs_b->runtime at the same time.
Fibtest is the tool to test this race. Assume all gcfs_rq is throttled
and cfs period timer runs, slow threads might run and sleep, returning
unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local
pool. If all this happens sufficiently quickly, cfs_b->runtime will drop
a lot. If runtime distributed is large too, over-use of runtime happens.
A runtime over-using by about 70 percent of quota is seen when we
test fibtest on a 96-core machine. We run fibtest with 1 fast thread and
95 slow threads in test group, configure 10ms quota for this group and
see the CPU usage of fibtest is 17.0%, which is far more than the
expected 10%.
On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which is also more than expected
10%. This shows that on similar workloads, this race do affect CPU
bandwidth control.
Solve this by holding lock inside distribute_cfs_runtime().
Fixes:
|
|
Valentin Schneider | d76343c6b2 |
sched/fair: Align rq->avg_idle and rq->avg_scan_cost
sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an open-coded version (with the exact same decay factor) for rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to compare these two fields. The only difference between the two is that rq->avg_scan_cost is computed using a pure division rather than a shift. Turns out it actually matters, first of all because the shifted value can be negative, and the standard has this to say about it: """ The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined. """ Not only this, but (arithmetic) right shifting a negative value (using 2's complement) is *not* equivalent to dividing it by the corresponding power of 2. Let's look at a few examples: -4 -> 0xF..FC -4 >> 3 -> 0xF..FF == -1 != -4 / 8 -8 -> 0xF..F8 -8 >> 3 -> 0xF..FF == -1 == -8 / 8 -9 -> 0xF..F7 -9 >> 3 -> 0xF..FE == -2 != -9 / 8 Make update_avg() use a division, and export it to the private scheduler header to reuse it where relevant. Note that this still lets compilers use a shift here, but should prevent any unwanted surprise. The disassembly of select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2 instructions; the diff sort of looks like this: - sub x1, x1, x0 + subs x1, x1, x0 // set condition codes + add x0, x1, #0x7 + csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1 add x0, x3, x0, asr #3 which does the right thing (i.e. gives us the expected result while still using an arithmetic shift) Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com |
|
Jiri Olsa | d3296fb372 |
perf/core: Disable page faults when getting phys address
We hit following warning when running tests on kernel compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y: WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200 CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3 RIP: 0010:__get_user_pages_fast+0x1a4/0x200 ... Call Trace: perf_prepare_sample+0xff1/0x1d90 perf_event_output_forward+0xe8/0x210 __perf_event_overflow+0x11a/0x310 __intel_pmu_pebs_event+0x657/0x850 intel_pmu_drain_pebs_nhm+0x7de/0x11d0 handle_pmi_common+0x1b2/0x650 intel_pmu_handle_irq+0x17b/0x370 perf_event_nmi_handler+0x40/0x60 nmi_handle+0x192/0x590 default_do_nmi+0x6d/0x150 do_nmi+0x2f9/0x3c0 nmi+0x8e/0xd7 While __get_user_pages_fast() is IRQ-safe, it calls access_ok(), which warns on: WARN_ON_ONCE(!in_task() && !pagefault_disabled()) Peter suggested disabling page faults around __get_user_pages_fast(), which gets rid of the warning in access_ok() call. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200407141427.3184722-1-jolsa@kernel.org |
|
Ian Rogers | 24fb6b8e7c |
perf/cgroup: Correct indirection in perf_less_group_idx()
The void* in perf_less_group_idx() is to a member in the array which points
at a perf_event*, as such it is a perf_event**.
Reported-By: John Sperbeck <jsperbeck@google.com>
Fixes:
|
|
Peter Zijlstra | 33238c5045 |
perf/core: Fix event cgroup tracking
Song reports that installing cgroup events is broken since: |
|
Jan Kara | 0f538e3e71 |
ucount: Make sure ucounts in /proc/sys/user don't regress again
Commit |
|
Gustavo A. R. Silva | 6524d79413 |
kernel/gcov/fs.c: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
|
|
Gustavo A. R. Silva | 7ff87182d1 |
gcov: gcc_3_4: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
|
|
Gustavo A. R. Silva | fba4168ede |
gcov: gcc_4_7: replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by this
change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit
|
|
Qiujun Huang | 06d4f8152a |
kernel/kmod.c: fix a typo "assuems" -> "assumes"
There is a typo in comment. Fix it. s/assuems/assumes/ Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: http://lkml.kernel.org/r/1585891029-6450-1-git-send-email-hqjagain@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Will Deacon | 0bd476e6c6 |
kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
kallsyms_lookup_name() and kallsyms_on_each_symbol() are exported to modules despite having no in-tree users and being wide open to abuse by out-of-tree modules that can use them as a method to invoke arbitrary non-exported kernel functions. Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol(). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: http://lkml.kernel.org/r/20200221114404.14641-4-will@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Masahiro Yamada | 889b3c1245 |
compiler: remove CONFIG_OPTIMIZE_INLINING entirely
Commit
|
|
Nathan Chancellor | 63174f61df |
kernel/extable.c: use address-of operator on section symbols
Clang warns: ../kernel/extable.c:37:52: warning: array comparison always evaluates to a constant [-Wtautological-compare] if (main_extable_sort_needed && __stop___ex_table > __start___ex_table) { ^ 1 warning generated. These are not true arrays, they are linker defined symbols, which are just addresses. Using the address of operator silences the warning and does not change the resulting assembly with either clang/ld.lld or gcc/ld (tested with diff + objdump -Dr). Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://github.com/ClangBuiltLinux/linux/issues/892 Link: http://lkml.kernel.org/r/20200219202036.45702-1-natechancellor@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Alexey Dobriyan | d919b33daf |
proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection. Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files. Enable "permanent" flag for /proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not. This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system". N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6% Benchmark source: #include <chrono> #include <iostream> #include <thread> #include <vector> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R; int xxx = 0; int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); } void f(int n) { glue(n % NR_CPUS); while (*(volatile int *)&xxx == 0) { } for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } } int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; } N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]); for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; } std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); } auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' '; return 0; } P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization. [akpm@linux-foundation.org: coding style fixes] Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Anshuman Khandual | 03911132aa |
mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()
This replaces all remaining open encodings with is_vm_hugetlb_page(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Nick Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Paul Burton <paulburton@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Anshuman Khandual | 3122e80efc |
mm/vma: make vma_is_accessible() available for general use
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it available for general use. While here, this replaces all remaining open encodings for VMA access check with vma_is_accessible(). Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Guo Ren <guoren@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Li Xinhai | e39a4b332d |
mm: set vm_next and vm_prev to NULL in vm_area_dup()
Set ->vm_next and ->vm_prev to NULL to prevent potential misuse from the new duplicated vma. Currently, only in fork path there are misuse for handling anon_vma. No other bugs been revealed with this patch applied. Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1581150928-3214-4-git-send-email-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Li Xinhai | 93949bb21b |
mm: don't prepare anon_vma if vma has VM_WIPEONFORK
Patch series "mm: Fix misuse of parent anon_vma in dup_mmap path".
This patchset fixes the misuse of parenet anon_vma, which mainly caused by
child vma's vm_next and vm_prev are left same as its parent after
duplicate vma. Finally, code reached parent vma's neighbor by referring
pointer of child vma and executed wrong logic.
The first two patches fix relevant issues, and the third patch sets
vm_next and vm_prev to NULL when duplicate vma to prevent potential misuse
in future.
Effects of the first bug is that causes rmap code to check both parent and
child's page table, although a page couldn't be mapped by both parent and
child, because child vma has WIPEONFORK so all pages mapped by child are
'new' and not relevant to parent.
Effects of the second bug is that the relationship of anon_vma of parent
and child are totallyconvoluted. It would cause 'son', 'grandson', ...,
etc, to share 'parent' anon_vma, which disobey the design rule of reusing
anon_vma (the rule to be followed is that reusing should among vma of same
process, and vma should not gone through fork).
So, both issues should cause unnecessary rmap walking and have unexpected
complexity.
These two issues would not be directly visible, I used debugging code to
check the anon_vma pointers of parent and child when inspecting the
suspicious implementation of issue #2, then find the problem.
This patch (of 3):
In dup_mmap(), anon_vma_prepare() is called for vma has VM_WIPEONFORK, and
parameter 'tmp' (i.e., the new vma of child) has same ->vm_next and
->vm_prev as its parent vma. That allows anon_vma used by parent been
mistakenly shared by child (find_mergeable_anon_vma() will do this reuse
work).
Besides this issue, call anon_vma_prepare() should be avoided because we
don't copy page for this vma. Preparing anon_vma will be handled during
fault.
Fixes:
|
|
Dmitry Safonov | eeec26d5da |
time/namespace: Add max_time_namespaces ucount
Michael noticed that userns limit for number of time namespaces is missing.
Furthermore, time namespace introduced UCOUNT_TIME_NAMESPACES, but didn't
introduce an array member in user_table[]. It would make array's
initialisation OOB write, but by luck the user_table array has an excessive
empty member (all accesses to the array are limited with UCOUNT_COUNTS - so
it silently reuses the last free member.
Fixes user-visible regression: max_inotify_instances by reason of the
missing UCOUNT_ENTRY() has limited max number of namespaces instead of the
number of inotify instances.
Fixes:
|
|
Michael Kerrisk (man-pages) | b801f1e22c |
time/namespace: Fix time_for_children symlink
Looking at the contents of the /proc/PID/ns/time_for_children symlink shows
an anomaly:
$ ls -l /proc/self/ns/* |awk '{print $9, $10, $11}'
...
/proc/self/ns/pid -> pid:[4026531836]
/proc/self/ns/pid_for_children -> pid:[4026531836]
/proc/self/ns/time -> time:[4026531834]
/proc/self/ns/time_for_children -> time_for_children:[4026531834]
/proc/self/ns/user -> user:[4026531837]
...
The reference for 'time_for_children' should be a 'time' namespace, just as
the reference for 'pid_for_children' is a 'pid' namespace. In other words,
the above time_for_children link should read:
/proc/self/ns/time_for_children -> time:[4026531834]
Fixes:
|
|
Christoph Hellwig | 0f5c4c6e0e |
PM / sleep: handle the compat case in snapshot_set_swap_area()
Use in_compat_syscall to copy directly from the 32-bit ABI structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
|
Christoph Hellwig | 88a77559cc |
PM / sleep: move SNAPSHOT_SET_SWAP_AREA handling into a helper
Move the handling of the SNAPSHOT_SET_SWAP_AREA ioctl from the main ioctl helper into a helper function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
|
Linus Torvalds | ef05db16bb |
Additional power management updates for 5.7-rc1
- Fix corner-case suspend-to-idle wakeup issue on systems where the ACPI SCI is shared with another wakeup source (Hans de Goede). - Add document describing system-wide suspend and resume code flows to the admin guide (Rafael Wysocki). - Add kernel command line option to set pm_debug_messages (Chen Yu). - Choose schedutil as the preferred scaling governor by default on ARM big.LITTLE systems and on x86 systems using the intel_pstate driver in the passive mode (Linus Walleij, Rafael Wysocki). - Drop racy and redundant checks from the PM core's device_prepare() routine (Rafael Wysocki). - Make resume from hibernation take the hibernation_restore() return value into account (Dexuan Cui). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6LN9kSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxqj0P/3Uh16YIKOUeXosqtptJJiWgZoWu82aR H2uzc9hCV6k/tnSG/Y3ZGgI4dUZGcJ94By80XoWceZ97CC1YuBHpF/dHSupTCcvt tBEP9H2qbSLi9d2PhlEUJ0B6PBpWEjCzOb519Tzj4sd67FGhsKeihXl+WXdgaaKZ hpdj+tBkQ6Dxb7CXVw/mUpWlg/nmF+yRqpK5By+V0WMOlQtg4Ecb3lMqaJPr7b4F W/I5m4PE9QG+VetSSXoe7CKfLo2fsDtyHH6ioTJ9xklcOflIblZ3MYf3sKgOuuny m/6Zm71HmQwWRkjA/V+C/eTr0VX2TuMUALfA8i+uZBo//I58TTJq9oqosu8eao2P 5MR79Vwa+eqtYRgpHCTM4JO1iyKiE+FTfVAPOS4hHOKjqY9BUj0YtWHxk8abdzQd owv9DjEtGKcipAnCKtWDIS9ikZ5TOsYiGgWyJQnoonpFY+0s3DwtRdi+gBLJliSj 5j7sRv9wJrJRFZ7tRMHHAqoQkSYqf4YdBuRBG7F/M44anveORBJNaTbPgm30qbUP FXD9hTJn/4Q2nmBSzqvEv0+TJrWoCyKduzr5uNsR0eyg48w9mrlUDDkW7dOMb3DU WV6QGNomMuKIY4DHKCo3/k21NmzW2JN7zhUrErCnl0bXjV2AM71Cb2xjUD1eNOqV 9LxPDvAmZoGt =QFDr -----END PGP SIGNATURE----- Merge tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "Additional power management updates. These fix a corner-case suspend-to-idle wakeup issue on systems where the ACPI SCI is shared with another wakeup source, add a kernel command line option to set pm_debug_messages via the kernel command line, add a document desctibing system-wide suspend and resume code flows, modify cpufreq Kconfig to choose schedutil as the preferred governor by default in a couple of cases and do some assorted cleanups. Specifics: - Fix corner-case suspend-to-idle wakeup issue on systems where the ACPI SCI is shared with another wakeup source (Hans de Goede). - Add document describing system-wide suspend and resume code flows to the admin guide (Rafael Wysocki). - Add kernel command line option to set pm_debug_messages (Chen Yu). - Choose schedutil as the preferred scaling governor by default on ARM big.LITTLE systems and on x86 systems using the intel_pstate driver in the passive mode (Linus Walleij, Rafael Wysocki). - Drop racy and redundant checks from the PM core's device_prepare() routine (Rafael Wysocki). - Make resume from hibernation take the hibernation_restore() return value into account (Dexuan Cui)" * tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler() ACPI: PM: Add acpi_[un]register_wakeup_handler() Documentation: PM: sleep: Document system-wide suspend code flows cpufreq: Select schedutil when using big.LITTLE PM: sleep: Add pm_debug_messages kernel command line option PM: sleep: core: Drop racy and redundant checks from device_prepare() PM: hibernate: Propagate the return value of hibernation_restore() cpufreq: intel_pstate: Select schedutil as the default governor |
|
Linus Torvalds | b6ff10700d |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl6LFJ0ACgkQnJ2qBz9k QNkSUQgAzwaescnHeVTF7/Zg9Uj2xrfrTJZ1E+Mn9qnd/0/z/asVV+RKfY7Gnu7h g19inDI4ZESFz2gWz4jwJD1c2/yMZb8vnae4ye3dtCv2yjG/0JxCeue6vjwsWqmO 4jbSgk8YNQqzwEFVMzNp43ZJr3CFooLCIsJcL8q4yYk8Kt4pDUPmQ1vBvAc6k9vK BKMBvp926tbomP27nq0n0CjvHy7ipDGMl4H6i4vBxHRfbDPih2x9VEklK3JatC1n 4AKS6IYJrkZVdOjli+DrResbcWxyT4db5tPio5MU0RDnVhNZT2cHyNVXf5EpRJqP 72pa7gfPu1Rx1+tU8bDR/daSveou2A== =fkCV -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This implements the fanotify FAN_DIR_MODIFY event. This event reports the name in a directory under which a change happened and together with the directory filehandle and fstatat() allows reliable and efficient implementation of directory synchronization" * tag 'fsnotify_for_v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: Fix the checks in fanotify_fsid_equal fanotify: report name info for FAN_DIR_MODIFY event fanotify: record name info for FAN_DIR_MODIFY event fanotify: Drop fanotify_event_has_fid() fanotify: prepare to report both parent and child fid's fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name fanotify: divorce fanotify_path_event and fanotify_fid_event fanotify: Store fanotify handles differently fanotify: Simplify create_fd() fanotify: fix merging marks masks with FAN_ONDIR fanotify: merge duplicate events on parent and child fsnotify: replace inode pointer with an object id fsnotify: simplify arguments passing to fsnotify_parent() fsnotify: use helpers to access data by data_type fsnotify: funnel all dirent events through fsnotify_name() fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() fsnotify: tidy up FS_ and FAN_ constants |
|
Linus Torvalds | c48b07226b |
perf updates all over the place:
core: - Support for cgroup tracking in samples to allow cgroup based analysis tools: - Support for cgroup analysis - Commandline option and hotkey for perf top to change the sort order - A set of fixes all over the place - Various build system related improvements - Updates of the X86 pmu event JSON data - Documentation updates -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR 4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/ giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL 4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7 SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe +lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj eKaaErWDtWAxpA== =2UUD -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Thomas Gleixner: "Perf updates all over the place: core: - Support for cgroup tracking in samples to allow cgroup based analysis tools: - Support for cgroup analysis - Commandline option and hotkey for perf top to change the sort order - A set of fixes all over the place - Various build system related improvements - Updates of the X86 pmu event JSON data - Documentation updates" * tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf python: Fix clang detection to strip out options passed in $CC perf tools: Support Python 3.8+ in Makefile perf script: Fix invalid read of directory entry after closedir() perf script report: Fix SEGFAULT when using DWARF mode perf script: add -S/--symbols documentation perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric perf events parser: Add missing Intel CPU events to parser perf script: Allow --symbol to accept hexadecimal addresses perf report/top TUI: Fix title line formatting perf top: Support hotkey to change sort order perf top: Support --group-sort-idx to change the sort order perf symbols: Fix arm64 gap between kernel start and module end perf build-test: Honour JOBS to override detection of number of cores perf script: Add --show-cgroup-events option perf top: Add --all-cgroups option perf record: Add --all-cgroups option perf record: Support synthesizing cgroup events perf report: Add 'cgroup' sort key perf cgroup: Maintain cgroup hierarchy perf tools: Basic support for CGROUP event ... |
|
Linus Torvalds | d5ca32738f |
Two timer subsystem fixes:
- Prevent a use after free in the new lockdep state tracking for hrtimers - Add missing parenthesis in the VF pit timer driver -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2pgTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoWM/D/4qlP3NvsRf/4dIqcmfOtaGmMC1yqug 4YbA6jQ2iecmgGprqN7JIKCHRgRyP2d72Ue6dmDKA8eOCcLmUsWy6dz5A+Ufi5wT S9S9dWctZSkmXSdEWkkBMHaefNFUNOTc16q4c4BFXomZzE4QZs+KjoVjJZBDtIqw A/9rmZKcBKxMpbuorE7zs6cRzsfvmiothXI+R78WMRbI+Yy3JAIuf3+uR1h7tXSi M8BNTTGn9U+Rnos/MFK5p136mwd5DHbCrX2G5KoYaox2CFGQ3+SvFGW9DWR38OTz IDP/RmH02s2AI0MNsQxrFFCQIpCentUEHWV5x5gjsw6DrHI23Xc98xbNdz3c9S+n WZMn63jvGr2XuH8XWb9tS72Zdp9VyKzubQ04xOEswvZg2KQuSntbUjq8RIEwSTMb xC82sJVXf20RX4iPsHcPKqPAWTgKRjNBuZbzxjRWjS/Ijtbdt8/GP/q9nZ3EKRvb 6k+bRWS6fbdRflhptCp9YmczE3/WX6SH02d0+m46x88SPzbxkg7sCMKjeJddiqXW XO2fRedYlbQXmGdUbvFrssRxLuGin4rYMAtZbO43t7uIf8KizPRE8EdUIaRpcYsS QftCReyHa1lu4+yCdknBIJ6eadzeeKaJed8FJLTp1KtPOQu68WryN1qG9Isa5o4R xcTq0+KI56XoKA== =y3EI -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two timer subsystem fixes: - Prevent a use after free in the new lockdep state tracking for hrtimers - Add missing parenthesis in the VF pit timer driver" * tag 'timers-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/timer-vf-pit: Add missing parenthesis hrtimer: Don't dereference the hrtimer pointer after the callback |
|
Linus Torvalds | aa1a8ce533 |
New tracing features:
- The ring buffer is no longer disabled when reading the trace file. The trace_pipe file was made to be used for live tracing and reading as it acted like the normal producer/consumer. As the trace file would not consume the data, the easy way of handling it was to just disable writes to the ring buffer. This came to a surprise to the BPF folks who complained about lost events due to reading. This is no longer an issue. If someone wants to keep the old disabling there's a new option "pause-on-trace" that can be set. - New set_ftrace_notrace_pid file. PIDs in this file will not be traced by the function tracer. Similar to set_ftrace_pid, which makes the function tracer only trace those tasks with PIDs in the file, the set_ftrace_notrace_pid does the reverse. - New set_event_notrace_pid file. PIDs in this file will cause events not to be traced if triggered by a task with a matching PID. Similar to the set_event_pid file but will not be traced. Note, sched_waking and sched_switch events may still be trace if one of the tasks referenced by those events contains a PID that is allowed to be traced. Tracing related features: - New bootconfig option, that is attached to the initrd file. If bootconfig is on the command line, then the initrd file is searched looking for a bootconfig appended at the end. - New GPU tracepoint infrastructure to help the gfx drivers to get off debugfs (acked by Greg Kroah-Hartman) Other minor updates and fixes. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXokgWRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgrHAP0UkKs/52JY4oWa3OIh/OqK+vnCrIwz zGvDFOYM0fKbwgD9FZWgzlcaYK5m2Cxlhp4VoraZveHMLJUhnEHtdX6X0wk= =Rebj -----END PGP SIGNATURE----- Merge tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New tracing features: - The ring buffer is no longer disabled when reading the trace file. The trace_pipe file was made to be used for live tracing and reading as it acted like the normal producer/consumer. As the trace file would not consume the data, the easy way of handling it was to just disable writes to the ring buffer. This came to a surprise to the BPF folks who complained about lost events due to reading. This is no longer an issue. If someone wants to keep the old disabling there's a new option "pause-on-trace" that can be set. - New set_ftrace_notrace_pid file. PIDs in this file will not be traced by the function tracer. Similar to set_ftrace_pid, which makes the function tracer only trace those tasks with PIDs in the file, the set_ftrace_notrace_pid does the reverse. - New set_event_notrace_pid file. PIDs in this file will cause events not to be traced if triggered by a task with a matching PID. Similar to the set_event_pid file but will not be traced. Note, sched_waking and sched_switch events may still be traced if one of the tasks referenced by those events contains a PID that is allowed to be traced. Tracing related features: - New bootconfig option, that is attached to the initrd file. If bootconfig is on the command line, then the initrd file is searched looking for a bootconfig appended at the end. - New GPU tracepoint infrastructure to help the gfx drivers to get off debugfs (acked by Greg Kroah-Hartman) And other minor updates and fixes" * tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Do not allocate buffer in trace_find_next_entry() in atomic tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid selftests/ftrace: Add test to test new set_event_notrace_pid file selftests/ftrace: Add test to test new set_ftrace_notrace_pid file tracing: Create set_event_notrace_pid to not trace tasks ftrace: Create set_ftrace_notrace_pid to not trace tasks ftrace: Make function trace pid filtering a bit more exact ftrace/kprobe: Show the maxactive number on kprobe_events tracing: Have the document reflect that the trace file keeps tracing enabled ring-buffer/tracing: Have iterator acknowledge dropped events tracing: Do not disable tracing when reading the trace file ring-buffer: Do not disable recording when there is an iterator ring-buffer: Make resize disable per cpu buffer instead of total buffer ring-buffer: Optimize rb_iter_head_event() ring-buffer: Do not die if rb_iter_peek() fails more than thrice ring-buffer: Have rb_iter_head_event() handle concurrent writer ring-buffer: Add page_stamp to iterator for synchronization ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance() ring-buffer: Have ring_buffer_empty() not depend on tracing stopped tracing: Save off entry when peeking at next entry ... |