mirror of https://gitee.com/openkylin/linux.git
25308 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Linus Torvalds | f79ec886f9 |
Three minor updates
- Use of the new GFP_RETRY_MAYFAIL to be more aggressive in allocating memory for the ring buffer without causing OOMs - Fix a memory leak in adding and removing instances - Add __rcu annotation to be able to debug RCU usage of function tracing a bit better. -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJZcf52FBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L Vg4H/0DxsgqsGehOhbIu/W6JLJo+q+jNUbfFfvpIDvraZ8z7bC+6SORdgMEV7uXt EMISWnzy9Wv9E361ZLgUaODwbimnqdUeFYzE4f4ggE1+eFhZKAY5Lo0UDcctwNoq /kcOPr51aW8+Tzdu6UtymVsnXykuJo3mIPGFzsKQju8ykcl/dXIdiFAMvVmiNxsG /Rv9yGhYDYm61pj3JyP9pgICYTI/7jtatKhoVZBxI/ji0hWNAnZfF89k0VeU9vpY xsK/d9n84o+kPsuh8hIMVKUUPRoeamDuxpMa+Rf37Vm6aQyzNIXDtNdo3mdfocpg uXLxNxYcmDmRXawR5EkF2cCGIl0= =FjNl -----END PGP SIGNATURE----- Merge tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Three minor updates - Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating memory for the ring buffer without causing OOMs - Fix a memory leak in adding and removing instances - Add __rcu annotation to be able to debug RCU usage of function tracing a bit better" * tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: trace: fix the errors caused by incompatible type of RCU variables tracing: Fix kmemleak in instance_rmdir tracing/ring_buffer: Try harder to allocate |
|
Linus Torvalds | 5a77f0254b |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar: "A cputime fix and code comments/organization fix to the deadline scheduler" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix confusing comments about selection of top pi-waiter sched/cputime: Don't use smp_processor_id() in preemptible context |
|
Linus Torvalds | bbcdea658f |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar: "Two hw-enablement patches, two race fixes, three fixes for regressions of semantics, plus a number of tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add proper condition to run sched_task callbacks perf/core: Fix locking for children siblings group read perf/core: Fix scheduling regression of pinned groups perf/x86/intel: Fix debug_store reset field for freq events perf/x86/intel: Add Goldmont Plus CPU PMU support perf/x86/intel: Enable C-state residency events for Apollo Lake perf symbols: Accept zero as the kernel base address Revert "perf/core: Drop kernel samples even though :u is specified" perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target perf evsel: State in the default event name if attr.exclude_kernel is set perf evsel: Fix attr.exclude_kernel setting for default cycles:p |
|
Linus Torvalds | 8b810a3a35 |
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixlet from Ingo Molnar: "Remove an unnecessary priority adjustment in the rtmutex code" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Remove unnecessary priority adjustment |
|
Linus Torvalds | 34eddefee4 |
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar: "A resume_irq() fix, plus a number of static declaration fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/digicolor: Drop unnecessary static irqchip/mips-cpu: Drop unnecessary static irqchip/gic/realview: Drop unnecessary static irqchip/mips-gic: Remove population of irq domain names genirq/PM: Properly pretend disabled state when force resuming interrupts |
|
Jiri Olsa | 2aeb188354 |
perf/core: Fix locking for children siblings group read
We're missing ctx lock when iterating children siblings within the perf_read path for group reading. Following race and crash can happen: User space doing read syscall on event group leader: T1: perf_read lock event->ctx->mutex perf_read_group lock leader->child_mutex __perf_read_group_add(child) list_for_each_entry(sub, &leader->sibling_list, group_entry) ----> sub might be invalid at this point, because it could get removed via perf_event_exit_task_context in T2 Child exiting and cleaning up its events: T2: perf_event_exit_task_context lock ctx->mutex list_for_each_entry_safe(child_event, next, &child_ctx->event_list,... perf_event_exit_event(child) lock ctx->lock perf_group_detach(child) unlock ctx->lock ----> child is removed from sibling_list without any sync with T1 path above ... free_event(child) Before the child is removed from the leader's child_list, (and thus is omitted from perf_read_group processing), we need to ensure that perf_read_group touches child's siblings under its ctx->lock. Peter further notes: | One additional note; this bug got exposed by commit: | | |
|
Linus Torvalds | 96080f6977 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: 1) BPF verifier signed/unsigned value tracking fix, from Daniel Borkmann, Edward Cree, and Josef Bacik. 2) Fix memory allocation length when setting up calls to ->ndo_set_mac_address, from Cong Wang. 3) Add a new cxgb4 device ID, from Ganesh Goudar. 4) Fix FIB refcount handling, we have to set it's initial value before the configure callback (which can bump it). From David Ahern. 5) Fix double-free in qcom/emac driver, from Timur Tabi. 6) A bunch of gcc-7 string format overflow warning fixes from Arnd Bergmann. 7) Fix link level headroom tests in ip_do_fragment(), from Vasily Averin. 8) Fix chunk walking in SCTP when iterating over error and parameter headers. From Alexander Potapenko. 9) TCP BBR congestion control fixes from Neal Cardwell. 10) Fix SKB fragment handling in bcmgenet driver, from Doug Berger. 11) BPF_CGROUP_RUN_PROG_SOCK_OPS needs to check for null __sk, from Cong Wang. 12) xmit_recursion in ppp driver needs to be per-device not per-cpu, from Gao Feng. 13) Cannot release skb->dst in UDP if IP options processing needs it. From Paolo Abeni. 14) Some netdev ioctl ifr_name[] NULL termination fixes. From Alexander Levin and myself. 15) Revert some rtnetlink notification changes that are causing regressions, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits) net: bonding: Fix transmit load balancing in balance-alb mode rds: Make sure updates to cp_send_gen can be observed net: ethernet: ti: cpsw: Push the request_irq function to the end of probe ipv4: initialize fib_trie prior to register_netdev_notifier call. rtnetlink: allocate more memory for dev_set_mac_address() net: dsa: b53: Add missing ARL entries for BCM53125 bpf: more tests for mixed signed and unsigned bounds checks bpf: add test for mixed signed and unsigned bounds checks bpf: fix up test cases with mixed signed/unsigned bounds bpf: allow to specify log level and reduce it for test_verifier bpf: fix mixed signed/unsigned derived min/max value bounds ipv6: avoid overflow of offset in ip6_find_1stfragopt net: tehuti: don't process data if it has not been copied from userspace Revert "rtnetlink: Do not generate notifications for CHANGEADDR event" net: dsa: mv88e6xxx: Enable CMODE config support for 6390X dt-binding: ptp: Add SoC compatibility strings for dte ptp clock NET: dwmac: Make dwmac reset unconditional net: Zero terminate ifr_name in dev_ifname(). wireless: wext: terminate ifr name coming from userspace netfilter: fix netfilter_net_init() return ... |
|
Daniel Borkmann | 4cabc5b186 |
bpf: fix mixed signed/unsigned derived min/max value bounds
Edward reported that there's an issue in min/max value bounds tracking when signed and unsigned compares both provide hints on limits when having unknown variables. E.g. a program such as the following should have been rejected: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (18) r1 = 0xffff8a94cda93400 5: (85) call bpf_map_lookup_elem#1 6: (15) if r0 == 0x0 goto pc+7 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp 7: (7a) *(u64 *)(r10 -16) = -8 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 11: (65) if r1 s> 0x1 goto pc+2 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 12: (0f) r0 += r1 13: (72) *(u8 *)(r0 +0) = 0 R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1 R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp 14: (b7) r0 = 0 15: (95) exit What happens is that in the first part ... 8: (79) r1 = *(u64 *)(r10 -16) 9: (b7) r2 = -1 10: (2d) if r1 > r2 goto pc+3 ... r1 carries an unsigned value, and is compared as unsigned against a register carrying an immediate. Verifier deduces in reg_set_min_max() that since the compare is unsigned and operation is greater than (>), that in the fall-through/false case, r1's minimum bound must be 0 and maximum bound must be r2. Latter is larger than the bound and thus max value is reset back to being 'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now 'R1=inv,min_value=0'. The subsequent test ... 11: (65) if r1 s> 0x1 goto pc+2 ... is a signed compare of r1 with immediate value 1. Here, verifier deduces in reg_set_min_max() that since the compare is signed this time and operation is greater than (>), that in the fall-through/false case, we can deduce that r1's maximum bound must be 1, meaning with prior test, we result in r1 having the following state: R1=inv,min_value=0,max_value=1. Given that the actual value this holds is -8, the bounds are wrongly deduced. When this is being added to r0 which holds the map_value(_adj) type, then subsequent store access in above case will go through check_mem_access() which invokes check_map_access_adj(), that will then probe whether the map memory is in bounds based on the min_value and max_value as well as access size since the actual unknown value is min_value <= x <= max_value; commit |
|
Linus Torvalds | f58781c983 |
Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore: "A small audit fix, just a single line, to plug a memory leak in some audit error handling code" * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit: audit: fix memleak in auditd_send_unicast_skb. |
|
Chunyan Zhang | f86f418059 |
trace: fix the errors caused by incompatible type of RCU variables
The variables which are processed by RCU functions should be annotated as RCU, otherwise sparse will report the errors like below: "error: incompatible types in comparison expression (different address spaces)" Link: http://lkml.kernel.org/r/1496823171-7758-1-git-send-email-zhang.chunyan@linaro.org Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org> [ Updated to not be 100% 80 column strict ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
|
Chunyu Hu | db9108e054 |
tracing: Fix kmemleak in instance_rmdir
Hit the kmemleak when executing instance_rmdir, it forgot releasing
mem of tracing_cpumask. With this fix, the warn does not appear any
more.
unreferenced object 0xffff93a8dfaa7c18 (size 8):
comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s)
hex dump (first 8 bytes):
ff ff ff ff ff ff ff ff ........
backtrace:
[<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280
[<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30
[<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10
[<ffffffff88571ab0>] instance_mkdir+0x90/0x240
[<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70
[<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0
[<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100
[<ffffffff88403857>] do_syscall_64+0x67/0x150
[<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com
Cc: stable@vger.kernel.org
Fixes:
|
|
Alexander Shishkin | 3bda69c1c3 |
perf/core: Fix scheduling regression of pinned groups
Vince Weaver reported: > I was tracking down some regressions in my perf_event_test testsuite. > Some of the tests broke in the 4.11-rc1 timeframe. > > I've bisected one of them, this report is about > tests/overflow/simul_oneshot_group_overflow > This test creates an event group containing two sampling events, set > to overflow to a signal handler (which disables and then refreshes the > event). > > On a good kernel you get the following: > Event perf::instructions with period 1000000 > Event perf::instructions with period 2000000 > fd 3 overflows: 946 (perf::instructions/1000000) > fd 4 overflows: 473 (perf::instructions/2000000) > Ending counts: > Count 0: 946379875 > Count 1: 946365218 > > With the broken kernels you get: > Event perf::instructions with period 1000000 > Event perf::instructions with period 2000000 > fd 3 overflows: 938 (perf::instructions/1000000) > fd 4 overflows: 318 (perf::instructions/2000000) > Ending counts: > Count 0: 946373080 > Count 1: 653373058 The root cause of the bug is that the following commit: |
|
Linus Torvalds | e06fdaf40a |
Now that IPC and other changes have landed, enable manual markings for
randstruct plugin, including the task_struct. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Kees Cook <kees@outflux.net> iQIcBAABCgAGBQJZbRgGAAoJEIly9N/cbcAmk2AQAIL60aQ+9RIcFAXriFhnd7Z2 x9Jqi9JNc8NgPFXx8GhE4J4eTZ5PwcjgXBpNRWY/laBkRyoBHn24ku09YxrJjmHz ZSUsP+/iO9lVeEfbmU9Tnk50afkfwx6bHXBwkiVGQWHtybNVUqA19JbqkHeg8ubx myKLGeUv5PPCodRIcBDD0+HaAANcsqtgbDpgmWU8s+IXWwvWCE2p7PuBw7v3HHgH qzlPDHYQCRDw+LWsSqPaHj+9mbRO18P/ydMoZHGH4Hl3YYNtty8ZbxnraI3A7zBL 6mLUVcZ+/l88DqHc5I05T8MmLU1yl2VRxi8/jpMAkg9wkvZ5iNAtlEKIWU6eqsvk vaImNOkViLKlWKF+oUD1YdG16d8Segrc6m4MGdI021tb+LoGuUbkY7Tl4ee+3dl/ 9FM+jPv95HjJnyfRNGidh2TKTa9KJkh6DYM9aUnktMFy3ca1h/LuszOiN0LTDiHt k5xoFURk98XslJJyXM8FPwXCXiRivrXMZbg5ixNoS4aYSBLv7Cn1M6cPnSOs7UPh FqdNPXLRZ+vabSxvEg5+41Ioe0SHqACQIfaSsV5BfF2rrRRdaAxK4h7DBcI6owV2 7ziBN1nBBq2onYGbARN6ApyCqLcchsKtQfiZ0iFsvW7ZawnkVOOObDTCgPl3tdkr 403YXzphQVzJtpT5eRV6 =ngAW -----END PGP SIGNATURE----- Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull structure randomization updates from Kees Cook: "Now that IPC and other changes have landed, enable manual markings for randstruct plugin, including the task_struct. This is the rest of what was staged in -next for the gcc-plugins, and comes in three patches, largest first: - mark "easy" structs with __randomize_layout - mark task_struct with an optional anonymous struct to isolate the __randomize_layout section - mark structs to opt _out_ of automated marking (which will come later) And, FWIW, this continues to pass allmodconfig (normal and patched to enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and s390 for me" * tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randstruct: opt-out externally exposed function pointer structs task_struct: Allow randomized layout randstruct: Mark various structs for randomization |
|
Shu Wang | b0659ae5e3 |
audit: fix memleak in auditd_send_unicast_skb.
Found this issue by kmemleak report, auditd_send_unicast_skb did not free skb if rcu_dereference(auditd_conn) returns null. unreferenced object 0xffff88082568ce00 (size 256): comm "auditd", pid 1119, jiffies 4294708499 backtrace: [<ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8121820c>] kmem_cache_alloc_node+0xcc/0x210 [<ffffffff8161b99d>] __alloc_skb+0x5d/0x290 [<ffffffff8113c614>] audit_make_reply+0x54/0xd0 [<ffffffff8113dfa7>] audit_receive_msg+0x967/0xd70 ---------------- (gdb) list *audit_receive_msg+0x967 0xffffffff8113dff7 is in audit_receive_msg (kernel/audit.c:1133). 1132 skb = audit_make_reply(0, AUDIT_REPLACE, 0, 0, &pvnr, sizeof(pvnr)); --------------- [<ffffffff8113e402>] audit_receive+0x52/0xa0 [<ffffffff8166c561>] netlink_unicast+0x181/0x240 [<ffffffff8166c8e2>] netlink_sendmsg+0x2c2/0x3b0 [<ffffffff816112e8>] sock_sendmsg+0x38/0x50 [<ffffffff816117a2>] SYSC_sendto+0x102/0x190 [<ffffffff81612f4e>] SyS_sendto+0xe/0x10 [<ffffffff8176d337>] entry_SYSCALL_64_fastpath+0x1a/0xa5 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
|
Joel Fernandes | 848618857d |
tracing/ring_buffer: Try harder to allocate
ftrace can fail to allocate per-CPU ring buffer on systems with a large number of CPUs coupled while large amounts of cache happening in the page cache. Currently the ring buffer allocation doesn't retry in the VM implementation even if direct-reclaim made some progress but still wasn't able to find a free page. On retrying I see that the allocations almost always succeed. The retry doesn't happen because __GFP_NORETRY is used in the tracer to prevent the case where we might OOM, however if we drop __GFP_NORETRY, we risk destabilizing the system if OOM killer is triggered. To prevent this situation, use the __GFP_RETRY_MAYFAIL flag introduced recently [1]. Tested the following still succeeds without destabilizing a system with 1GB memory. echo 300000 > /sys/kernel/debug/tracing/buffer_size_kb [1] https://marc.info/?l=linux-mm&m=149820805124906&w=2 Link: http://lkml.kernel.org/r/20170713021416.8897-1-joelaf@google.com Cc: Tim Murray <timmurray@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Joel Fernandes <joelaf@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
|
Juergen Gross | a696712c3d |
genirq/PM: Properly pretend disabled state when force resuming interrupts
Interrupts with the IRQF_FORCE_RESUME flag set have also the
IRQF_NO_SUSPEND flag set. They are not disabled in the suspend path, but
must be forcefully resumed. That's used by XEN to keep IPIs enabled beyond
the suspension of device irqs. Force resume works by pretending that the
interrupt was disabled and then calling __irq_enable().
Incrementing the disabled depth counter was enough to do that, but with the
recent changes which use state flags to avoid unnecessary hardware access,
this is not longer sufficient. If the state flags are not set, then the
hardware callbacks are not invoked and the interrupt line stays disabled in
"hardware".
Set the disabled and masked state when pretending that an interrupt got
disabled by suspend.
Fixes:
|
|
Linus Torvalds | 935acd3f5e |
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner: "Fix the fallout from reworking the locking and resource management in request/free_irq()" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Keep chip buslock across irq_request/release_resources() |
|
Linus Torvalds | 31ba04d99a |
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP fix from Thomas Gleixner: "Replace the bogus BUG_ON in the cpu hotplug code" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp/hotplug: Replace BUG_ON and react useful |
|
Linus Torvalds | 78dcf73421 |
Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro: "Preparations for fsmount/fsopen stuff (coming next cycle). Everything gets moved to explicit ->show_options(), killing ->s_options off + some cosmetic bits around fs/namespace.c and friends. Basically, the stuff needed to work with fsmount series with minimum of conflicts with other work. It's not strictly required for this merge window, but it would reduce the PITA during the coming cycle, so it would be nice to have those bits and pieces out of the way" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: isofs: Fix isofs_show_options() VFS: Kill off s_options and helpers orangefs: Implement show_options 9p: Implement show_options isofs: Implement show_options afs: Implement show_options affs: Implement show_options befs: Implement show_options spufs: Implement show_options bpf: Implement show_options ramfs: Implement show_options pstore: Implement show_options omfs: Implement show_options hugetlbfs: Implement show_options VFS: Don't use save/replace_mount_options if not using generic_show_options VFS: Provide empty name qstr VFS: Make get_filesystem() return the affected filesystem VFS: Clean up whitespace in fs/namespace.c and fs/super.c Provide a function to create a NUL-terminated string from unterminated data |
|
Linus Torvalds | e37720e25d |
Power management fixes for v4.13-rc1
- Avoid clearing the PCI PME Enable bit for devices as a result of config space restoration which confuses AML executed afterward and causes wakeup events to be lost on some systems (Rafael Wysocki). - Fix the native PCIe PME interrupts handling in the cases when the PME IRQ is set up as a system wakeup one so that runtime PM remote wakeup works as expected after system resume on systems where that happens (Rafael Wysocki). - Fix the device PM QoS sysfs interface to handle invalid user input correctly instead of using an unititialized variable value as the latency tolerance for the device at hand (Dan Carpenter). - Get rid of one more rounding error from intel_pstate computations (Srinivas Pandruvada). - Fix the schedutil cpufreq governor to prevent it from possibly accessing unititialized data structures from governor callbacks in some cases on systems when multiple CPUs share a single cpufreq policy object (Vikram Mulukutla). - Fix the return values of probe routines in two devfreq drivers (Gustavo Silva). - Constify an attribute_group structure in devfreq (Arvind Yadav). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZaLe2AAoJEILEb/54YlRxbi8P/jbQkFdtZinL8eR5DNlUt9jn ZzOnPNNJL0xj2dRJ8qpmHYT1PAQQGIhWyiXavbJqLeZeO5f4AFnFa8Uya+oq6UfP rv73RIk+qaogUccdqfa7Y3IcBhuER9q2baSIguLEt4w7+szyiWO+XonK640iTRNz moUcf2MCA9EacvwlmANQbnimB7mvwz4Tupgn6zK6zh2BJEBYlkWRbqXE1Zm6tJXb +jYwKY0W/hsJbLAUfhbz0Iz6FhvE/ix46NTRw33gWyjmmsUSn4KvIF6mq1+RplD9 6Rvka6pilqSIWoy3Wr4irAQkaOA8WecvwKGtmTh6mkfQC8TyNbQEHwD0EBSsht9n G1OHaWLv7m8PKaxmaLMvQEd8gYWmKAF3EZHA6zT2qN+LCPkMKzab/dEhsU/rxuR2 Nda57D5iNsGIETfVws9FBeYKOw64gb6TOQi8bunLPQbg15n4XWuL5IjtgnPwHFcU xkaxE5UbAmSLIDM8drevIQGIgrEsDDCgezvnVBV8vCYwUyBbzuBb+T6jibPMdNDM t0DiF8QwQEGJcxYXEd5FpPamS3rmeKxcf234kzf9lHq0Msq6lMFdhihoJvZJ6rw/ F18ZkAT3ni546CRmknJrUmeg7FjwHsTgJo7K7MArIcHBLhsA59+Bv2Mh+UIH//yT 57c1OquHgPXx1uTULMC3 =G9eQ -----END PGP SIGNATURE----- Merge tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a recently exposed issue in the PCI device wakeup code and one older problem related to PCI device wakeup that has been reported recently, modify one more piece of computations in intel_pstate to get rid of a rounding error, fix a possible race in the schedutil cpufreq governor, fix the device PM QoS sysfs interface to correctly handle invalid user input, fix return values of two probe routines in devfreq drivers and constify an attribute_group structure in devfreq. Specifics: - Avoid clearing the PCI PME Enable bit for devices as a result of config space restoration which confuses AML executed afterward and causes wakeup events to be lost on some systems (Rafael Wysocki). - Fix the native PCIe PME interrupts handling in the cases when the PME IRQ is set up as a system wakeup one so that runtime PM remote wakeup works as expected after system resume on systems where that happens (Rafael Wysocki). - Fix the device PM QoS sysfs interface to handle invalid user input correctly instead of using an unititialized variable value as the latency tolerance for the device at hand (Dan Carpenter). - Get rid of one more rounding error from intel_pstate computations (Srinivas Pandruvada). - Fix the schedutil cpufreq governor to prevent it from possibly accessing unititialized data structures from governor callbacks in some cases on systems when multiple CPUs share a single cpufreq policy object (Vikram Mulukutla). - Fix the return values of probe routines in two devfreq drivers (Gustavo Silva). - Constify an attribute_group structure in devfreq (Arvind Yadav)" * tag 'pm-fixes-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI / PM: Fix native PME handling during system suspend/resume PCI / PM: Restore PME Enable after config space restoration cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race PM / QoS: return -EINVAL for bogus strings cpufreq: intel_pstate: Fix ratio setting for min_perf_pct PM / devfreq: constify attribute_group structures. PM / devfreq: tegra: fix error return code in tegra_devfreq_probe() PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe() |
|
Luis R. Rodriguez | 6d7964a722 |
kmod: throttle kmod thread limit
If we reach the limit of modprobe_limit threads running the next request_module() call will fail. The original reason for adding a kill was to do away with possible issues with in old circumstances which would create a recursive series of request_module() calls. We can do better than just be super aggressive and reject calls once we've reached the limit by simply making pending callers wait until the threshold has been reduced, and then throttling them in, one by one. This throttling enables requests over the kmod concurrent limit to be processed once a pending request completes. Only the first item queued up to wait is woken up. The assumption here is once a task is woken it will have no other option to also kick the queue to check if there are more pending tasks -- regardless of whether or not it was successful. By throttling and processing only max kmod concurrent tasks we ensure we avoid unexpected fatal request_module() calls, and we keep memory consumption on module loading to a minimum. With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run time to run both tests: time ./kmod.sh -t 0008 real 0m16.366s user 0m0.883s sys 0m8.916s time ./kmod.sh -t 0009 real 0m50.803s user 0m0.791s sys 0m9.852s Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kefeng Wang | 5f92a7b0fc |
kernel/watchdog.c: use better pr_fmt prefix
After commit
|
|
Rafael J. Wysocki | a252c258dd |
Merge branches 'pm-cpufreq-sched' and 'intel_pstate'
* pm-cpufreq-sched: cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race * intel_pstate: cpufreq: intel_pstate: Fix ratio setting for min_perf_pct |
|
Joel Fernandes | 193be41e33 |
sched/deadline: Fix confusing comments about selection of top pi-waiter
This comment in the code is incomplete, and I believe it begs a definition of dl_boosted to make sense of the condition that follows. Rewrite the comment and also rearrange the condition that follows to reflect the first condition "we have a top pi-waiter which is a SCHED_DEADLINE task" in that order. Also fix a typo that follows. Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170713022429.10307-1-joelaf@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Wanpeng Li | 0e4097c335 |
sched/cputime: Don't use smp_processor_id() in preemptible context
Recent kernels trigger this warning: BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181 caller is debug_smp_processor_id+0x17/0x19 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1 Call Trace: dump_stack+0x82/0xb8 check_preemption_disabled() debug_smp_processor_id() vtime_delta() task_cputime() thread_group_cputime() thread_group_cputime_adjusted() wait_consider_task() do_wait() SYSC_wait4() do_syscall_64() entry_SYSCALL64_slow_path() As Frederic pointed out: | Although those sched_clock_cpu() things seem to only matter when the | sched_clock() is unstable. And that stability is a condition for nohz_full | to work anyway. So probably sched_clock() alone would be enough. This patch fixes it by replacing sched_clock_cpu() with sched_clock() to avoid calling smp_processor_id() in a preemptible context. Reported-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com [ Prettified the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
Linus Torvalds | bc0f51d359 |
A few more minor updates:
- Show the tgid mappings for user space trace tools to use - Fix and optimize the comm and tgid cache recording - Sanitize derived kprobe names - Ftrace selftest updates - trace file header fix - Update of Documentation/trace/ftrace.txt - Compiler warning fixes - Fix possible uninitialized variable -----BEGIN PGP SIGNATURE----- iQExBAABCAAbBQJZZ2rbFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L V3MIAI3NZ3dr0dKJ7DMF1jsQc24YF/bMG2noWm2b9+H/sO+gbnJKsizqzrB2Cm8S lFCYGSydLKGGZgKob3wkAX15iO2fxcUvJOKzkKxmyDbwAteABRf9LSr/llthRIsT 8kSPI5bgJ5dah+lvhl9+1ekarsIZGr41svY97Knj9A2K18kQplnSNqgatkIuV2Kn hIoiPI0tG2y27In2JJoaTedAHj4NIwmI3nhTt6nks0GN7ICx3bMcvdE9l+zB+OLJ akAehsTk3kcNb66ttoj6ZTzGZ7kaes96Cl6/uamVpXzh3SXla36ux1r9Kp8bgONE EgrJwbRwU8BMDaattutDxT7/XmU= =TPGB -----END PGP SIGNATURE----- Merge tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: "A few more minor updates: - Show the tgid mappings for user space trace tools to use - Fix and optimize the comm and tgid cache recording - Sanitize derived kprobe names - Ftrace selftest updates - trace file header fix - Update of Documentation/trace/ftrace.txt - Compiler warning fixes - Fix possible uninitialized variable" * tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix uninitialized variable in match_records() ftrace: Remove an unneeded NULL check ftrace: Hide cached module code for !CONFIG_MODULES tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE tracing: Update Documentation/trace/ftrace.txt tracing: Fixup trace file header alignment selftests/ftrace: Add a testcase for kprobe event naming selftests/ftrace: Add a test to probe module functions selftests/ftrace: Update multiple kprobes test for powerpc trace/kprobes: Sanitize derived event names tracing: Attempt to record other information even if some fail tracing: Treat recording tgid for idle task as a success tracing: Treat recording comm for idle task as a success tracing: Add saved_tgids file to show cached pid to tgid mappings |
|
Linus Torvalds | ad51271afc |
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton: - various misc things - kexec updates - sysctl core updates - scripts/gdb udpates - checkpoint-restart updates - ipc updates - kernel/watchdog updates - Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature" - "stackprotector: ascii armor the stack canary" - more MM bits - checkpatch updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits) writeback: rework wb_[dec|inc]_stat family of functions ARM: samsung: usb-ohci: move inline before return type video: fbdev: omap: move inline before return type video: fbdev: intelfb: move inline before return type USB: serial: safe_serial: move __inline__ before return type drivers: tty: serial: move inline before return type drivers: s390: move static and inline before return type x86/efi: move asmlinkage before return type sh: move inline before return type MIPS: SMP: move asmlinkage before return type m68k: coldfire: move inline before return type ia64: sn: pci: move inline before type ia64: move inline before return type FRV: tlbflush: move asmlinkage before return type CRIS: gpio: move inline before return type ARM: HP Jornada 7XX: move inline before return type ARM: KVM: move asmlinkage before type checkpatch: improve the STORAGE_CLASS test mm, migration: do not trigger OOM killer when migrating memory drm/i915: use __GFP_RETRY_MAYFAIL ... |
|
Alex Shi | 69f0d429c4 |
locking/rtmutex: Remove unnecessary priority adjustment
We don't need to adjust priority before adding a new pi_waiter, the
priority only needs to be updated after pi_waiter change or task
priority change.
Steven Rostedt pointed out:
"Interesting, I did some git mining and this was added with the original
entry of the rtmutex.c (
|
|
Linus Torvalds | 3a75ad1457 |
Modules updates for v4.13
Summary of modules changes for the 4.13 merge window: - Minor code cleanups - Avoid accessing mod struct prior to checking module struct version, from Kees - Fix racy atomic inc/dec logic of kmod_concurrent_max in kmod, from Luis Signed-off-by: Jessica Yu <jeyu@kernel.org> -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJZZp4WAAoJEMBFfjjOO8Fy5JkQAIYujpi6ZS7pGpNCXnGa8pnQ E62oLWAM3UndSgzkL6KJ8HXUzc26Wvm56hoF+k/bvQ7fq0qUmMF71yQ7mArzTZEW QW4t7Fu6zTUh4l5hGenoz1ShJbi+rB/pQT8l6AgdCSEZjpcCoWv+sdb93qoT3YO8 /5pugAR2Uid1yb6EVDzItB/tz5w9Vyojp/fePkcz7M0sAI3NCa/0zeWtYgJbXpTW atieqPM8icfP8LNBYaXmA1SowMkW9cIh8AGhBIbvUYP35wTZVP2jJA0GxK6vB/+c pnDRw/zZO+BUYSpv/NMpJsQ2SKX+t2h5uvBqveq3Q5PljcZAvb6L0wt3PSUp4kvz iRPAIb90FtQqBCLfFnDyIMvzVyCXfHq+eVsFYcvlVOWfdkLaeNEhLyn25whkFXr7 ricd/yXKdS8T1WHatR1HqzIk7pog7PsPewVrjl78TBx3nyIMxEhtCpV9MrnditfP IE1/8hQ2rSriSkFeAi5SYxQ5iNwzQKtKOqMiv7lefIuJiCde+0no4XzMrPz/MaU6 UGyTRRNiQXSlfZQaMI4Ru1itVdAugRRVScATz69ggFqRyfCVuByM78RaygfcrPEC H6tHbeJxyEBytlS2qB2cmVXPvIKOdJ3mU9bGdBy9IuXCj8reJMbzQMfIt4lSow+h axggDNhbL2urY9Ymn1wX =tYuD -----END PGP SIGNATURE----- Merge tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.13 merge window: - Minor code cleanups - Avoid accessing mod struct prior to checking module struct version, from Kees - Fix racy atomic inc/dec logic of kmod_concurrent_max in kmod, from Luis" * tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: make the modinfo name const kmod: reduce atomic operations on kmod_concurrent and simplify module: use list_for_each_entry_rcu() on find_module_all() kernel/module.c: suppress warning about unused nowarn variable module: Add module name to modinfo module: Pass struct load_info into symbol checks |
|
Rik van Riel | 7cd815bce8 |
fork,random: use get_random_canary() to set tsk->stack_canary
Use the ascii-armor canary to prevent unterminated C string overflows from being able to successfully overwrite the canary, even if they somehow obtain the canary value. Inspired by execshield ascii-armor and Daniel Micay's linux-hardened tree. Link: http://lkml.kernel.org/r/20170524155751.424-3-riel@redhat.com Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Kees Cook | e2ae8ab4b5 |
kexec_file: adjust declaration of kexec_purgatory
Defining kexec_purgatory as a zero-length char array upsets compile time size checking. Since this is built on a per-arch basis, define it as an unsized char array (like is done for other similar things, e.g. linker sections). This silences the warning generated by the future CONFIG_FORTIFY_SOURCE, which did not like the memcmp() of a "0 byte" array. This drops the __weak and uses an extern instead, since both users define kexec_purgatory. Link: http://lkml.kernel.org/r/1497903987-21002-4-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Nicholas Piggin | a10a842ff8 |
kernel/watchdog: provide watchdog_nmi_reconfigure() for arch watchdogs
After reconfiguring watchdog sysctls etc., architecture specific watchdogs may not get all their parameters updated. watchdog_nmi_reconfigure() can be implemented to pull the new values in and set the arch NMI watchdog. [npiggin@gmail.com: add code comments] Link: http://lkml.kernel.org/r/20170617125933.774d3858@roar.ozlabs.ibm.com [arnd@arndb.de: hide unused function] Link: http://lkml.kernel.org/r/20170620204854.966601-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170616065715.18390-5-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Nicholas Piggin | 05a4a95279 |
kernel/watchdog: split up config options
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR. LOCKUP_DETECTOR implies the general boot, sysctl, and programming interfaces for the lockup detectors. An architecture that wants to use a hard lockup detector must define HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH. Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and does not implement the LOCKUP_DETECTOR interfaces. sparc is unusual in that it has started to implement some of the interfaces, but not fully yet. It should probably be converted to a full HAVE_HARDLOCKUP_DETECTOR_ARCH. [npiggin@gmail.com: fix] Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Nicholas Piggin | f2e0cff85e |
kernel/watchdog: introduce arch_touch_nmi_watchdog()
For architectures that define HAVE_NMI_WATCHDOG, instead of having them provide the complete touch_nmi_watchdog() function, just have them provide arch_touch_nmi_watchdog(). This gives the generic code more flexibility in implementing this function, and arch implementations don't miss out on touching the softlockup watchdog or other generic details. Link: http://lkml.kernel.org/r/20170616065715.18390-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Dmitry Vyukov | e41d58185f |
fault-inject: support systematic fault injection
Add /proc/self/task/<current-tid>/fail-nth file that allows failing 0-th, 1-st, 2-nd and so on calls systematically. Excerpt from the added documentation: "Write to this file of integer N makes N-th call in the current task fail (N is 0-based). Read from this file returns a single char 'Y' or 'N' that says if the fault setup with a previous write to this file was injected or not, and disables the fault if it wasn't yet injected. Note that this file enables all types of faults (slab, futex, etc). This setting takes precedence over all other generic settings like probability, interval, times, etc. But per-capability settings (e.g. fail_futex/ignore-private) take precedence over it. This feature is intended for systematic testing of faults in a single system call. See an example below" Why add a new setting: 1. Existing settings are global rather than per-task. So parallel testing is not possible. 2. attr->interval is close but it depends on attr->count which is non reset to 0, so interval does not work as expected. 3. Trying to model this with existing settings requires manipulations of all of probability, interval, times, space, task-filter and unexposed count and per-task make-it-fail files. 4. Existing settings are per-failure-type, and the set of failure types is potentially expanding. 5. make-it-fail can't be changed by unprivileged user and aggressive stress testing better be done from an unprivileged user. Similarly, this would require opening the debugfs files to the unprivileged user, as he would need to reopen at least times file (not possible to pre-open before dropping privs). The proposed interface solves all of the above (see the example). We want to integrate this into syzkaller fuzzer. A prototype has found 10 bugs in kernel in first day of usage: https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance I've made the current interface work with all types of our sandboxes. For setuid the secret sauce was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to make /proc entries non-root owned. So I am fine with the current version of the code. [akpm@linux-foundation.org: fix build] Link: http://lkml.kernel.org/r/20170328130128.101773-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Cyrill Gorcunov | 0791e3644e |
kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files
With current epoll architecture target files are addressed with file_struct and file descriptor number, where the last is not unique. Moreover files can be transferred from another process via unix socket, added into queue and closed then so we won't find this descriptor in the task fdinfo list. Thus to checkpoint and restore such processes CRIU needs to find out where exactly the target file is present to add it into epoll queue. For this sake one can use kcmp call where some particular target file from the queue is compared with arbitrary file passed as an argument. Because epoll target files can have same file descriptor number but different file_struct a caller should explicitly specify the offset within. To test if some particular file is matching entry inside epoll one have to - fill kcmp_epoll_slot structure with epoll file descriptor, target file number and target file offset (in case if only one target is present then it should be 0) - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot) - the kernel fetch file pointer matching file descriptor @fd of pid1 - lookups for file struct in epoll queue of pid2 and returns traditional 0,1,2 result for sorting purpose Link: http://lkml.kernel.org/r/20170424154423.511592110@gmail.com Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrey Vagin <avagin@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mateusz Jurczyk | 9380fa60b1 |
kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning()
Prevent use of uninitialized memory (originating from the stack frame of do_sysctl()) by verifying that the name array is filled with sufficient input data before comparing its specific entries with integer constants. Through timing measurement or analyzing the kernel debug logs, a user-mode program could potentially infer the results of comparisons against the uninitialized memory, and acquire some (very limited) information about the state of the kernel stack. The change also eliminates possible future warnings by tools such as KMSAN and other code checkers / instrumentations. Link: http://lkml.kernel.org/r/20170524122139.21333-1-mjurczyk@google.com Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Luis R. Rodriguez | 61d9b56a89 |
sysctl: add unsigned int range support
To keep parity with regular int interfaces provide the an unsigned int proc_douintvec_minmax() which allows you to specify a range of allowed valid numbers. Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an actual user for that. Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Luis R. Rodriguez | 4f2fec00af |
sysctl: simplify unsigned int support
Commit |
|
Luis R. Rodriguez | d383d48470 |
sysctl: fold sysctl_writes_strict checks into helper
The mode sysctl_writes_strict positional checks keep being copy and pasted as we add new proc handlers. Just add a helper to avoid code duplication. Link: http://lkml.kernel.org/r/20170519033554.18592-4-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Suggested-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Luis R. Rodriguez | a19ac33749 |
sysctl: kdoc'ify sysctl_writes_strict
Document the different sysctl_writes_strict modes in code. Link: http://lkml.kernel.org/r/20170519033554.18592-3-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Xunlei Pang | 1229384f5b |
kdump: protect vmcoreinfo data under the crash memory
Currently vmcoreinfo data is updated at boot time subsys_initcall(), it has the risk of being modified by some wrong code during system is running. As a result, vmcore dumped may contain the wrong vmcoreinfo. Later on, when using "crash", "makedumpfile", etc utility to parse this vmcore, we probably will get "Segmentation fault" or other unexpected errors. E.g. 1) wrong code overwrites vmcoreinfo_data; 2) further crashes the system; 3) trigger kdump, then we obviously will fail to recognize the crash context correctly due to the corrupted vmcoreinfo. Now except for vmcoreinfo, all the crash data is well protected(including the cpu note which is fully updated in the crash path, thus its correctness is guaranteed). Given that vmcoreinfo data is a large chunk prepared for kdump, we better protect it as well. To solve this, we relocate and copy vmcoreinfo_data to the crash memory when kdump is loading via kexec syscalls. Because the whole crash memory will be protected by existing arch_kexec_protect_crashkres() mechanism, we naturally protect vmcoreinfo_data from write(even read) access under kernel direct mapping after kdump is loaded. Since kdump is usually loaded at the very early stage after boot, we can trust the correctness of the vmcoreinfo data copied. On the other hand, we still need to operate the vmcoreinfo safe copy when crash happens to generate vmcoreinfo_note again, we rely on vmap() to map out a new kernel virtual address and update to use this new one instead in the following crash_save_vmcoreinfo(). BTW, we do not touch vmcoreinfo_note, because it will be fully updated using the protected vmcoreinfo_data after crash which is surely correct just like the cpu crash note. Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Xunlei Pang | 5203f4995d |
powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdr
vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we
should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE.
Like explained in commit
|
|
Xunlei Pang | 203e9e4121 |
kexec: move vmcoreinfo out of the kernel's .bss section
As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Christoph Lameter | 112166f88c |
kernel/fork.c: virtually mapped stacks: do not disable interrupts
The reason to disable interrupts seems to be to avoid switching to a different processor while handling per cpu data using individual loads and stores. If we use per cpu RMV primitives we will not have to disable interrupts. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Al Viro | 58c7ffc074 |
fix a braino in compat_sys_getrlimit()
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Fixes: commit
|
|
Dan Carpenter | 2e028c4fe1 |
ftrace: Fix uninitialized variable in match_records()
My static checker complains that if "func" is NULL then "clear_filter"
is uninitialized. This seems like it could be true, although it's
possible something subtle is happening that I haven't seen.
kernel/trace/ftrace.c:3844 match_records()
error: uninitialized symbol 'clear_filter'.
Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda
Cc: stable@vger.kernel.org
Fixes:
|
|
Dan Carpenter | 44925dfff0 |
ftrace: Remove an unneeded NULL check
"func" can't be NULL and it doesn't make sense to check because we've already derefenced it. Link: http://lkml.kernel.org/r/20170712073340.4enzeojeoupuds5a@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> |
|
Vikram Mulukutla | ab2f7cf141 |
cpufreq: schedutil: Fix sugov_start() versus sugov_update_shared() race
With a shared policy in place, when one of the CPUs in the policy is hotplugged out and then brought back online, sugov_stop() and sugov_start() are called in order. sugov_stop() removes utilization hooks for each CPU in the policy and does nothing else in the for_each_cpu() loop. sugov_start() on the other hand iterates through the CPUs in the policy and re-initializes the per-cpu structure _and_ adds the utilization hook. This implies that the scheduler is allowed to invoke a CPU's utilization update hook when the rest of the per-cpu structures have yet to be re-inited. Apart from some strange values in tracepoints this doesn't cause a problem, but if we do end up accessing a pointer from the per-cpu sugov_cpu structure somewhere in the sugov_update_shared() path, we will likely see crashes since the memset for another CPU in the policy is free to race with sugov_update_shared from the CPU that is ready to go. So let's fix this now to first init all per-cpu structures, and then add the per-cpu utilization update hooks all at once. Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
|
Thomas Gleixner | 19d39a3810 |
genirq: Keep chip buslock across irq_request/release_resources()
Moving the irq_request/release_resources() callbacks out of the spinlocked,
irq disabled and bus locked region, unearthed an interesting abuse of the
irq_bus_lock/irq_bus_sync_unlock() callbacks.
The OMAP GPIO driver does merily power management inside of them. The
irq_request_resources() callback of this GPIO irqchip calls a function
which reads a GPIO register. That read aborts now because the clock of the
GPIO block is not magically enabled via the irq_bus_lock() callback.
Move the callbacks under the bus lock again to prevent this. In the
free_irq() path this requires to drop the bus_lock before calling
synchronize_irq() and reaquiring it before calling the
irq_release_resources() callback.
The bus lock can't be held because:
1) The data which has been changed between bus_lock/un_lock is cached in
the irq chip driver private data and needs to go out to the irq chip
via the slow bus (usually SPI or I2C) before calling
synchronize_irq().
That's the reason why this bus_lock/unlock magic exists in the first
place, as you cannot do SPI/I2C transactions while holding desc->lock
with interrupts disabled.
2) synchronize_irq() will actually deadlock, if there is a handler on
flight. These chips use threaded handlers for obvious reasons, as
they allow to do SPI/I2C communication. When the threaded handler
returns then bus_lock needs to be taken in irq_finalize_oneshot() as
we need to talk to the actual irq chip once more. After that the
threaded handler is marked done, which makes synchronize_irq() return.
So if we hold bus_lock accross the synchronize_irq() call, the
handler cannot mark itself done because it blocks on the bus
lock. That in turn makes synchronize_irq() wait forever on the
threaded handler to complete....
Add the missing unlock of desc->request_mutex in the error path of
__free_irq() and add a bunch of comments to explain the locking and
protection rules.
Fixes:
|