mirror of https://gitee.com/openkylin/linux.git
950973 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Vasily Gorbik | 6cf4ecf5c5 |
perf build: Allow nested externs to enable BUILD_BUG() usage
Currently BUILD_BUG() macro is expanded to smth like the following: do { extern void __compiletime_assert_0(void) __attribute__((error("BUILD_BUG failed"))); if (!(!(1))) __compiletime_assert_0(); } while (0); If used in a function body this obviously would produce build errors with -Wnested-externs and -Werror. To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf includes in intel-pt-decoder, build perf without -Wnested-externs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Slaby | f3013f7ed4 |
perf trace: Fix off by ones in memset() after realloc() in arches using libaudit
'perf trace ls' started crashing after commit |
|
Leo Yan | edac75a2f8 |
perf c2c: Update usage for showing memory events
Since commit
|
|
Arnaldo Carvalho de Melo | dbaa1b3d9a |
Merge branch 'perf/urgent' into perf/core
To pick fixes that missed v5.9. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Tzvetomir Stoyanov (VMware) | a41c32105c |
tools lib traceevent: Hide non API functions
There are internal library functions, which are not declared as a static. They are used inside the library from different files. Hide them from the library users, as they are not part of the API. These functions are made hidden and are renamed without the prefix "tep_": tep_free_plugin_paths tep_peek_char tep_buffer_init tep_get_input_buf_ptr tep_get_input_buf tep_read_token tep_free_token tep_free_event tep_free_format_field __tep_parse_format Link: https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.camel@decadent.org.uk/ Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: linux-trace-devel@vger.kernel.org Link: http://lore.kernel.org/lkml/20200930110733.280534-1-tz.stoyanov@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Joel Fernandes (Google) | dc000c4593 |
perf sched: Show start of latency as well
The 'perf sched latency' tool is really useful at showing worst-case latencies that task encountered since wakeup. However it shows only the end of the latency. Often times the start of a latency is interesting as it can show what else was going on at the time to cause the latency. I certainly myself spending a lot of time backtracking to the start of the latency in "perf sched script" which wastes a lot of time. This patch therefore adds a new column "Max delay start". Considering this, also rename "Maximum delay at" to "Max delay end" as its easier to understand. Example of the new output: ---------------------------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end | ---------------------------------------------------------------------------------------------------------------------------------- MediaScannerSer:11936 | 651.296 ms | 67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s audio@2.0-servi:(3) | 0.000 ms | 3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s AudioOut_1D:8112 | 0.000 ms | 2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s Time-limited te:14973 | 7966.090 ms | 24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s surfaceflinger:8049 | 9.680 ms | 603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s HeapTaskDaemon:(3) | 1588.830 ms | 7040 | avg: 0.065 ms | max: 6.880 ms | max start: 473.666043 s | max end: 473.672922 s mount-passthrou:(3) | 1370.809 ms | 68904 | avg: 0.011 ms | max: 6.524 ms | max start: 478.090630 s | max end: 478.097154 s ReferenceQueueD:(3) | 11.794 ms | 1725 | avg: 0.014 ms | max: 6.521 ms | max start: 476.119782 s | max end: 476.126303 s writer:14077 | 18.410 ms | 1427 | avg: 0.036 ms | max: 6.131 ms | max start: 474.169675 s | max end: 474.175805 s Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Sandipan Das | 70830f974e |
perf vendor events: Fix typos in power8 PMU events
This replaces the incorrectly spelled word "localtion" with "location"
in some power8 PMU event descriptions.
Fixes:
|
|
Namhyung Kim | bf7ef5ddb0 |
perf bench: Run inject-build-id with --buildid-all option too
For comparison, it now runs the benchmark twice - one if regular -b and another for --buildid-all. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.002 msec (+- 0.172 msec) Average time per event: 2.059 usec (+- 0.017 usec) Average memory usage: 8169 KB (+- 0 KB) Average build-id-all injection took: 19.543 msec (+- 0.124 msec) Average time per event: 1.916 usec (+- 0.012 usec) Average memory usage: 7348 KB (+- 0 KB) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 27c9c3424f |
perf inject: Add --buildid-all option
Like 'perf record', we can even more speedup build-id processing by just using all DSOs. Then we don't need to look at all the sample events anymore. The following patch will update 'perf bench' to show the result of the --buildid-all option too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | e7b60c5a0c |
perf inject: Do not load map/dso when injecting build-id
No need to load symbols in a DSO when injecting build-id. I guess the reason was to check the DSO is a special file like anon files. Use some helper functions in map.c to check them before reading build-id. Also pass sample event's cpumode to a new build-id event. It brought a speedup in the benchmark of 25 -> 21 msec on my laptop. Also the memory usage (Max RSS) went down by ~200 KB. # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.389 msec (+- 0.138 msec) Average time per event: 2.097 usec (+- 0.014 usec) Average memory usage: 8225 KB (+- 0 KB) Committer notes: Before: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,020.56 msec task-clock:u # 1.271 CPUs utilized ( +- 0.74% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 123,354 page-faults:u # 0.031 M/sec ( +- 0.81% ) 7,119,951,568 cycles:u # 1.771 GHz ( +- 1.74% ) (83.27%) 230,086,969 stalled-cycles-frontend:u # 3.23% frontend cycles idle ( +- 1.97% ) (83.41%) 1,168,298,765 stalled-cycles-backend:u # 16.41% backend cycles idle ( +- 1.13% ) (83.44%) 11,173,083,669 instructions:u # 1.57 insn per cycle # 0.10 stalled cycles per insn ( +- 1.58% ) (83.31%) 2,413,908,936 branches:u # 600.392 M/sec ( +- 1.69% ) (83.26%) 46,576,289 branch-misses:u # 1.93% of all branches ( +- 2.20% ) (83.31%) 3.1638 +- 0.0309 seconds time elapsed ( +- 0.98% ) $ After: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 2,379.94 msec task-clock:u # 1.473 CPUs utilized ( +- 0.18% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 62,584 page-faults:u # 0.026 M/sec ( +- 0.07% ) 2,372,389,668 cycles:u # 0.997 GHz ( +- 0.29% ) (83.14%) 106,937,862 stalled-cycles-frontend:u # 4.51% frontend cycles idle ( +- 4.89% ) (83.20%) 581,697,915 stalled-cycles-backend:u # 24.52% backend cycles idle ( +- 0.71% ) (83.47%) 3,659,692,199 instructions:u # 1.54 insn per cycle # 0.16 stalled cycles per insn ( +- 0.10% ) (83.63%) 791,372,961 branches:u # 332.518 M/sec ( +- 0.27% ) (83.39%) 10,648,083 branch-misses:u # 1.35% of all branches ( +- 0.22% ) (83.16%) 1.61570 +- 0.00172 seconds time elapsed ( +- 0.11% ) $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 336c95b297 |
perf inject: Enter namespace when reading build-id
It should be in a proper mnt namespace when accessing the file. I think this had no problem since the build-id was actually read from map__load() -> dso__load() already. But I'd like to change it in the following commit. Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 2946ecedd0 |
perf inject: Add missing callbacks in perf_tool
I found some events (like PERF_RECORD_CGROUP) are not copied by perf inject due to the missing callbacks. Let's add them. While at it, I've changed the order of the callbacks to match with struct perf_tool so that we can compare them easily. Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 0bf02a0d80 |
perf bench: Add build-id injection benchmark
Sometimes I can see that 'perf record' piped with 'perf inject' take a long time processing build-ids. So introduce a inject-build-id benchmark to the internals benchmark suite to measure its overhead regularly. It runs the 'perf inject' command internally and feeds the given number of synthesized events (MMAP2 + SAMPLE basically). Usage: perf bench internals inject-build-id <options> -i, --iterations <n> Number of iterations used to compute average (default: 100) -m, --nr-mmaps <n> Number of mmap events for each iteration (default: 100) -n, --nr-samples <n> Number of sample events per mmap event (default: 100) -v, --verbose be more verbose (show iteration count, DSO name, etc) By default, it measures average processing time of 100 MMAP2 events and 10000 SAMPLE events. Below is a result on my laptop. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 25.789 msec (+- 0.202 msec) Average time per event: 2.528 usec (+- 0.020 usec) Average memory usage: 8411 KB (+- 7 KB) Committer testing: $ perf bench Usage: perf bench [<common options>] <collection> <benchmark> [<options>] # List of all available benchmark collections: sched: Scheduler and IPC benchmarks syscall: System call benchmarks mem: Memory access benchmarks numa: NUMA scheduling and MM benchmarks futex: Futex stressing benchmarks epoll: Epoll stressing benchmarks internals: Perf-internals benchmarks all: All benchmarks $ perf bench internals # List of available benchmarks for collection 'internals': synthesize: Benchmark perf event synthesis kallsyms-parse: Benchmark kallsyms parsing inject-build-id: Benchmark build-id injection $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.202 msec (+- 0.059 msec) Average time per event: 1.392 usec (+- 0.006 usec) Average memory usage: 12650 KB (+- 10 KB) Average build-id-all injection took: 12.831 msec (+- 0.071 msec) Average time per event: 1.258 usec (+- 0.007 usec) Average memory usage: 11895 KB (+- 10 KB) $ $ perf stat -r5 perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.380 msec (+- 0.056 msec) Average time per event: 1.410 usec (+- 0.006 usec) Average memory usage: 12608 KB (+- 11 KB) Average build-id-all injection took: 11.889 msec (+- 0.064 msec) Average time per event: 1.166 usec (+- 0.006 usec) Average memory usage: 11838 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.246 msec (+- 0.065 msec) Average time per event: 1.397 usec (+- 0.006 usec) Average memory usage: 12744 KB (+- 10 KB) Average build-id-all injection took: 12.019 msec (+- 0.066 msec) Average time per event: 1.178 usec (+- 0.006 usec) Average memory usage: 11963 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.321 msec (+- 0.067 msec) Average time per event: 1.404 usec (+- 0.007 usec) Average memory usage: 12690 KB (+- 10 KB) Average build-id-all injection took: 11.909 msec (+- 0.041 msec) Average time per event: 1.168 usec (+- 0.004 usec) Average memory usage: 11938 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.287 msec (+- 0.059 msec) Average time per event: 1.401 usec (+- 0.006 usec) Average memory usage: 12864 KB (+- 10 KB) Average build-id-all injection took: 11.862 msec (+- 0.058 msec) Average time per event: 1.163 usec (+- 0.006 usec) Average memory usage: 12103 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.402 msec (+- 0.053 msec) Average time per event: 1.412 usec (+- 0.005 usec) Average memory usage: 12876 KB (+- 10 KB) Average build-id-all injection took: 11.826 msec (+- 0.061 msec) Average time per event: 1.159 usec (+- 0.006 usec) Average memory usage: 12111 KB (+- 10 KB) Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,267.48 msec task-clock:u # 1.502 CPUs utilized ( +- 0.14% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 102,092 page-faults:u # 0.024 M/sec ( +- 0.08% ) 3,894,589,578 cycles:u # 0.913 GHz ( +- 0.19% ) (83.49%) 140,078,421 stalled-cycles-frontend:u # 3.60% frontend cycles idle ( +- 0.77% ) (83.34%) 948,581,189 stalled-cycles-backend:u # 24.36% backend cycles idle ( +- 0.46% ) (83.25%) 5,835,587,719 instructions:u # 1.50 insn per cycle # 0.16 stalled cycles per insn ( +- 0.21% ) (83.24%) 1,267,423,636 branches:u # 296.996 M/sec ( +- 0.22% ) (83.12%) 17,484,290 branch-misses:u # 1.38% of all branches ( +- 0.12% ) (83.55%) 2.84176 +- 0.00222 seconds time elapsed ( +- 0.08% ) $ Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | bef69bd7cf |
perf stat: Fix out of bounds CPU map access when handling armv8_pmu events
It was reported that 'perf stat' crashed when using with armv8_pmu (CPU)
events with the task mode. As 'perf stat' uses an empty cpu map for
task mode but armv8_pmu has its own cpu mask, it has confused which map
it should use when accessing file descriptors and this causes segfaults:
(gdb) bt
#0 0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
cpu=<optimized out>) at evsel.c:122
#1 perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
#2 0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
#3 0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
at builtin-stat.c:929
#4 0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
argc=1) at builtin-stat.c:947
#5 cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
#6 0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
#7 0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
argv=argv@entry=0xfffffaea2f90) at perf.c:364
#8 0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
argv=<synthetic pointer>) at perf.c:408
#9 main (argc=4, argv=0xfffffaea2f90) at perf.c:538
To fix this, I simply used the given cpu map unless the evsel actually
is not a system-wide event (like uncore events).
Fixes:
|
|
Jiri Olsa | 6fcd5ddc3b |
perf python scripting: Fix printable strings in python3 scripts
Hagen reported broken strings in python3 tracepoint scripts:
make PYTHON=python3
perf record -e sched:sched_switch -a -- sleep 5
perf script --gen-script py
perf script -s ./perf-script.py
[..]
sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'),
The problem is in the is_printable_array function that does not take the
zero byte into account and claim such string as not printable, so the
code will create byte array instead of string.
Committer testing:
After this fix:
sched__sched_switch 3 484522.497072626 1158680 kworker/3:0-eve prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120
Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0}
sched__sched_switch 4 484522.497085610 1225814 perf prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0
Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0}
Fixes:
|
|
Arnaldo Carvalho de Melo | 388968d864 |
perf trace: Use the autogenerated mmap 'prot' string/id table
No change in behaviour: # perf trace -e mmap sleep 1 0.000 ( 0.009 ms): sleep/751870 mmap(len: 143317, prot: READ, flags: PRIVATE, fd: 3) = 0x7fa96d0f7000 0.028 ( 0.004 ms): sleep/751870 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7fa96d0f5000 0.037 ( 0.005 ms): sleep/751870 mmap(len: 1872744, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7fa96cf2b000 0.044 ( 0.011 ms): sleep/751870 mmap(addr: 0x7fa96cf50000, len: 1376256, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x25000) = 0x7fa96cf50000 0.056 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0a0000, len: 307200, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x175000) = 0x7fa96d0a0000 0.064 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0eb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bf000) = 0x7fa96d0eb000 0.075 ( 0.005 ms): sleep/751870 mmap(addr: 0x7fa96d0f1000, len: 13160, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa96d0f1000 0.253 ( 0.005 ms): sleep/751870 mmap(len: 218049136, prot: READ, flags: PRIVATE, fd: 3) = 0x7fa95ff38000 # # # set -o vi # strace -e mmap sleep 1 mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f333bd83000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f333bd81000 mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f333bbb7000 mmap(0x7f333bbdc000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f333bbdc000 mmap(0x7f333bd2c000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f333bd2c000 mmap(0x7f333bd77000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f333bd77000 mmap(0x7f333bd7d000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f333bd7d000 mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f332ebc4000 +++ exited with 0 +++ # And you can as well tweak 'perf trace's output to more closely match strace's: # perf config trace.show_arg_names=no # perf config trace.show_duration=no # perf config trace.show_prefix=yes # perf config trace.show_timestamp=no # perf config trace.show_zeros=yes # perf config trace.no_inherit=yes # perf trace -e mmap sleep 1 mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0d287ca000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f0d287c8000 mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0d285fe000 mmap(0x7f0d28623000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f0d28623000 mmap(0x7f0d28773000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f0d28773000 mmap(0x7f0d287be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f0d287be000 mmap(0x7f0d287c4000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f0d287c4000 mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0d1b60b000 # # perf config | grep ^trace trace.show_arg_names=no trace.show_duration=no trace.show_prefix=yes trace.show_timestamp=no trace.show_zeros=yes trace.no_inherit=yes # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 08fc476214 |
tools beauty: Add script to generate table of mmap's 'prot' argument
Will be wired up in the following csets: $ tools/perf/trace/beauty/mmap_prot.sh static const char *mmap_prot[] = { [ilog2(0x1) + 1] = "READ", #ifndef PROT_READ #define PROT_READ 0x1 #endif [ilog2(0x2) + 1] = "WRITE", #ifndef PROT_WRITE #define PROT_WRITE 0x2 #endif [ilog2(0x4) + 1] = "EXEC", #ifndef PROT_EXEC #define PROT_EXEC 0x4 #endif [ilog2(0x8) + 1] = "SEM", #ifndef PROT_SEM #define PROT_SEM 0x8 #endif [ilog2(0x01000000) + 1] = "GROWSDOWN", #ifndef PROT_GROWSDOWN #define PROT_GROWSDOWN 0x01000000 #endif [ilog2(0x02000000) + 1] = "GROWSUP", #ifndef PROT_GROWSUP #define PROT_GROWSUP 0x02000000 #endif }; $ $ $ $ tools/perf/trace/beauty/mmap_prot.sh alpha static const char *mmap_prot[] = { [ilog2(0x4) + 1] = "EXEC", #ifndef PROT_EXEC #define PROT_EXEC 0x4 #endif [ilog2(0x01000000) + 1] = "GROWSDOWN", #ifndef PROT_GROWSDOWN #define PROT_GROWSDOWN 0x01000000 #endif [ilog2(0x02000000) + 1] = "GROWSUP", #ifndef PROT_GROWSUP #define PROT_GROWSUP 0x02000000 #endif [ilog2(0x1) + 1] = "READ", #ifndef PROT_READ #define PROT_READ 0x1 #endif [ilog2(0x8) + 1] = "SEM", #ifndef PROT_SEM #define PROT_SEM 0x8 #endif [ilog2(0x2) + 1] = "WRITE", #ifndef PROT_WRITE #define PROT_WRITE 0x2 #endif }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 61693228b6 |
perf beauty mmap_flags: Conditionaly define the mmap flags
So that in older systems we get it in the mmap flags scnprintf routines: $ tools/perf/trace/beauty/mmap_flags.sh | head -9 2> /dev/null static const char *mmap_flags[] = { [ilog2(0x40) + 1] = "32BIT", #ifndef MAP_32BIT #define MAP_32BIT 0x40 #endif [ilog2(0x01) + 1] = "SHARED", #ifndef MAP_SHARED #define MAP_SHARED 0x01 #endif $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 9012e3dda2 |
perf trace beauty: Add script to autogenerate mremap's flags args string/id table
It'll also conditionally generate the defines, so that if we don't have those when building a new tool tarball in an older systems, we get those, and we need them sometimes in the actual scnprintf routine, such as when checking if a flags means we have an extra arg, like with MREMAP_FIXED. $ tools/perf/trace/beauty/mremap_flags.sh static const char *mremap_flags[] = { [ilog2(1) + 1] = "MAYMOVE", #ifndef MREMAP_MAYMOVE #define MREMAP_MAYMOVE 1 #endif [ilog2(2) + 1] = "FIXED", #ifndef MREMAP_FIXED #define MREMAP_FIXED 2 #endif [ilog2(4) + 1] = "DONTUNMAP", #ifndef MREMAP_DONTUNMAP #define MREMAP_DONTUNMAP 4 #endif }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | d758d5d474 |
perf tools: Separate the checking of headers only used to build beautification tables
Some headers are not used in building the tools directly, but instead to generate tables that then gets source code included to do id->string and string->id lookups for things like syscall flags and commands. We were adding it directly to tools/include/ and this sometimes gets in the way of building using system headers, lets untangle this a bit. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 717d182e41 |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and get v5.10 development in sync with the main kernel sources. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Linus Torvalds | fb0155a09b |
NFS client bugfixes for Linux 5.9
Highlights include: Bugfixes: - NFSv4.2: copy_file_range needs to invalidate caches on success - NFSv4.2: Fix security label length not being reset - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read - pNFS/flexfiles: Fix signed/unsigned type issues with mirror indices -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl9yHBYACgkQZwvnipYK APLKCA//Sppmzm+kFDmZ6iWplwdoIq7rnIMG7eKKGD754dDvOtYNIw9D9yOIY5G6 eVdvQ10m6vA8Dp8AxaWK9qacMXljmOX8szz+Bf1NcIe2F6X/waO3zMoud8Rd9Ja4 PigAbAW6Gs0gohL3wg+jh5N5JlaDcZ0Dri3QWdqGaHjhrKV9MW9h0BpBCx9YCPkL FFgk+I+524rGQnkHvCWbclww4428e+MSYdeJE+c4wrIx/HCz3iJ60AFA0SIAw7FV 6qMtxN4/kqfdIrA074xcreMdkucxe3lNl7ujT1T6dum2OwERq+WyzkwoirqNguJM X71CXU9IE8rw72ATWMoba961i4HITp05ZbVg7yXZrrRkAEljyHhr67R/1RRSlxQm ZrPOICrCoXKHRFTbNL7Sb+xeTGbuZQkbcwGXnUYdTIO3JQ6PRIEFb/y8yuuT+EPG KWk2vM+QM9036qfBWjbAZMOpwDB4oiVkBgzNM8FGcebiV1FANQ1by7oMaQsH1NLm WY0M0KFY2wdv3ovGT7oUOEbtoxD993HuuLdIWxTRHFjRPgg8WKTFnf4BIeZtMjY8 oRvN83hEjWszuTEuuEukUdsqLTftv7rNhxrotoh9WfeSXvJDB6PF0y55UmZ6WuKE wRQQLxC9Om+E3HidxgOolqKxD6d4OOY3XJWzH3As7sJEgQyE/5o= =cNi/ -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - NFSv4.2: copy_file_range needs to invalidate caches on success - NFSv4.2: Fix security label length not being reset - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read - pNFS/flexfiles: Fix signed/unsigned type issues with mirror indices" * tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Be consistent about mirror index types pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read NFSv4.2: fix client's attribute cache management for copy_file_range nfs: Fix security label length not being reset |
|
Jason A. Donenfeld | a4d63c3732 |
mm: do not rely on mm == current->mm in __get_user_pages_locked
It seems likely this block was pasted from internal_get_user_pages_fast,
which is not passed an mm struct and therefore uses current's. But
__get_user_pages_locked is passed an explicit mm, and current->mm is not
always valid. This was hit when being called from i915, which uses:
pin_user_pages_remote->
__get_user_pages_remote->
__gup_longterm_locked->
__get_user_pages_locked
Before, this would lead to an OOPS:
BUG: kernel NULL pointer dereference, address: 0000000000000064
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140
Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020
Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915]
RIP: 0010:__get_user_pages_remote+0xd7/0x310
Call Trace:
__i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915]
process_one_work+0x1ca/0x390
worker_thread+0x48/0x3c0
kthread+0x114/0x130
ret_from_fork+0x1f/0x30
CR2: 0000000000000064
This commit fixes the problem by using the mm pointer passed to the
function rather than the bogus one in current.
Fixes:
|
|
Ian Rogers | a55b7bb1c1 |
perf test: Fix msan uninitialized use.
Ensure 'st' is initialized before an error branch is taken.
Fixes test "67: Parse and process metrics" with LLVM msan:
==6757==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x5570edae947d in rblist__exit tools/perf/util/rblist.c:114:2
#1 0x5570edb1c6e8 in runtime_stat__exit tools/perf/util/stat-shadow.c:141:2
#2 0x5570ed92cfae in __compute_metric tools/perf/tests/parse-metric.c:187:2
#3 0x5570ed92cb74 in compute_metric tools/perf/tests/parse-metric.c:196:9
#4 0x5570ed92c6d8 in test_recursion_fail tools/perf/tests/parse-metric.c:318:2
#5 0x5570ed92b8c8 in test__parse_metric tools/perf/tests/parse-metric.c:356:2
#6 0x5570ed8de8c1 in run_test tools/perf/tests/builtin-test.c:410:9
#7 0x5570ed8ddadf in test_and_print tools/perf/tests/builtin-test.c:440:9
#8 0x5570ed8dca04 in __cmd_test tools/perf/tests/builtin-test.c:661:4
#9 0x5570ed8dbc07 in cmd_test tools/perf/tests/builtin-test.c:807:9
#10 0x5570ed7326cc in run_builtin tools/perf/perf.c:313:11
#11 0x5570ed731639 in handle_internal_command tools/perf/perf.c:365:8
#12 0x5570ed7323cd in run_argv tools/perf/perf.c:409:2
#13 0x5570ed731076 in main tools/perf/perf.c:539:3
Fixes: commit
|
|
Ian Rogers | aa98d8482c |
perf parse-events: Reduce casts around bp_addr
perf_event_attr bp_addr is a u64. parse-events.y parses it as a u64, but casts it to a void* and then parse-events.c casts it back to a u64. Rather than all the casts, change the type of the address to be a u64. This removes an issue noted in: https://lore.kernel.org/lkml/20200903184359.GC3495158@kernel.org/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200925003903.561568-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 40b74c30ff |
perf test: Add expand cgroup event test
It'll expand given events for cgroups A, B and C. $ perf test -v expansion 69: Event expansion for cgroups : --- start --- test child forked, pid 983140 metric expr 1 / IPC for CPI metric expr instructions / cycles for IPC found event instructions found event cycles adding {instructions,cycles}:W copying metric event for cgroup 'A': instructions (idx=0) copying metric event for cgroup 'B': instructions (idx=0) copying metric event for cgroup 'C': instructions (idx=0) test child finished with 0 ---- end ---- Event expansion for cgroups: Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 89fb1ca2ab |
perf tools: Allow creation of cgroup without open
This is a preparation for a test case of expanding events for multiple cgroups. Instead of using real system cgroup, the test will use fake cgroups so it needs a way to have them without a open file descriptor. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | b214ba8c42 |
perf tools: Copy metric events properly when expand cgroups
The metricgroup__copy_metric_events() is to handle metrics events when expanding event for cgroups. As the metric events keep pointers to evsel, it should be refreshed when events are cloned during the operation. The perf_stat__collect_metric_expr() is also called in case an event has a metric directly. During the copy, it references evsel by index as the evlist now has cloned evsels for the given cgroup. Also kernel test robot found an issue in the python module import so add empty implementations of those two functions to fix it. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | d1c5a0e86a |
perf stat: Add --for-each-cgroup option
The --for-each-cgroup option is a syntax sugar to monitor large number of cgroups easily. Current command line requires to list all the events and cgroups even if users want to monitor same events for each cgroup. This patch addresses that usage by copying given events for each cgroup on user's behalf. For instance, if they want to monitor 6 events for 200 cgroups each they should write 1200 event names (with -e) AND 1200 cgroup names (with -G) on the command line. But with this change, they can just specify 6 events and 200 cgroups with a new option. A simpler example below: It wants to measure 3 events for 2 cgroups ('A' and 'B'). The result is that total 6 events are counted like below. $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1 Performance counter stats for 'system wide': 988.18 msec cpu-clock A # 0.987 CPUs utilized 3,153,761,702 cycles A # 3.200 GHz (100.00%) 8,067,769,847 instructions A # 2.57 insn per cycle (100.00%) 982.71 msec cpu-clock B # 0.982 CPUs utilized 3,136,093,298 cycles B # 3.182 GHz (99.99%) 8,109,619,327 instructions B # 2.58 insn per cycle (99.99%) 1.001228054 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 7fedd9b84b |
perf evsel: Add evsel__clone() function
The evsel__clone() is to create an exactly same evsel from same attributes. The function assumes the given evsel is not configured yet so it cares fields set during event parsing. Those fields are now moved together as Jiri suggested. Note that metric events will be handled by later patch. It will be used by perf stat to generate separate events for each cgroup. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jin Yao | b5ff7f2799 |
perf vendor events: Update SkylakeX events to v1.21
- Update SkylakeX events to v1.21. - Update SkylakeX JSON metrics from TMAM 4.0. Other fixes: - Add NO_NMI_WATCHDOG metric constraint to Backend_Bound - Fix misspelled error Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20200922031918.3723-1-yao.jin@linux.intel.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jin Yao | 038d3b53c2 |
perf vendor events intel: Update CascadelakeX events to v1.08
- Update CascadelakeX events to v1.08. - Update CascadelakeX JSON metrics from TMAM 4.0. Other fixes: - Add NO_NMI_WATCHDOG metric constraint to Backend_Bound - Change 'MB/sec' to 'MB' in UNC_M_PMM_BANDWIDTH. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Link: https://lore.kernel.org/lkml/20200922031918.3723-1-yao.jin@linux.intel.com/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Linus Torvalds | a1b8638ba1 | Linux 5.9-rc7 | |
Linus Torvalds | 16bc1d5432 |
Kbuild fixes for v5.9 (4th)
- Ignore compiler stubs for PPC to fix builds - Fix the usage of --target mentioned in the LLVM document -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl9wyzAVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGlGAQAIp9spj8Qd1U5lC7YVrTnYglG4rV BHPlNsBcV0V4i6OYunnEozO2wcyjii0WpOsxuA2o1ilDxoNSPPjPG73ySU2CNUYy ZMkuPER9vk/6MLFl4lyzxlSF2VkGem3Lk2M1yGvLKiiEAR6AafxZ6CdVig7kFz1H ZJV11ms9pufCpOVDMCnazS96I/EQn60uTfiy992Xjtkaet1tEaWh1KirH2OYAbBO oNQnlpHZVXp/xnexHz9z2ctMjWBGH2cDws5iDtIHTep2lYHK5i8CqOKzfKX88NRS txBZMJjDn/U825u+P8NHNaQsC1Vo0ECAsX/5YTjsLT2BZPG049BmTe0HNGDtVTB3 bvSuRfG96L75qDxZusCtpk2TLKT7ntnV4bQYFXyilr2rZ0cXne10FmTOBhuEt1DQ r7o6/M4D2X2KpDTSIqfJRUozlyzBtmX9JBnKZIYvaC92ChhVch7b8RPeXV1IMqPS tCYpSszPbc/u15Bp2SPd7hZ1xVnKi+v8FzVRLJFLSHkWME61J/6x/v9c9jMu9buC y7vU1SNagkL2hf9YvsVWgiA5RvtYuj3o2uyMOZQiDE3h4RbxnSuM7IQKAH/hG4W0 LE5hMB925Z5+H4gOduSC1pK9uxd6r6HpmJmNd4V2qEIqikNCfYIa6i0k/k3feZKE dJPqBX3wf8FsaJkU =kjmg -----END PGP SIGNATURE----- Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - ignore compiler stubs for PPC to fix builds - fix the usage of --target mentioned in the LLVM document * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Documentation/llvm: Fix clang target examples scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.* |
|
Linus Torvalds | f8818559ca |
Two fixes for the x86 interrupt code:
- Unbreak the magic 'search the timer interrupt' logic in IO/APIC code which got wreckaged when the core interrupt code made the state tracking logic stricter. That caused the interrupt line to stay masked after switching from IO/APIC to PIC delivery mode, which obviously prevents interrupts from being delivered. - Make run_on_irqstack_code() typesafe. The function argument is a void pointer which is then casted to 'void (*fun)(void *). This breaks Control Flow Integrity checking in clang. Use proper helper functions for the three variants reuqired. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9wqnATHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoWEHD/402XehvgOXySy/KuhMgtSAoJ/OElvo dnZYugCEh/6mllRbAhH0hkWpjxdhmCEYjWjf5Qqj01KYbQRSzYIhitmPyAI/Z2Dk CqhvRj4Hko7AQGmF97mg/VLdwn6sDIslfprgo3o0pqZcuyDd0QqUgkKLgQpvGDXe 4l4OtiOoo2yHnr9wJiiMSVgHkXgp5QpTYAhnWV5ea9FokwO6/OLwjZAHgBh7MqNY 6hnUQIHL/v4K3aXdxOED55Irf05Yk4OqiXKgNMHNLqd0xf+DBTqlfzvQ+omZpMXY sF0SML08r2f+KnBGuXlXDeu7MDRJlBO9Jxeq/mLcyLqP1K2nI2Ue3B6dy7AocKX3 p1aUu0TUtRpDNdhMZSHFon1cQfPc8pFZSVkdq0ke2CYd+HleIrklvQzSn7gGrQqJ ZvWT9zs/XONXqxcMcSS1+XCQdJsRwIqbvY0Vxa68OIhZcKrsP7Ewln/sj46ghzOq kTbxXyG58anK2ssalJqR148tsDlZBrBlo3/Bm3sITIoAtb8m+jeuqi1fZnvmfkAC IiSEqY+p7CnNz9pYEA5T4GHDHE2luBA6YryclvCEAvi60XMZ+LW+T5szs6+fOjS+ y4RK2dGgcE0ewlvm0jzWKLvKMdi7iv5c9ndtE1W4qxy/K/uXKvfHe45SpmcFAyqx FVjhT+kF/HKF1A== =BcVu -----END PGP SIGNATURE----- Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for the x86 interrupt code: - Unbreak the magic 'search the timer interrupt' logic in IO/APIC code which got wreckaged when the core interrupt code made the state tracking logic stricter. That caused the interrupt line to stay masked after switching from IO/APIC to PIC delivery mode, which obviously prevents interrupts from being delivered. - Make run_on_irqstack_code() typesafe. The function argument is a void pointer which is then cast to 'void (*fun)(void *). This breaks Control Flow Integrity checking in clang. Use proper helper functions for the three variants reuqired" * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Unbreak check_timer() x86/irq: Make run_on_irqstack_cond() typesafe |
|
Linus Torvalds | ba25f0570b |
A set of clocksource/clockevents updates:
- Reset the TI/DM timer before enabling it instead of doing it the other way round. - Initialize the reload value for the GX6605s timer correctly so the hardware counter starts at 0 again after overrun. - Make error return value negative in the h8300 timer init function -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9wqQ0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZx0EACJUIlCC54kw4CnZdxhoWu0f6tXEuip +Iyb8OJw56FdyHigvkPBMoF1o4a0Ax32TbYYOKntpDy67vnqkO6DV1M/Mwt8IhfO ey7h1t7e4y2vrXAfYN0oX1ZQAk9hkPGW5+wugEf6dbZZva7mm+jV0PfNP/yn7KWS n9lUrLNlPJdndSIYwj9Cto5mMQBsM7/qM8MkBR84i8GxFP2rofh4C5bD8WTnXzHd B8898riwkaaQmfq/Ch9Y79oMzpZXysAEYpZ3YExkQsEmi5YqZ8k6R8RD18mKQdFH 7Kqh/025j7oKk9fopOvPjZ9sIX22gGP8C+tdy3sipYDCY0wRVNu+SPXppwl0T9ML JLX/D2pC20f/VUQ21yc8KgVt76g8QID4t+NV5/VdIHuxhei/4WN3hJxuI4w4Ivfn YK8mB5TK+R4K8Ln+GFE0zh/wfpjJe84K7r4NmDJnClD8chTVhVZHOlv5qJBZzob8 Yd4fMFS0WufAj15ZMN55iLFEI30iubY5X1xaDD1sFrFJyO1VCj8ITH7mBtW9zW1a a/8LQlB5yIjLNTGVZGTCcYfyQ7+MA1EmkutD7AnFN87Zwx6FtDYGEPZq/KI3dwrw 2qA7HTVBYoWQvSOQWt8inuXsbnqUQ2Hq2y8cIuieg333OGc1WQS6BZOeLdJWNGas W0JztaeFr1S3ew== =Htin -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A set of clocksource/clockevents updates: - Reset the TI/DM timer before enabling it instead of doing it the other way round. - Initialize the reload value for the GX6605s timer correctly so the hardware counter starts at 0 again after overrun. - Make error return value negative in the h8300 timer init function" * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/timer-gx6605s: Fixup counter reload clocksource/drivers/timer-ti-dm: Do reset before enable clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init() |
|
Peter Xu | d042035eaf |
mm/thp: Split huge pmds/puds if they're pinned when fork()
Pinned pages shouldn't be write-protected when fork() happens, because follow up copy-on-write on these pages could cause the pinned pages to be replaced by random newly allocated pages. For huge PMDs, we split the huge pmd if pinning is detected. So that future handling will be done by the PTE level (with our latest changes, each of the small pages will be copied). We can achieve this by let copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll fallthrough in copy_pmd_range() and finally land the next copy_pte_range() call. Huge PUDs will be even more special - so far it does not support anonymous pages. But it can actually be done the same as the huge PMDs even if the split huge PUDs means to erase the PUD entries. It'll guarantee the follow up fault ins will remap the same pages in either parent/child later. This might not be the most efficient way, but it should be easy and clean enough. It should be fine, since we're tackling with a very rare case just to make sure userspaces that pinned some thps will still work even without MADV_DONTFORK and after they fork()ed. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Peter Xu | 70e806e4e6 |
mm: Do early cow for pinned pages during fork() for ptes
This allows copy_pte_range() to do early cow if the pages were pinned on the source mm. Currently we don't have an accurate way to know whether a page is pinned or not. The only thing we have is page_maybe_dma_pinned(). However that's good enough for now. Especially, with the newly added mm->has_pinned flag to make sure we won't affect processes that never pinned any pages. It would be easier if we can do GFP_KERNEL allocation within copy_one_pte(). Unluckily, we can't because we're with the page table locks held for both the parent and child processes. So the page allocation needs to be done outside copy_one_pte(). Some trick is there in copy_present_pte(), majorly the wrprotect trick to block concurrent fast-gup. Comments in the function should explain better in place. Oleg Nesterov reported a (probably harmless) bug during review that we didn't reset entry.val properly in copy_pte_range() so that potentially there's chance to call add_swap_count_continuation() multiple times on the same swp entry. However that should be harmless since even if it happens, the same function (add_swap_count_continuation()) will return directly noticing that there're enough space for the swp counter. So instead of a standalone stable patch, it is touched up in this patch directly. Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Peter Xu | 7a4830c380 |
mm/fork: Pass new vma pointer into copy_page_range()
This prepares for the future work to trigger early cow on pinned pages during fork(). No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Peter Xu | 008cfe4418 |
mm: Introduce mm_struct.has_pinned
(Commit message majorly collected from Jason Gunthorpe) Reduce the chance of false positive from page_maybe_dma_pinned() by keeping track if the mm_struct has ever been used with pin_user_pages(). This allows cases that might drive up the page ref_count to avoid any penalty from handling dma_pinned pages. Future work is planned, to provide a more sophisticated solution, likely to turn it into a real counter. For now, make it atomic_t but use it as a boolean for simplicity. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Thomas Gleixner | a7b6c0feda |
- Fix wrong signed return value when checking of_iomap in the probe
function for the h8300 timer (Tianjia Zhang) - Fix reset sequence when setting up the timer on the dm_timer (Tony Lindgren) - Fix counter reload when the interrupt fires on gx6605s (Guo Ren) -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAl9SG88ACgkQqDIjiipP 6E/zAgf+KtVSIp8NF1N0WPbwVHzqvg4RODIKvLgbaPvRPQRmU+tNZnLtRBfQkyg+ ul6snRlhHAAI1auLczLXRVb/u2ERtjqBukxrA7ECMJNx+bGOars4j2w6tSQ1InFU B659c2ELvMSe97mrpnWAs9IWcau3fMMweNn2rTL6erK9nt2Ap0cjmf58tSG72hh1 0TRBbLaFSwj7Eq+IG8YjsYHzeQ/1DlHcaPbfFOFNwgWIq5Q4GSHvc1sbkmzeb6ps lx2nmF3sw9KSN5cyvALtU7944nNIexMQNcxXcCjRuLFtgNFlyJseWEQY5rmKHYWN vVGMEjyM184g2ZYA/ll8uFZtZ0SA6g== =E64+ -----END PGP SIGNATURE----- Merge tag 'timers-v5.9-rc4' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent Pull clocksource/clockevent fixes from Daniel Lezcano: - Fix wrong signed return value when checking of_iomap in the probe function for the h8300 timer (Tianjia Zhang) - Fix reset sequence when setting up the timer on the dm_timer (Tony Lindgren) - Fix counter reload when the interrupt fires on gx6605s (Guo Ren) |
|
Linus Torvalds | a1bffa4874 |
SCSI fixes on 20200926
Three fixes: one in drivers (lpfc) and two for zoned block devices. The latter also impinges on the block layer but only to introduce a new block API for setting the zone model rather than fiddling with the queue directly in the zoned block driver. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX29mRyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishabnAP48vMYD /cjyGAJfq/0k/U/t6pRPc5tUm89LOWcOJz0SjwD/YXcQNz7mx8MxnypAV1jbWXR7 iyWkPMYVc4EJh7oTARE= =SQhI -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three fixes: one in drivers (lpfc) and two for zoned block devices. The latter also impinges on the block layer but only to introduce a new block API for setting the zone model rather than fiddling with the queue directly in the zoned block driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: sd_zbc: Fix ZBC disk initialization scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported |
|
Linus Torvalds | 692495baa3 |
io_uring-5.9-2020-09-25
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9upV4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpuPrD/9K1UQLv38K2nPYclLymOi+GIsukpjgwzdY SM38GNXU5vYkFhylH/bXfckNQ/gja0/whNpcr/UVCgTWleMnss9UiZaCgysyuIOL vnBxT4yDZIxtkOwF/790NiwV2FrsmrLFdNZU4LkmfbmmrAlNtjOElKyJsOyNNMzJ UMzHH2Z1vvUwKz+Yq4fPyZCJbpNHN6ABwkSXY/Nz8oWsKfw728fZztLsP57gOtkl yYVFO2z1n7VaWp5ZzVFYG51DFuMCIDXgN6mMlaKfnQ6auQZxjR+jg69HSRKLjIx7 ZEG1gl/DzwH1+751P7HnuI3U7BtBYolyErHW4j4a6Ko4XX8PPhG9ODKOmsEMPrEq gCUGcGgWUsEyz+pyullTEt7ea/oLGJ5N86qtNOdviXETZZTghm47QlzxFWr1/GWy wH++ctBf/Ekk0dbCBF6mJiqDl/PrVSDSClTVhsGJESEmk4BOoC9zd9zT/EfsiR9m vA8wLE2g1/5oU+irQ0Dlc/EENVWISiigOFFvTPJJjma9iGXAW3kV2/aYW6DKZSwM w/va7zTlzt89O+L0AT+Rg8btaiTiaZcs3op1AFa1z5Gut3b5YhWL/e95wlaOI6Nv Tudm4GX06BaN1QdUDV9g0Pr5iNOaCvuOArjNOU3j7ySusJxiJ8GdA3WFqJ/XUlIV pne8hC/+7A== =mBw7 -----END PGP SIGNATURE----- Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two fixes for regressions in this cycle, and one that goes to 5.8 stable: - fix leak of getname() retrieved filename - remove plug->nowait assignment, fixing a regression with btrfs - fix for async buffered retry" * tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block: io_uring: ensure async buffered read-retry is setup properly io_uring: don't unconditionally set plug->nowait = true io_uring: ensure open/openat2 name is cleaned on cancelation |
|
Linus Torvalds | 9d2fbaefb3 |
block-5.9-2020-09-25
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9upXAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplZPD/9cIgt7FM7O1MZYCpp7TH+Da8887UxFDIJ4 VWZOs7JzV0BPHsfonMEYBSsEYvJxA+w2vtD+aTTwBK/+QwvvCNRyPNjEGZRgb8+n o41qRCuuQho1OO9ivGI2C/sGmt7mI9LRZ+ik0yHYVSzW8V9z1Z0D/KB5258pwPEN mhjC+haAX0fjzSckh7Qr+5p8RdO/yxfzR6rugB84qzmwSxiFPdDI0v2bT1paNXPy cHx45ov3Z0UjfDnzpMcldnKznUScayFZ5rkOVaC1G7M7daJbAYnT0pZPAvbE4C9G koMdcIDqX4xsNGsmRePjvAcb2la6Oo0N0tKg8IB0syhyozQBbLH76RfUaybWZpbK JJZNJnGY6KwmrAYYw94uUH/EQ2YMweSp+x2MN503D4gBmFtc3oz6X6cgxXKMB/OH Z0l2D7nRSiVZAEPf/b/RY7N3vkxq1feTQTBgW/lheYU1LPc9w4uWDlpdmQFY+Agn biSZIFspn/WAbtXtRouKbm1fygHnUYqx7PQpyXRwvENFk15wz5174OrO4Doo5r9R 1t9CYzxQFxnfVSukLFFdQxOUU78t9DQDYwTsCZXvTNNuEgv+3sOHQ8iYU7sCQiZh EAz97kqETUf/Av1+5ItzneZTaI22OU6DF2LBmkjxbKp7W+19yO15oo9gOjIR1l+r 8Nr3DMOc3Q== =e8oA -----END PGP SIGNATURE----- Merge tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "NVMe pull request from Christoph, and removal of a dead define. - fix error during controller probe that cause double free irqs (Keith Busch) - FC connection establishment fix (James Smart) - properly handle completions for invalid tags (Xianting Tian) - pass the correct nsid to the command effects and supported log (Chaitanya Kulkarni)" * tag 'block-5.9-2020-09-25' of git://git.kernel.dk/linux-block: block: remove unused BLK_QC_T_EAGAIN flag nvme-core: don't use NVME_NSID_ALL for command effects and supported log nvme-fc: fail new connections to a deleted host or remote port nvme-pci: fix NULL req in completion handler nvme: return errors for hwmon init |
|
Linus Torvalds | eeddbe6841 |
s390 fixes for 5.9-rc7
- Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt list. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl9vAiwACgkQjYWKoQLX FBj+xwf8DvOpBW0mo0D+aYqUrA0ETpeF5GIJe+yG/Sf4ozx7Qk49GvoC2ahEomSN jM4V7wkcnbjoPe1Cisv5PfzSf88AaZFJBbcoC7UKHQqVF2N8FVw59I2zI1+TFf6O RXtDZYeaVKNT2l40xrkZJudG4FEhlo9W7FZZqgdQy/4YZmof7p8mMQ0VhB/HQuh0 qsihtefG9PLKYUUw7GvEtrPfrWoyFfwWfMAc00JklPsnwb6Y+heYgdzNM2ElIj6m u5dmnZoq7qsyAcwdA1t1TSCYgGSuYjny87NJ6dbe2HPzRFmwIhIAj1Jgr4ubFDBF 4zZn0S8PDE4r4MEKDfvq9kohQ6uXCQ== =+5YA -----END PGP SIGNATURE----- Merge tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Vasily Gorbik: "Fix truncated ZCRYPT_PERDEV_REQCNT ioctl result. Copy entire reqcnt list" * tag 's390-5.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl |
|
Linus Torvalds | 8fb1e91033 |
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton: "9 patches. Subsystems affected by this patch series: mm (thp, memcg, gup, migration, memory-hotplug), lib, and x86" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: don't rely on system state to detect hot-plug operations mm: replace memmap_context by meminit_context arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback lib/memregion.c: include memregion.h lib/string.c: implement stpcpy mm/migrate: correct thp migration stats mm/gup: fix gup_fast with dynamic page table folding mm: memcontrol: fix missing suffix of workingset_restore mm, THP, swap: fix allocating cluster for swapfile by mistake |
|
Minchan Kim | ce2684254b |
mm: validate pmd after splitting
syzbot reported the following KASAN splat:
general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296
Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc
RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006
Call Trace:
lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389
walk_pmd_range mm/pagewalk.c:89 [inline]
walk_pud_range mm/pagewalk.c:160 [inline]
walk_p4d_range mm/pagewalk.c:193 [inline]
walk_pgd_range mm/pagewalk.c:229 [inline]
__walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331
walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427
madvise_pageout_page_range mm/madvise.c:521 [inline]
madvise_pageout mm/madvise.c:557 [inline]
madvise_vma mm/madvise.c:946 [inline]
do_madvise+0x12d0/0x2090 mm/madvise.c:1145
__do_sys_madvise mm/madvise.c:1171 [inline]
__se_sys_madvise mm/madvise.c:1169 [inline]
__x64_sys_madvise+0x76/0x80 mm/madvise.c:1169
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The backing vma was shmem.
In case of split page of file-backed THP, madvise zaps the pmd instead
of remapping of sub-pages. So we need to check pmd validity after
split.
Reported-by: syzbot+ecf80462cb7d5d552bc7@syzkaller.appspotmail.com
Fixes:
|
|
Laurent Dufour | f85086f95f |
mm: don't rely on system state to detect hot-plug operations
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation. Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state. In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1]. So checking against the system state is not
enough.
The consequence is that on system with interleaved node's ranges like this:
Early memory node ranges
node 1: [mem 0x0000000000000000-0x000000011fffffff]
node 2: [mem 0x0000000120000000-0x000000014fffffff]
node 1: [mem 0x0000000150000000-0x00000001ffffffff]
node 0: [mem 0x0000000200000000-0x000000048fffffff]
node 2: [mem 0x0000000490000000-0x00000007ffffffff]
This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done. At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:
$ ls -l /sys/devices/system/memory/memory21/node*
total 0
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1
lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2
In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():
kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
Call Trace:
add_memory_resource+0x23c/0x340 (unreliable)
__add_memory+0x5c/0xf0
dlpar_add_lmb+0x1b4/0x500
dlpar_memory+0x1f8/0xb80
handle_dlpar_errorlog+0xc0/0x190
dlpar_store+0x198/0x4a0
kobj_attr_store+0x30/0x50
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1b0/0x290
vfs_write+0xe8/0x290
ksys_write+0xdc/0x130
system_call_exception+0x160/0x270
system_call_common+0xf0/0x27c
This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation. An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.
[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:
$QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
-m size=$MEM,slots=255,maxmem=4294967296k \
-numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
-object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
-object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
-object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
-object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
-object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
-object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
-object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \
Fixes:
|
|
Laurent Dufour | c1d0da8335 |
mm: replace memmap_context by meminit_context
Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
|
Mikulas Patocka | a1cd6c2ae4 |
arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
If we copy less than 8 bytes and if the destination crosses a cache
line, __copy_user_flushcache would invalidate only the first cache line.
This patch makes it invalidate the second cache line as well.
Fixes:
|