mirror of https://gitee.com/openkylin/linux.git
6797 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Jiri Olsa | 8dfdf440d3 |
perf tools: Pass build_id object to dso__set_build_id()
Passing build_id object to dso__set_build_id(), so it's easier to initialize dos's build id object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | bf5411695a |
perf tools: Pass build_id object to build_id__sprintf()
Passing build_id object to build_id__sprintf function, so it can operate with the proper size of build id. This will create proper md5 build id readable names, like following: a50e350e97c43b4708d09bcd85ebfff7 instead of: a50e350e97c43b4708d09bcd85ebfff700000000 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | 3ff1b8c8cc |
perf tools: Pass build id object to sysfs__read_build_id()
Passing build id object to sysfs__read_build_id function, so it can populate the size of the build_id object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | f766819cd5 |
perf tools: Pass build_id object to filename__read_build_id()
Pass a build_id object to filename__read_build_id function, so it can populate the size of the build_id object. Changing filename__read_build_id() code for both ELF/non-ELF code. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | 0aba7f036a |
perf tools: Use build_id object in dso
Replace build_id byte array with struct build_id object and all the code that references it. The objective is to carry size together with build id array, so it's better to keep both together. This is preparatory change for following patches, and there's no functional change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 79bbbabd22 |
perf config: Export the perf_config_from_file() function
We'll use it to ask for extra config files to be loaded, profile like stuff that will be used first to make 'perf trace' mimic 'strace' output via a 'perf strace' command that just sets up 'perf trace' output. At some point it'll be used for regression tests, where we'll run some simple commands like: perf strace ls > perf-strace.output strace ls > strace.output And then do some mutable syscall arg aware diff like tool to deal with arguments for things like mmap, that change at each execution, to be first ignored and then properly tracked when used accoss multiple syscalls. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | dbaa1b3d9a |
Merge branch 'perf/urgent' into perf/core
To pick fixes that missed v5.9. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | e7b60c5a0c |
perf inject: Do not load map/dso when injecting build-id
No need to load symbols in a DSO when injecting build-id. I guess the reason was to check the DSO is a special file like anon files. Use some helper functions in map.c to check them before reading build-id. Also pass sample event's cpumode to a new build-id event. It brought a speedup in the benchmark of 25 -> 21 msec on my laptop. Also the memory usage (Max RSS) went down by ~200 KB. # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.389 msec (+- 0.138 msec) Average time per event: 2.097 usec (+- 0.014 usec) Average memory usage: 8225 KB (+- 0 KB) Committer notes: Before: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,020.56 msec task-clock:u # 1.271 CPUs utilized ( +- 0.74% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 123,354 page-faults:u # 0.031 M/sec ( +- 0.81% ) 7,119,951,568 cycles:u # 1.771 GHz ( +- 1.74% ) (83.27%) 230,086,969 stalled-cycles-frontend:u # 3.23% frontend cycles idle ( +- 1.97% ) (83.41%) 1,168,298,765 stalled-cycles-backend:u # 16.41% backend cycles idle ( +- 1.13% ) (83.44%) 11,173,083,669 instructions:u # 1.57 insn per cycle # 0.10 stalled cycles per insn ( +- 1.58% ) (83.31%) 2,413,908,936 branches:u # 600.392 M/sec ( +- 1.69% ) (83.26%) 46,576,289 branch-misses:u # 1.93% of all branches ( +- 2.20% ) (83.31%) 3.1638 +- 0.0309 seconds time elapsed ( +- 0.98% ) $ After: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 2,379.94 msec task-clock:u # 1.473 CPUs utilized ( +- 0.18% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 62,584 page-faults:u # 0.026 M/sec ( +- 0.07% ) 2,372,389,668 cycles:u # 0.997 GHz ( +- 0.29% ) (83.14%) 106,937,862 stalled-cycles-frontend:u # 4.51% frontend cycles idle ( +- 4.89% ) (83.20%) 581,697,915 stalled-cycles-backend:u # 24.52% backend cycles idle ( +- 0.71% ) (83.47%) 3,659,692,199 instructions:u # 1.54 insn per cycle # 0.16 stalled cycles per insn ( +- 0.10% ) (83.63%) 791,372,961 branches:u # 332.518 M/sec ( +- 0.27% ) (83.39%) 10,648,083 branch-misses:u # 1.35% of all branches ( +- 0.22% ) (83.16%) 1.61570 +- 0.00172 seconds time elapsed ( +- 0.11% ) $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 0bf02a0d80 |
perf bench: Add build-id injection benchmark
Sometimes I can see that 'perf record' piped with 'perf inject' take a long time processing build-ids. So introduce a inject-build-id benchmark to the internals benchmark suite to measure its overhead regularly. It runs the 'perf inject' command internally and feeds the given number of synthesized events (MMAP2 + SAMPLE basically). Usage: perf bench internals inject-build-id <options> -i, --iterations <n> Number of iterations used to compute average (default: 100) -m, --nr-mmaps <n> Number of mmap events for each iteration (default: 100) -n, --nr-samples <n> Number of sample events per mmap event (default: 100) -v, --verbose be more verbose (show iteration count, DSO name, etc) By default, it measures average processing time of 100 MMAP2 events and 10000 SAMPLE events. Below is a result on my laptop. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 25.789 msec (+- 0.202 msec) Average time per event: 2.528 usec (+- 0.020 usec) Average memory usage: 8411 KB (+- 7 KB) Committer testing: $ perf bench Usage: perf bench [<common options>] <collection> <benchmark> [<options>] # List of all available benchmark collections: sched: Scheduler and IPC benchmarks syscall: System call benchmarks mem: Memory access benchmarks numa: NUMA scheduling and MM benchmarks futex: Futex stressing benchmarks epoll: Epoll stressing benchmarks internals: Perf-internals benchmarks all: All benchmarks $ perf bench internals # List of available benchmarks for collection 'internals': synthesize: Benchmark perf event synthesis kallsyms-parse: Benchmark kallsyms parsing inject-build-id: Benchmark build-id injection $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.202 msec (+- 0.059 msec) Average time per event: 1.392 usec (+- 0.006 usec) Average memory usage: 12650 KB (+- 10 KB) Average build-id-all injection took: 12.831 msec (+- 0.071 msec) Average time per event: 1.258 usec (+- 0.007 usec) Average memory usage: 11895 KB (+- 10 KB) $ $ perf stat -r5 perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.380 msec (+- 0.056 msec) Average time per event: 1.410 usec (+- 0.006 usec) Average memory usage: 12608 KB (+- 11 KB) Average build-id-all injection took: 11.889 msec (+- 0.064 msec) Average time per event: 1.166 usec (+- 0.006 usec) Average memory usage: 11838 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.246 msec (+- 0.065 msec) Average time per event: 1.397 usec (+- 0.006 usec) Average memory usage: 12744 KB (+- 10 KB) Average build-id-all injection took: 12.019 msec (+- 0.066 msec) Average time per event: 1.178 usec (+- 0.006 usec) Average memory usage: 11963 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.321 msec (+- 0.067 msec) Average time per event: 1.404 usec (+- 0.007 usec) Average memory usage: 12690 KB (+- 10 KB) Average build-id-all injection took: 11.909 msec (+- 0.041 msec) Average time per event: 1.168 usec (+- 0.004 usec) Average memory usage: 11938 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.287 msec (+- 0.059 msec) Average time per event: 1.401 usec (+- 0.006 usec) Average memory usage: 12864 KB (+- 10 KB) Average build-id-all injection took: 11.862 msec (+- 0.058 msec) Average time per event: 1.163 usec (+- 0.006 usec) Average memory usage: 12103 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.402 msec (+- 0.053 msec) Average time per event: 1.412 usec (+- 0.005 usec) Average memory usage: 12876 KB (+- 10 KB) Average build-id-all injection took: 11.826 msec (+- 0.061 msec) Average time per event: 1.159 usec (+- 0.006 usec) Average memory usage: 12111 KB (+- 10 KB) Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,267.48 msec task-clock:u # 1.502 CPUs utilized ( +- 0.14% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 102,092 page-faults:u # 0.024 M/sec ( +- 0.08% ) 3,894,589,578 cycles:u # 0.913 GHz ( +- 0.19% ) (83.49%) 140,078,421 stalled-cycles-frontend:u # 3.60% frontend cycles idle ( +- 0.77% ) (83.34%) 948,581,189 stalled-cycles-backend:u # 24.36% backend cycles idle ( +- 0.46% ) (83.25%) 5,835,587,719 instructions:u # 1.50 insn per cycle # 0.16 stalled cycles per insn ( +- 0.21% ) (83.24%) 1,267,423,636 branches:u # 296.996 M/sec ( +- 0.22% ) (83.12%) 17,484,290 branch-misses:u # 1.38% of all branches ( +- 0.12% ) (83.55%) 2.84176 +- 0.00222 seconds time elapsed ( +- 0.08% ) $ Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | 6fcd5ddc3b |
perf python scripting: Fix printable strings in python3 scripts
Hagen reported broken strings in python3 tracepoint scripts:
make PYTHON=python3
perf record -e sched:sched_switch -a -- sleep 5
perf script --gen-script py
perf script -s ./perf-script.py
[..]
sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'),
The problem is in the is_printable_array function that does not take the
zero byte into account and claim such string as not printable, so the
code will create byte array instead of string.
Committer testing:
After this fix:
sched__sched_switch 3 484522.497072626 1158680 kworker/3:0-eve prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120
Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0}
sched__sched_switch 4 484522.497085610 1225814 perf prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0
Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0}
Fixes:
|
|
Ian Rogers | aa98d8482c |
perf parse-events: Reduce casts around bp_addr
perf_event_attr bp_addr is a u64. parse-events.y parses it as a u64, but casts it to a void* and then parse-events.c casts it back to a u64. Rather than all the casts, change the type of the address to be a u64. This removes an issue noted in: https://lore.kernel.org/lkml/20200903184359.GC3495158@kernel.org/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200925003903.561568-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 89fb1ca2ab |
perf tools: Allow creation of cgroup without open
This is a preparation for a test case of expanding events for multiple cgroups. Instead of using real system cgroup, the test will use fake cgroups so it needs a way to have them without a open file descriptor. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | b214ba8c42 |
perf tools: Copy metric events properly when expand cgroups
The metricgroup__copy_metric_events() is to handle metrics events when expanding event for cgroups. As the metric events keep pointers to evsel, it should be refreshed when events are cloned during the operation. The perf_stat__collect_metric_expr() is also called in case an event has a metric directly. During the copy, it references evsel by index as the evlist now has cloned evsels for the given cgroup. Also kernel test robot found an issue in the python module import so add empty implementations of those two functions to fix it. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | d1c5a0e86a |
perf stat: Add --for-each-cgroup option
The --for-each-cgroup option is a syntax sugar to monitor large number of cgroups easily. Current command line requires to list all the events and cgroups even if users want to monitor same events for each cgroup. This patch addresses that usage by copying given events for each cgroup on user's behalf. For instance, if they want to monitor 6 events for 200 cgroups each they should write 1200 event names (with -e) AND 1200 cgroup names (with -G) on the command line. But with this change, they can just specify 6 events and 200 cgroups with a new option. A simpler example below: It wants to measure 3 events for 2 cgroups ('A' and 'B'). The result is that total 6 events are counted like below. $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1 Performance counter stats for 'system wide': 988.18 msec cpu-clock A # 0.987 CPUs utilized 3,153,761,702 cycles A # 3.200 GHz (100.00%) 8,067,769,847 instructions A # 2.57 insn per cycle (100.00%) 982.71 msec cpu-clock B # 0.982 CPUs utilized 3,136,093,298 cycles B # 3.182 GHz (99.99%) 8,109,619,327 instructions B # 2.58 insn per cycle (99.99%) 1.001228054 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 7fedd9b84b |
perf evsel: Add evsel__clone() function
The evsel__clone() is to create an exactly same evsel from same attributes. The function assumes the given evsel is not configured yet so it cares fields set during event parsing. Those fields are now moved together as Jiri suggested. Note that metric events will be handled by later patch. It will be used by perf stat to generate separate events for each cgroup. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Leo Yan | d110162caf |
perf tsc: Support cap_user_time_short for event TIME_CONV
The synthesized event TIME_CONV doesn't contain the complete parameters for counters, this will lead to wrong conversion between counter cycles and timestamp. This patch extends event TIME_CONV to record flags 'cap_user_time_zero' which is used to indicate the counter parameters are valid or not, if not will directly return 0 for timestamp calculation. And record the flag 'cap_user_time_short' and its relevant fields 'time_cycles' and 'time_mask' for cycle calibration. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kemeng Shi <shikemeng@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Gasson <nick.gasson@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Stephane Eranian <eranian@google.com> Cc: Steve Maclean <steve.maclean@microsoft.com> Cc: Will Deacon <will@kernel.org> Cc: Zou Wei <zou_wei@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200914115311.2201-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Leo Yan | 78a93d4cec |
perf tsc: Calculate timestamp with cap_user_time_short
The perf mmap'ed buffer contains the flag 'cap_user_time_short' and two extra fields 'time_cycles' and 'time_mask', perf tool needs to know them for handling the counter wrapping case. This patch is to reads out the relevant parameters from the head of the first mmap'ed page and stores into the structure 'perf_tsc_conversion', if the flag 'cap_user_time_short' has been set, it will firstly calibrate cycle value for timestamp calculation. Committer testing: Before/after: # perf test tsc 70: Convert perf time to TSC : Ok # # perf test -v tsc 70: Convert perf time to TSC : --- start --- test child forked, pid 11059 mmap size 528384B 1st event perf time 996384576521 tsc 3850532906613 rdtsc time 996384578455 tsc 3850532913950 2nd event perf time 996384578845 tsc 3850532915428 test child finished with 0 ---- end ---- Convert perf time to TSC: Ok # Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kemeng Shi <shikemeng@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Gasson <nick.gasson@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Stephane Eranian <eranian@google.com> Cc: Steve Maclean <steve.maclean@microsoft.com> Cc: Will Deacon <will@kernel.org> Cc: Zou Wei <zou_wei@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200914115311.2201-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Leo Yan | 03fca3af51 |
perf tsc: Move out common functions from x86
Functions perf_read_tsc_conversion() and perf_event__synth_time_conv() should work as common functions rather than x86 specific, so move these two functions out from arch/x86 folder and place them into util/tsc.c. Since the function perf_event__synth_time_conv() will be linked in util/tsc.c, remove its weak version. Committer testing: Before/after: # perf test tsc 70: Convert perf time to TSC : Ok # # perf test -v tsc 70: Convert perf time to TSC : --- start --- test child forked, pid 8520 mmap size 528384B 1st event perf time 592110439891 tsc 2317172044331 rdtsc time 592110441915 tsc 2317172052010 2nd event perf time 592110442336 tsc 2317172053605 test child finished with 0 ---- end ---- Convert perf time to TSC: Ok # Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kemeng Shi <shikemeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Gasson <nick.gasson@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Stephane Eranian <eranian@google.com> Cc: Steve Maclean <steve.maclean@microsoft.com> Cc: Will Deacon <will@kernel.org> Cc: Zou Wei <zou_wei@huawei.com> Link: http://lore.kernel.org/lkml/20200914115311.2201-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Masami Hiramatsu | 7cd5738d0d |
perf probe: Fall back to debuginfod query if debuginfo and source not found locally
Since 'perf probe' heavily depends on debuginfo, debuginfod gives us
many benefits on the 'perf probe' command on remote machine.
Especially, this will be helpful for the embedded devices which will not
have enough storage, or boot with a cross-build kernel whose source code
is in the host machine.
This will work as similar to commit
|
|
Masami Hiramatsu | ac7a75d1fb |
perf probe: Fix to adjust symbol address with correct reloc_sym address
'perf probe' uses ref_reloc_sym to adjust symbol offset address from debuginfo address or ref_reloc_sym based address, but that is misusing reloc_sym->addr and reloc_sym->unrelocated_addr. If map is not relocated (map->reloc == 0), we can use reloc_sym->addr as unrelocated address instead of reloc_sym->unrelocated_addr. This usually does not happen. If we have a non-stripped ELF binary, we will use it for map and debuginfo, if not, we use only kallsyms without debuginfo. Thus, the map is always relocated (ELF and DWARF binary) or not relocated (kallsyms). However, if we allow the combination of debuginfo and kallsyms based map (like using debuginfod), we have to check the map->reloc and choose the collect address of reloc_sym. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Frank Ch. Eigler <fche@redhat.com> Cc: Aaron Merey <amerey@redhat.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Link: http://lore.kernel.org/lkml/160041609047.912668.14314639291419159274.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | dcc81be0fc |
perf metricgroup: Fix uncore metric expressions
A metric like DRAM_BW_Use has on SkylakeX events uncore_imc/cas_count_read/ and uncore_imc/case_count_write/. These events open 6 events per socket with pmu names of uncore_imc_[0-5]. The current metric setup code in find_evsel_group assumes one ID will map to 1 event to be recorded in metric_events. For events with multiple matches, the first event is recorded in metric_events (avoiding matching >1 event with the same name) and the evlist_used updated so that duplicate events aren't removed when the evlist has unused events removed. Before this change: $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 41.14 MiB uncore_imc/cas_count_read/ 1,002,614,251 ns duration_time 1.002614251 seconds time elapsed After this change: $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 157.47 MiB uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use 126.97 MiB uncore_imc/cas_count_write/ 1,003,019,728 ns duration_time Erroneous duplication introduced in: commit |
|
Adrian Hunter | 7d537a8d2e |
perf intel-pt: Fix "context_switch event has no tid" error
A context_switch event can have no tid because pids can be detached from
a task while the task is still running (in do_exit()). Note this won't
happen with per-task contexts because then tracing stops at
perf_event_exit_task()
If a task with no tid gets preempted, or a dying task gets preempted and
its parent releases it, when it subsequently gets switched back in,
Intel PT will not be able to determine what task is running and prints
an error "context_switch event has no tid". However, it is not really an
error because the task is in kernel space and the decoder can continue
to decode successfully. Fix by changing the error to be only a logged
message, and make allowance for tid == -1.
Example:
Using 5.9-rc4 with Preemptible Kernel (Low-Latency Desktop) e.g.
$ uname -r
5.9.0-rc4
$ grep PREEMPT .config
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
CONFIG_DEBUG_PREEMPT=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
Before:
$ cat forkit.c
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>
int main()
{
pid_t child;
int status = 0;
child = fork();
if (child == 0)
return 123;
wait(&status);
return 0;
}
$ gcc -o forkit forkit.c
$ sudo ~/bin/perf record --kcore -a -m,64M -e intel_pt/cyc/k &
[1] 11016
$ taskset 2 ./forkit
$ sudo pkill perf
$ [ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 17.262 MB perf.data ]
[1]+ Terminated sudo ~/bin/perf record --kcore -a -m,64M -e intel_pt/cyc/k
$ sudo ~/bin/perf script --show-task-events --show-switch-events --itrace=iqqe-o -C 1 --ns | grep -C 2 forkit
context_switch event has no tid
taskset 11019 [001] 66663.270045029: 1 instructions:k: ffffffffb1d9f844 strnlen_user+0xb4 ([kernel.kallsyms])
taskset 11019 [001] 66663.270201816: 1 instructions:k: ffffffffb1a83121 unmap_page_range+0x561 ([kernel.kallsyms])
forkit 11019 [001] 66663.270327553: PERF_RECORD_COMM exec: forkit:11019/11019
forkit 11019 [001] 66663.270420028: 1 instructions:k: ffffffffb1db9537 __clear_user+0x27 ([kernel.kallsyms])
forkit 11019 [001] 66663.270648704: 1 instructions:k: ffffffffb18829e6 do_user_addr_fault+0xf6 ([kernel.kallsyms])
forkit 11019 [001] 66663.270833163: 1 instructions:k: ffffffffb230a825 irqentry_exit_to_user_mode+0x15 ([kernel.kallsyms])
forkit 11019 [001] 66663.271092359: 1 instructions:k: ffffffffb1aea3d9 lock_page_memcg+0x9 ([kernel.kallsyms])
forkit 11019 [001] 66663.271207092: PERF_RECORD_FORK(11020:11020):(11019:11019)
forkit 11019 [001] 66663.271234775: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 11020/11020
forkit 11020 [001] 66663.271238407: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11019/11019
forkit 11020 [001] 66663.271312066: 1 instructions:k: ffffffffb1a88140 handle_mm_fault+0x10 ([kernel.kallsyms])
forkit 11020 [001] 66663.271476225: PERF_RECORD_EXIT(11020:11020):(11019:11019)
forkit 11020 [001] 66663.271497488: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11019/11019
forkit 11019 [001] 66663.271500523: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11020/11020
forkit 11019 [001] 66663.271517241: 1 instructions:k: ffffffffb24012cd error_entry+0x6d ([kernel.kallsyms])
forkit 11019 [001] 66663.271664080: PERF_RECORD_EXIT(11019:11019):(1386:1386)
After:
$ sudo ~/bin/perf script --show-task-events --show-switch-events --itrace=iqqe-o -C 1 --ns | grep -C 2 forkit
taskset 11019 [001] 66663.270045029: 1 instructions:k: ffffffffb1d9f844 strnlen_user+0xb4 ([kernel.kallsyms])
taskset 11019 [001] 66663.270201816: 1 instructions:k: ffffffffb1a83121 unmap_page_range+0x561 ([kernel.kallsyms])
forkit 11019 [001] 66663.270327553: PERF_RECORD_COMM exec: forkit:11019/11019
forkit 11019 [001] 66663.270420028: 1 instructions:k: ffffffffb1db9537 __clear_user+0x27 ([kernel.kallsyms])
forkit 11019 [001] 66663.270648704: 1 instructions:k: ffffffffb18829e6 do_user_addr_fault+0xf6 ([kernel.kallsyms])
forkit 11019 [001] 66663.270833163: 1 instructions:k: ffffffffb230a825 irqentry_exit_to_user_mode+0x15 ([kernel.kallsyms])
forkit 11019 [001] 66663.271092359: 1 instructions:k: ffffffffb1aea3d9 lock_page_memcg+0x9 ([kernel.kallsyms])
forkit 11019 [001] 66663.271207092: PERF_RECORD_FORK(11020:11020):(11019:11019)
forkit 11019 [001] 66663.271234775: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 11020/11020
forkit 11020 [001] 66663.271238407: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11019/11019
forkit 11020 [001] 66663.271312066: 1 instructions:k: ffffffffb1a88140 handle_mm_fault+0x10 ([kernel.kallsyms])
forkit 11020 [001] 66663.271476225: PERF_RECORD_EXIT(11020:11020):(11019:11019)
forkit 11020 [001] 66663.271497488: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11019/11019
forkit 11019 [001] 66663.271500523: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11020/11020
forkit 11019 [001] 66663.271517241: 1 instructions:k: ffffffffb24012cd error_entry+0x6d ([kernel.kallsyms])
forkit 11019 [001] 66663.271664080: PERF_RECORD_EXIT(11019:11019):(1386:1386)
forkit 11019 [001] 66663.271688752: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: -1/-1
:-1 -1 [001] 66663.271692086: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11019/11019
:-1 -1 [001] 66663.271707466: 1 instructions:k: ffffffffb18eb096 update_load_avg+0x306 ([kernel.kallsyms])
Fixes:
|
|
Adrian Hunter | fc18380fb9 |
perf script: Display negative tid in non-sample events
The kernel can release tasks while they are still running. This can result in a task having no tid, in which case perf records a tid of -1. Improve the perf script output in that case. Example: Before: # cat ./autoreap.c #include <sys/types.h> #include <unistd.h> #include <sys/wait.h> #include <signal.h> struct sigaction act = { .sa_handler = SIG_IGN, }; int main() { pid_t child; int status = 0; sigaction(SIGCHLD, &act, NULL); child = fork(); if (child == 0) return 123; wait(&status); return 0; } # gcc -o autoreap autoreap.c # ./perf record -a -e dummy --switch-events ./autoreap [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.948 MB perf.data ] # ./perf script --show-task-events --show-switch-events | grep -C2 'autoreap\|4294967295\|-1' swapper 0 [004] 18462.673613: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25189/25189 perf 25189 [004] 18462.673614: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 autoreap 25189 [004] 18462.673800: PERF_RECORD_COMM exec: autoreap:25189/25189 autoreap 25189 [004] 18462.674042: PERF_RECORD_FORK(25191:25191):(25189:25189) autoreap 25189 [004] 18462.674050: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [004] 18462.674051: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 25189/25189 swapper 0 [005] 18462.674083: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25191/25191 autoreap 25191 [005] 18462.674084: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 swapper 0 [003] 18462.674121: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 rcu_preempt 11 [003] 18462.674121: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 rcu_preempt 11 [003] 18462.674124: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [003] 18462.674124: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11/11 autoreap 25191 [005] 18462.674138: PERF_RECORD_EXIT(25191:25191):(25189:25189) PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [005] 18462.674149: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 4294967295/4294967295 swapper 0 [004] 18462.674182: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25189/25189 autoreap 25189 [004] 18462.674183: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 autoreap 25189 [004] 18462.674218: PERF_RECORD_EXIT(25189:25189):(25188:25188) autoreap 25189 [004] 18462.674225: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [004] 18462.674226: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 25189/25189 swapper 0 [007] 18462.674257: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25188/25188 After: # ./perf script --show-task-events --show-switch-events | grep -C2 'autoreap\|4294967295\|-1' swapper 0 [004] 18462.673613: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25189/25189 perf 25189 [004] 18462.673614: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 autoreap 25189 [004] 18462.673800: PERF_RECORD_COMM exec: autoreap:25189/25189 autoreap 25189 [004] 18462.674042: PERF_RECORD_FORK(25191:25191):(25189:25189) autoreap 25189 [004] 18462.674050: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [004] 18462.674051: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 25189/25189 swapper 0 [005] 18462.674083: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25191/25191 autoreap 25191 [005] 18462.674084: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 swapper 0 [003] 18462.674121: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 11/11 rcu_preempt 11 [003] 18462.674121: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 rcu_preempt 11 [003] 18462.674124: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [003] 18462.674124: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 11/11 autoreap 25191 [005] 18462.674138: PERF_RECORD_EXIT(25191:25191):(25189:25189) :-1 -1 [005] 18462.674149: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [005] 18462.674149: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: -1/-1 swapper 0 [004] 18462.674182: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25189/25189 autoreap 25189 [004] 18462.674183: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0 autoreap 25189 [004] 18462.674218: PERF_RECORD_EXIT(25189:25189):(25188:25188) autoreap 25189 [004] 18462.674225: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0 swapper 0 [004] 18462.674226: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 25189/25189 swapper 0 [007] 18462.674257: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 25188/25188 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lore.kernel.org/lkml/20200909084923.9096-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Andi Kleen | 55c36a9fc2 |
perf stat: Support new per thread TopDown metrics
Icelake has support for reporting per thread TopDown metrics. These are reported differently than the previous TopDown support, each metric is standalone, but scaled to pipeline "slots". We don't need to do anything special for HyperThreading anymore. Teach perf stat --topdown to handle these new metrics and print them in the same way as the previous TopDown metrics. The restrictions of only being able to report information per core is gone. Signed-off-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200911144808.27603-4-kan.liang@linux.intel.com Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Kan Liang | acb65150a4 |
perf record: Support sample-read topdown metric group
With the hardware TopDown metrics feature, sample-read feature should be supported for a topdown group, e.g., sample a non-topdown event and read a topdown metric group. But the current perf record code errors out. For a topdown metric group, the slots event must be the leader of the group, but the leader slots event doesn't support sampling. To support sample-read the topdown metric group, use the 2nd event of the group as the "leader" for the purposes of sampling. Only the platform with Topdown metic feature supports sample-read the topdown group. Add arch_topdown_sample_read() to indicate whether the topdown group supports sample-read. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200911144808.27603-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Kan Liang | 687986bbeb |
perf tools: Rename group to topdown
The group.h/c only include TopDown group related functions. The name "group" is too generic and inaccurate. Use the name "topdown" to replace it. Move topdown related functions to a dedicated file, topdown.c. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200911144808.27603-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | c57f5eaa09 |
perf machine: Add machine__for_each_dso() function
Add the machine__for_each_dso() to iterate over all dso objects defined for the within a machine object. It will be used in the MMAP3 patch series. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200913210313.1985612-22-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | 056c172201 |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 0f1b550e29 |
perf parse-event: Release cpu_map refcount if evsel alloc failed
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200917060219.1287863-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | 5d680be3b0 |
perf parse-event: Fix cpu map refcounting
Like evlist cpu map, evsel's cpu map should have a proper refcount. As it's created with a refcount, we don't need to get an extra count. Thanks to Arnaldo for the simpler suggestion. This, together with the following patch, fixes the following ASAN report: Direct leak of 840 byte(s) in 70 object(s) allocated from: #0 0x7fe36703f628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628) #1 0x559fbbf611ca in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79 #2 0x559fbbf6229c in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:237 #3 0x559fbbcc6c6d in __add_event util/parse-events.c:357 #4 0x559fbbcc6c6d in add_event_tool util/parse-events.c:408 #5 0x559fbbcc6c6d in parse_events_add_tool util/parse-events.c:1414 #6 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #7 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #8 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #9 0x559fbbc2788b in check_parse_id tests/pmu-events.c:406 #10 0x559fbbc2788b in check_parse_id tests/pmu-events.c:393 #11 0x559fbbc2788b in check_parse_fake tests/pmu-events.c:436 #12 0x559fbbc2788b in metric_parse_fake tests/pmu-events.c:553 #13 0x559fbbc27e2d in test_parsing_fake tests/pmu-events.c:599 #14 0x559fbbc27e2d in test_parsing_fake tests/pmu-events.c:574 #15 0x559fbbc0109b in run_test tests/builtin-test.c:410 #16 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 #17 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 #18 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 #19 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #20 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #21 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #22 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #23 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 And I've failed which commit introduced this bug as the code was heavily changed since then. ;-/ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200917060219.1287863-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Qi Liu | ce9c13f31b |
perf stat: Fix the ratio comments of miss-events
'perf stat' displays miss ratio of L1-dcache, L1-icache, dTLB cache, iTLB cache and LL-cache. Take L1-dcache for example, miss ratio is caculated as "L1-dcache-load-misses/L1-dcache-loads". So "of all L1-dcache hits" is unsuitable to describe it, and "of all L1-dcache accesses" seems better. The comments of L1-icache, dTLB cache, iTLB cache and LL-cache are fixed in the same way. Signed-off-by: Qi Liu <liuqi115@huawei.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1600253331-10535-1-git-send-email-liuqi115@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | d26383dcb2 |
perf test: Free formats for perf pmu parse test
The following leaks were detected by ASAN:
Indirect leak of 360 byte(s) in 9 object(s) allocated from:
#0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
#4 0x560578e07045 in test__pmu tests/pmu.c:155
#5 0x560578de109b in run_test tests/builtin-test.c:410
#6 0x560578de109b in test_and_print tests/builtin-test.c:440
#7 0x560578de401a in __cmd_test tests/builtin-test.c:661
#8 0x560578de401a in cmd_test tests/builtin-test.c:807
#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308
Fixes:
|
|
Namhyung Kim | 6f47ed6cd1 |
perf metric: Do not free metric when failed to resolve
It's dangerous to free the original metric when it's called from
resolve_metric() as it's already in the metric_list and might have other
resources too. Instead, it'd better let them bail out and be released
properly at the later stage.
So add a check when it's called from metricgroup__add_metric() and
release it. Also make sure that mp is set properly.
Fixes:
|
|
Namhyung Kim | 27adafcda3 |
perf metric: Free metric when it failed to resolve
The metricgroup__add_metric() can find multiple match for a metric group
and it's possible to fail. Also it can fail in the middle like in
resolve_metric() even for single metric.
In those cases, the intermediate list and ids will be leaked like:
Direct leak of 3 byte(s) in 1 object(s) allocated from:
#0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
#1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
#2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
#3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
#4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
#5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
#6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
#7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
#8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
#9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
#10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
#11 0x55f7e70be09b in run_test tests/builtin-test.c:410
#12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
#13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
#14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
#15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes:
|
|
Namhyung Kim | 437822bf38 |
perf metric: Release expr_parse_ctx after testing
The test_generic_metric() missed to release entries in the pctx. Asan
reported following leak (and more):
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
#1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
#2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
#3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
#4 0x55f7e7341667 in expr__add_ref util/expr.c:120
#5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
#6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
#7 0x55f7e712390b in compute_single tests/parse-metric.c:128
#8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
#9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
#10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
#11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
#12 0x55f7e70be09b in run_test tests/builtin-test.c:410
#13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
#14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
#15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
#16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes:
|
|
Namhyung Kim | b12eea5ad8 |
perf parse-event: Fix memory leak in evsel->unit
The evsel->unit borrows a pointer of pmu event or alias instead of
owns a string. But tool event (duration_time) passes a result of
strdup() caused a leak.
It was found by ASAN during metric test:
Direct leak of 210 byte(s) in 70 object(s) allocated from:
#0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
#10 0x559fbbc0109b in run_test tests/builtin-test.c:410
#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes:
|
|
Namhyung Kim | bfd1b83d75 |
perf evlist: Fix cpu/thread map leak
Asan reported leak of cpu and thread maps as they have one more refcount than released. I found that after setting evlist maps it should release it's refcount. It seems to be broken from the beginning so I chose the original commit as the culprit. But not sure how it's applied to stable trees since there are many changes in the code after that. Fixes: |
|
Namhyung Kim | b033ab11ad |
perf metric: Fix some memory leaks - part 2
The metric_event_delete() missed to free expr->metric_events and it
should free an expr when metric_refs allocation failed.
Fixes:
|
|
Namhyung Kim | 4f57a1ed74 |
perf metric: Fix some memory leaks
I found some memory leaks while reading the metric code. Some are real
and others only occur in the error path. When it failed during metric
or event parsing, it should release all resources properly.
Fixes:
|
|
Namhyung Kim | 22fe5a25b5 |
perf test: Free aliases for PMU event map aliases test
The aliases were never released causing the following leaks:
Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
#0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
#1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
#2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
#3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
#4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
#5 0x56332c76a09b in run_test tests/builtin-test.c:410
#6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
#7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
#8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
#9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308
Fixes:
|
|
Ian Rogers | 3b0a18c1aa |
perf record: Don't clear event's period if set by a term
If events in a group explicitly set a frequency or period with leader sampling, don't disable the samples on those events. Prior to 5.8: perf record -e '{cycles/period=12345000/,instructions/period=6789000/}:S' would clear the attributes then apply the config terms. In commit |
|
Stephane Eranian | ae5dcc8abe |
perf record: Prevent override of attr->sample_period for libpfm4 events
Before: $ perf record -c 10000 --pfm-events=cycles:period=77777 Would yield a cycles event with period=10000, instead of 77777. the event string and perf record initializing the event. This was due to an ordering issue between libpfm4 parsing events with attr->sample_period != 0 by the time intent of the author. perf_evsel__config() is invoked. This seems to have been the This patch fixes the problem by preventing override for Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20200912025655.1337192-3-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
David Sharp | ce4326d275 |
perf record: Set PERF_RECORD_PERIOD if attr->freq is set.
evsel__config() would only set PERF_RECORD_PERIOD if it set attr->freq from perf record options. When it is set by libpfm events, it would not get set. This changes evsel__config to see if attr->freq is set outside of whether or not it changes attr->freq itself. Signed-off-by: David Sharp <dhsharp@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: david sharp <dhsharp@google.com> Link: http://lore.kernel.org/lkml/20200912025655.1337192-2-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Jiri Olsa | 8366f0d268 |
perf tests: Call test_attr__open() directly
There's no longer need to call test_attr__open() from sys_perf_event_open(), because both 'perf record' and 'perf stat' call evsel__open_cpu(), so we can call it directly from there and not polute the perf-sys.h header. Committer testing: Before and after: # perf test attr 17: Setup struct perf_event_attr : Ok 49: Synthesize attr update : Ok # perf test -v attr 17: Setup struct perf_event_attr : --- start --- test child forked, pid 2170868 running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any_ret' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any_ret' running '/home/acme/libexec/perf-core/tests/attr/test-record-C0' running '/home/acme/libexec/perf-core/tests/attr/test-record-graph-fp' running '/home/acme/libexec/perf-core/tests/attr/test-record-period' running '/home/acme/libexec/perf-core/tests/attr/test-record-group-sampling' running '/home/acme/libexec/perf-core/tests/attr/test-record-freq' running '/home/acme/libexec/perf-core/tests/attr/test-stat-detailed-3' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-k' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-k' running '/home/acme/libexec/perf-core/tests/attr/test-stat-group1' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-u' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-u' running '/home/acme/libexec/perf-core/tests/attr/test-stat-basic' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any_call' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any_call' running '/home/acme/libexec/perf-core/tests/attr/test-stat-default' running '/home/acme/libexec/perf-core/tests/attr/test-record-graph-dwarf' running '/home/acme/libexec/perf-core/tests/attr/test-record-no-buffering' running '/home/acme/libexec/perf-core/tests/attr/test-record-raw' running '/home/acme/libexec/perf-core/tests/attr/test-stat-detailed-2' running '/home/acme/libexec/perf-core/tests/attr/test-record-count' running '/home/acme/libexec/perf-core/tests/attr/test-record-data' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-any' running '/home/acme/libexec/perf-core/tests/attr/test-stat-group' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-any' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-any' running '/home/acme/libexec/perf-core/tests/attr/test-record-graph-default' running '/home/acme/libexec/perf-core/tests/attr/test-record-no-samples' running '/home/acme/libexec/perf-core/tests/attr/test-stat-C0' running '/home/acme/libexec/perf-core/tests/attr/test-record-no-inherit' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-ind_call' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-ind_call' running '/home/acme/libexec/perf-core/tests/attr/test-record-basic' running '/home/acme/libexec/perf-core/tests/attr/test-record-group1' running '/home/acme/libexec/perf-core/tests/attr/test-record-pfm-period' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-pfm-period' running '/home/acme/libexec/perf-core/tests/attr/test-stat-detailed-1' running '/home/acme/libexec/perf-core/tests/attr/test-stat-no-inherit' running '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-hv' unsupp '/home/acme/libexec/perf-core/tests/attr/test-record-branch-filter-hv' running '/home/acme/libexec/perf-core/tests/attr/test-record-group' test child finished with 0 ---- end ---- Setup struct perf_event_attr: Ok 49: Synthesize attr update : --- start --- test child forked, pid 2171004 test child finished with 0 ---- end ---- Synthesize attr update: Ok # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200827193201.GB127372@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Kajol Jain | f5a489dc81 |
perf metricgroup: Pass pmu_event structure as a parameter for arch_get_runtimeparam()
This patch adds passing of pmu_event as a parameter in function 'arch_get_runtimeparam' which can be used to get details like if the event is percore/perchip. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200907064133.75090-5-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | 9e34c1c87e |
perf metricgroup: Fix typo in comment.
Add missing character. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200910032632.511566-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Ian Rogers | 7a16183316 |
perf stat: Remove dead code: no need to set os.evsel twice
No need to set os.evsel twice. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200910032632.511566-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Namhyung Kim | fac49a3bc4 |
perf list: Do not print 'Metric Groups:' unnecessarily
It was printed unconditionally even if nothing is printed. Check if the output list empty when filter is given. Before: $ ./perf list duration List of pre-defined events (to be used in -e): duration_time [Tool event] Metric Groups: After: $ ./perf list duration List of pre-defined events (to be used in -e): duration_time [Tool event] Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200909055849.469612-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Adrian Hunter | ee7fe31e6e |
perf tools: Consolidate close_control_option()'s into one function
Consolidate control option fifo closing into one function. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200903122937.25691-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
Arnaldo Carvalho de Melo | bbe544682e |
perf annotate: Allow configuring the 'disassembler_style' knob via 'perf config'
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > default # perf config annotate.disassembler_style=intel # perf config annotate.disassembler_style annotate.disassembler_style=intel # perf annotate --stdio2 acpi_processor_ffh_cstate_enter > intel # diff -u default intel --- default 2020-09-04 13:09:26.019205732 -0300 +++ intel 2020-09-04 13:09:52.823795081 -0300 @@ -1,42 +1,42 @@ Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 990065316, [percent: local period] acpi_processor_ffh_cstate_enter() /lib/modules/5.9.0-rc3/build/vmlinux -Percent → callq __fentry__ - mov cpu_number,%edx - mov %edx,%edx - mov cpu_cstate_entry,%rax - add -0x7dbe9700(,%rdx,8),%rax - movzbl 0x9(%rdi),%edx - mov 0x4(%rax,%rdx,8),%edi - mov (%rax,%rdx,8),%esi - → jmpq 137ccc6 - 2d: → jmpq 137ccd8 +Percent → call __fentry__ + mov edx,DWORD PTR gs:[rip+0x7e541d74] + mov edx,edx + mov rax,QWORD PTR [rip+0x152b8fb] + add rax,QWORD PTR [rdx*8-0x7dbe9700] + movzx edx,BYTE PTR [rdi+0x9] + mov edi,DWORD PTR [rax+rdx*8+0x4] + mov esi,DWORD PTR [rax+rdx*8] + → jmp 137ccc6 + 2d: → jmp 137ccd8 mfence - mov %gs:0x17bc0,%rax - clflush (%rax) + mov rax,QWORD PTR gs:0x17bc0 + clflush BYTE PTR [rax] mfence - xor %edx,%edx - mov %rdx,%rcx - mov %gs:0x17bc0,%rax - 0.00 monitor %rax,%ecx,%edx - mov (%rax),%rax - test $0x8,%al + xor edx,edx + mov rcx,rdx + mov rax,QWORD PTR gs:0x17bc0 + 0.00 monitor + mov rax,QWORD PTR [rax] + test al,0x8 ↓ jne 71 - ↓ jmpq 68 - verw 0x538b08(%rip) # ffffffff82008150 <ds.0> - 68: mov %rsi,%rax - mov %rdi,%rcx -100.00 mwait %eax,%ecx - 71: mov %gs:0x17bc0,%rax - lock andb $0xdf,0x2(%rax) - lock addl $0x0,-0x4(%rsp) - mov (%rax),%rax - test $0x8,%al + ↓ jmp 68 + verw WORD PTR [rip+0x538b08] # ffffffff82008150 <ds.0> + 68: mov rax,rsi + mov rcx,rdi +100.00 mwait + 71: mov rax,QWORD PTR gs:0x17bc0 + lock and BYTE PTR [rax+0x2],0xdf + lock add DWORD PTR [rsp-0x4],0x0 + mov rax,QWORD PTR [rax] + test al,0x8 ↓ je 97 - andl $0x7fffffff,__preempt_count - 97: ← retq - mov %gs:0x17bc0,%rax - lock orb $0x20,0x2(%rax) - mov (%rax),%rax - test $0x8,%al + and DWORD PTR gs:[rip+0x7e548509],0x7fffffff + 97: ret + mov rax,QWORD PTR gs:0x17bc0 + lock or BYTE PTR [rax+0x2],0x20 + mov rax,QWORD PTR [rax] + test al,0x8 ↑ jne 71 - ↑ jmpq 2d + ↑ jmp 2d # Requested-by: Matt P. Dziubinski <matdzb@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |