mirror of https://gitee.com/openkylin/linux.git
11678 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
![]() |
1ef998ff18 |
perf intel-pt: Consolidate thread-stack use condition
The components of the condition do not change, so consolidate them in one variable. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200429150751.12570-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
86d67180b9 |
perf thread-stack: Add branch stack support
Intel PT already has support for creating branch stacks for each context (per-cpu or per-thread). In the more common per-cpu case, the branch stack is not separated for different threads, instead being cleared in between each sample. That approach will not work very well for adding branch stacks to regular events. The branch stacks really need to be accumulated separately for each thread. As a start to accomplishing that, this patch adds support for putting branch stack support into the thread-stack. The advantages are: 1. the branches are accumulated separately for each thread 2. the branch stack is cleared only in between continuous traces This helps pave the way for adding branch stacks to regular events, not just synthesized events as at present. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200429150751.12570-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bb629484d9 |
perf tools: Simplify checking if SMT is active.
SMT now could be disabled via "/sys/devices/system/cpu/smt/control". Status is shown in "/sys/devices/system/cpu/smt/active" simply as "0" / "1". If this knob isn't here then fallback to checking topology as before. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/158817741394.748034.9273604089138009552.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
846de4371f |
perf tools: Fix reading new topology attribute "core_cpus"
Check if access("devices/system/cpu/cpu%d/topology/core_cpus", F_OK)
fails, which will happen unless the current directory is "/sys".
Simply try to read this file first.
Fixes:
|
|
![]() |
ba08829aac |
perf parse-events: Fix another memory leaks found on parse_events()
Fix another memory leak found by applying LLVM's libfuzzer on parse_events(). Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
672f707ef5 |
perf parse-events: Fix memory leaks found on parse_events
free_list_evsel() deals with tools/perf/ evsels, not with libperf perf_evsels, use the right destructor and avoid a leak, as evsel__delete() will delete something perf_evsel__delete() doesn't. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e8dfb81838 |
perf parse-events: Fix memory leaks found on parse_events
Fix a memory leak found by applying LLVM's libfuzzer on parse_events(). Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200319023101.82458-1-irogers@google.com [ split from a larger patch, use zfree() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
23cbb41c93 |
perf record: Move side band evlist setup to separate routine
It is quite big by now, move that code to a separate record__setup_sb_evlist() routine. Suggested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-9-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
899e5ffbf2 |
perf record: Introduce --switch-output-event
Now we can use it with --overwrite to have a flight recorder mode that gets snapshot requests from arbitrary events that are processed in the side band thread together with the PERF_RECORD_BPF_EVENT processing. Example: To collect scheduler events until a recvmmsg syscall happens, system wide: [root@five a]# rm -f perf.data.2020042717* [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717585458 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717590235 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042717590398 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2020042717590511 ] [ perf record: Captured and wrote 7.244 MB perf.data.<timestamp> ] So in the above case we had 3 snapshots, the fourth was forced by control+C: [root@five a]# ls -la total 20440 drwxr-xr-x. 2 root root 4096 Apr 27 17:59 . dr-xr-x---. 12 root root 4096 Apr 27 17:46 .. -rw-------. 1 root root 3936125 Apr 27 17:58 perf.data.2020042717585458 -rw-------. 1 root root 5074869 Apr 27 17:59 perf.data.2020042717590235 -rw-------. 1 root root 4291037 Apr 27 17:59 perf.data.2020042717590398 -rw-------. 1 root root 7617037 Apr 27 17:59 perf.data.2020042717590511 [root@five a]# One can make this more precise by adding the switch output event to the main -e events list, as since this is done asynchronously, a few events after the signal event will appear in the snapshots, as can be seen with: [root@five a]# rm -f perf.data.20200427175* [root@five a]# perf record --overwrite -e sched:*switch,syscalls:*recvmmsg --switch-output-event syscalls:sys_enter_recvmmsg [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024203 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024301 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2020042718024484 ] ^C[ perf record: Woken up 1 times to write data ] [ perf record: Dump perf.data.2020042718024562 ] [ perf record: Captured and wrote 7.337 MB perf.data.<timestamp> ] [root@five a]# perf script -i perf.data.2020042718024203 | tail -15 PacerThread 148586 [005] 122.830729: sched:sched_switch: prev_comm=PacerThread prev_pid=148586... swapper 0 [000] 122.833588: sched:sched_switch: prev_comm=swapper/0 prev_pid=... NetworkManager 1251 [000] 122.833619: syscalls:sys_enter_recvmmsg: fd: 0x0000001c, mmsg: 0x7ffe83054a1... swapper 0 [002] 122.833624: sched:sched_switch: prev_comm=swapper/2 prev_pid=... swapper 0 [003] 122.833624: sched:sched_switch: prev_comm=swapper/3 prev_pid=... NetworkManager 1251 [000] 122.833626: syscalls:sys_exit_recvmmsg: 0x1 kworker/3:3-eve 158946 [003] 122.833628: sched:sched_switch: prev_comm=kworker/3:3 prev_pid=15894... swapper 0 [004] 122.833641: sched:sched_switch: prev_comm=swapper/4 prev_pid=... NetworkManager 1251 [000] 122.833642: sched:sched_switch: prev_comm=NetworkManage... perf 228273 [002] 122.833645: sched:sched_switch: prev_comm=perf prev_pid=22827... swapper 0 [011] 122.833646: sched:sched_switch: prev_comm=swapper/1... swapper 0 [002] 122.833648: sched:sched_switch: prev_comm=swapper/... kworker/0:2-eve 207387 [000] 122.833648: sched:sched_switch: prev_comm=kworker/0:2 prev_pid=20738... kworker/2:3-eve 232038 [002] 122.833652: sched:sched_switch: prev_comm=kworker/2:3 prev_pid=23203... perf 235825 [003] 122.833653: sched:sched_switch: prev_comm=perf prev_pid=23582... [root@five a]# Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-8-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
976be84504 |
perf evlist: Allow reusing the side band thread for more purposes
I.e. so far we had just one event in that side band thread, a dummy one with attr.bpf_event set, so that 'perf record' can go ahead and ask the kernel for further information about BPF programs being loaded. Allow for more than one event to be there, so that we can use it as well for the upcoming --switch-output-event feature. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9a39994467 |
perf evlist: Move the sideband thread routines to separate object
To avoid dragging more stuff into the perf python binding in the following csets. Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d0abbc3ce6 |
perf parse-events: Add parse_events_option() variant that creates evlist
For the upcoming --switch-output-event option we want to create the side band event, populate it with the specified events and then, if it is present multiple times, go on adding to it, then, if the BPF tracking is required, use the first event to set its attr.bpf_event to get those PERF_RECORD_BPF_EVENT metadata events too. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b38d85ef49 |
perf bpf: Decouple creating the evlist from adding the SB event
Renaming bpf_event__add_sb_event() to evlist__add_sb_event() and requiring that the evlist be allocated beforehand. This will allow using the same side band thread and evlist to be used for multiple purposes in addition to react to PERF_RECORD_BPF_EVENT soon after they are generated. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ca6c9c8b10 |
perf top: Move sb_evlist to 'struct perf_top'
Where state related to a 'perf top' session is grouped. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bc477d7983 |
perf record: Move sb_evlist to 'struct record'
Where state related to a 'perf record' session is grouped. Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200429131106.27974-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
40c7d2460e |
perf tools: Move routines that probe for perf API features to separate file
Trying to disentangle this a bit further, unfortunately it uses parse_events(), its interesting to have it separated anyway, so do it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
354575c00d |
perf vendor events power9: Add hv_24x7 socket/chip level metric events
The hv_24×7 feature in IBM® POWER9™ processor-based servers provide the facility to continuously collect large numbers of hardware performance metrics efficiently and accurately. This patch adds hv_24x7 metric file for different Socket/chip resources. Result: power9 platform: command:# ./perf stat --metric-only -M Memory_RD_BW_Chip -C 0 -I 1000 1.000096188 0.9 0.3 2.000285720 0.5 0.1 3.000424990 0.4 0.1 command:# ./perf stat --metric-only -M PowerBUS_Frequency -C 0 -I 1000 1.000097981 2.3 2.3 2.000291713 2.3 2.3 3.000421719 2.3 2.3 4.000550912 2.3 2.3 Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-8-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3351c6da89 |
perf tools: Enable Hz/hz prinitg for --metric-only option
Commit
|
|
![]() |
9022608ec5 |
perf tests expr: Added test for runtime param in metric expression
Added test case for parsing "?" in metric expression. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-6-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1e1a873dc6 |
perf metricgroups: Enhance JSON/metric infrastructure to handle "?"
Patch enhances current metric infrastructure to handle "?" in the metric expression. The "?" can be use for parameters whose value not known while creating metric events and which can be replace later at runtime to the proper value. It also add flexibility to create multiple events out of single metric event added in JSON file. Patch adds function 'arch_get_runtimeparam' which is a arch specific function, returns the count of metric events need to be created. By default it return 1. This infrastructure needed for hv_24x7 socket/chip level events. "hv_24x7" chip level events needs specific chip-id to which the data is requested. Function 'arch_get_runtimeparam' implemented in header.c which extract number of sockets from sysfs file "sockets" under "/sys/devices/hv_24x7/interface/". With this patch basically we are trying to create as many metric events as define by runtime_param. For that one loop is added in function 'metricgroup__add_metric', which create multiple events at run time depend on return value of 'arch_get_runtimeparam' and merge that event in 'group_list'. To achieve that we are actually passing this parameter value as part of `expr__find_other` function and changing "?" present in metric expression with this value. As in our JSON file, there gonna be single metric event, and out of which we are creating multiple events. To understand which data count belongs to which parameter value, we also printing param value in generic_metric function. For example, command:# ./perf stat -M PowerBUS_Frequency -C 0 -I 1000 1.000101867 9,356,933 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0 1.000101867 9,366,134 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1 2.000314878 9,365,868 hv_24x7/pm_pb_cyc,chip=0/ # 2.3 GHz PowerBUS_Frequency_0 2.000314878 9,366,092 hv_24x7/pm_pb_cyc,chip=1/ # 2.3 GHz PowerBUS_Frequency_1 So, here _0 and _1 after PowerBUS_Frequency specify parameter value. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
454a8be0cf |
perf pmu: Fix function name in comment, its get_cpuid_str(), not get_cpustr()
get_cpuid_str() is used in tools/perf/arch/xxx/util/header.c, fix the name in comment. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lore.kernel.org/lkml/1588141992-48382-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6fa9c3e779 |
perf report: Fix warning assignment of 0/1 to bool variable
Fixes coccicheck warning: tools/perf/builtin-report.c:1403:2-34: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1587904683-3510-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8284bbeab7 |
perf tools: Remove unneeded semicolons
Fixes coccicheck warnings: tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2cca512ad2 |
perf c2c: Remove unneeded semicolon
Fixes coccicheck warnings: tools/perf/builtin-c2c.c:1712:2-3: Unneeded semicolon tools/perf/builtin-c2c.c:1928:2-3: Unneeded semicolon tools/perf/builtin-c2c.c:2962:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1588064336-70456-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
fad1f1e7de |
perf script: Remove extraneous newline in perf_sample__fprintf_regs()
When printing iregs, there was a double newline printed because perf_sample__fprintf_regs() was printing its own and then at the end of all fields, perf script was adding one. This was causing blank line in the output: Before: $ perf script -Fip,iregs 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 After: $ perf script -Fip,iregs 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 401b8d ABI:2 DX:0x100 SI:0x4a9340 DI:0x4a8340 401b8d ABI:2 DX:0x100 SI:0x4a8340 DI:0x4a9340 Committer testing: First we need to figure out how to request that registers be recorded, so we use: # perf record -h reg Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -I, --intr-regs[=<any register>] sample selected machine registers on interrupt, use '-I?' to list register names --buildid-all Record build-id of all DSOs regardless of hits --user-regs[=<any register>] sample selected machine registers on interrupt, use '--user-regs=?' to list register names # Ok, now lets ask for them all: # perf record -a --intr-regs --user-regs sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 4.105 MB perf.data (2760 samples) ] # Lets look at the first 6 output lines: # perf script -Fip,iregs | head -6 ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780 # Reproduced, apply the patch and: [root@five ~]# perf script -Fip,iregs | head -6 ffffffff8a06f2f4 ABI:2 AX:0xffffd168fee0a980 BX:0xffff8a23b087f000 CX:0xfffeb69aaeb25d73 DX:0xffff8a253e8310f0 SI:0xfffffff9bafe7359 DI:0xffffb1690204fb10 BP:0xffffd168fee0a950 SP:0xffffb1690204fb88 IP:0xffffffff8a06f2f4 FLAGS:0x4e CS:0x10 SS:0x18 R8:0x1495f0a91129a R9:0xffff8a23b087f000 R10:0x1 R11:0xffffffff R12:0x0 R13:0xffff8a253e827e00 R14:0xffffd168fee0aa5c R15:0xffffd168fee0a980 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0xffffd168fee0a950 CX:0x5684cc1118491900 DX:0x0 SI:0xffffd168fee0a9d0 DI:0x202 BP:0xffffb1690204fd70 SP:0xffffb1690204fd20 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0xffffd168fee0a9d0 R10:0x1 R11:0xffffffff R12:0xffffffff8a23e480 R13:0xffff8a23b087f240 R14:0xffff8a23b087f000 R15:0xffffd168fee0a950 ffffffff8a06f2f4 ABI:2 AX:0x0 BX:0x0 CX:0x7f25f334335b DX:0x0 SI:0x2400 DI:0x4 BP:0x7fff5f264570 SP:0x7fff5f264538 IP:0xffffffff8a06f2f4 FLAGS:0x24e CS:0x10 SS:0x2b R8:0x0 R9:0x2312d20 R10:0x0 R11:0x246 R12:0x22cc0e0 R13:0x0 R14:0x0 R15:0x22d0780 ffffffff8a24074b ABI:2 AX:0xcb BX:0xcb CX:0x0 DX:0x0 SI:0xffffb1690204ff58 DI:0xcb BP:0xffffb1690204ff58 SP:0xffffb1690204ff40 IP:0xffffffff8a24074b FLAGS:0x24e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0x0 R13:0x0 R14:0x0 R15:0x0 ffffffff8a310600 ABI:2 AX:0x0 BX:0xffffffff8b8c39a0 CX:0x0 DX:0xffff8a2503890300 SI:0xffffb1690204ff20 DI:0xffff8a23e4080000 BP:0xffff8a23e4080000 SP:0xffffb1690204fec0 IP:0xffffffff8a310600 FLAGS:0x28e CS:0x10 SS:0x18 R8:0x0 R9:0x0 R10:0x0 R11:0x0 R12:0xffffffffffffffea R13:0xffff8a23e4080020 R14:0x0 R15:0x0 ffffffff8a11b688 ABI:2 AX:0x0 BX:0xffff8a237b7c8800 CX:0xffffb1690204fae0 DX:0x78 SI:0xffff8a237b7c8800 DI:0xffffb1690204fa10 BP:0xffffb1690204fb00 SP:0xffffb1690204fa00 IP:0xffffffff8a11b688 FLAGS:0x8a CS:0x10 SS:0x18 R8:0x1495f0a917eba R9:0xffffd168fde19a48 R10:0xffffb1690204fd98 R11:0xffff8a253e82afb0 R12:0xffff8a237b7c8800 R13:0xffffb1690204fb00 R14:0x0 R15:0xffff8a237b7c8800 [root@five ~]# To see it more clearly, lets get just two of those registers by sample: # perf record -a --intr-regs=ax,bx --user-regs=cx,dx sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.502 MB perf.data (1653 samples) ] # Extra info, lets see what gets setup in that 'struct perf_event_attr': # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|REGS_USER|REGS_INTR, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, sample_regs_user: 0xc, sample_regs_intr: 0x3 # Cook, some PERF_SAMPLE_REGS_USER|PERF_SAMPLE_REGS_INTR + attr.sample_regs_user and attr.sample_regs_intr register masks, now lets see if those newlines are gone in a more compact fashion: # perf script -Fip,iregs,uregs ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a29b78d ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920 # And where was that? # perf script -Fip,iregs,uregs,sym,dso ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a56df78 strrchr (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0xffff8a25137b6028 BX:0xffff8a2502f18000 ABI:2 CX:0x7f204460e49b DX:0xf42920 ffffffff8a29b78d __vma_link_rb (/lib/modules/5.7.0-rc2/build/vmlinux) ABI:2 AX:0x2a20ffcd6000 BX:0x2ec7d9000 ABI:2 CX:0x7f204460e49b DX:0xf42920 # Signed-off-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200418231908.152212-1-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2069425eb3 |
perf synthetic events: Remove use of sscanf from /proc reading
The synthesize benchmark, run on a single process and thread, shows perf_event__synthesize_mmap_events as the hottest function with fgets and sscanf taking the majority of execution time. fscanf performs similarly well. Replace the scanf call with manual reading of each field of the /proc/pid/maps line, and remove some unnecessary buffering. This change also addresses potential, but unlikely, buffer overruns for the string values read by scanf. Performance before is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 102.810 usec (+- 0.027 usec) Average num. events: 17.000 (+- 0.000) Average time per event 6.048 usec Average data synthesis took: 106.325 usec (+- 0.018 usec) Average num. events: 89.000 (+- 0.000) Average time per event 1.195 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 68103.100 usec (+- 441.234 usec) Average num. events: 30703.000 (+- 0.730) Average time per event 2.218 usec And after is: $ sudo perf bench internals synthesize -m 16 -M 16 -s -t \# Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 50.388 usec (+- 0.031 usec) Average num. events: 17.000 (+- 0.000) Average time per event 2.964 usec Average data synthesis took: 52.693 usec (+- 0.020 usec) Average num. events: 89.000 (+- 0.000) Average time per event 0.592 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 16 Average synthesis took: 45022.400 usec (+- 552.740 usec) Average num. events: 30624.200 (+- 10.037) Average time per event 1.470 usec On a Intel Xeon 6154 compiling with Debian gcc 9.2.1. Committer testing: On a AMD Ryzen 5 3600X 6-Core Processor: Before: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 267.491 usec (+- 0.176 usec) Average num. events: 56.000 (+- 0.000) Average time per event 4.777 usec Average data synthesis took: 277.257 usec (+- 0.169 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.966 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 81599.500 usec (+- 346.315 usec) Average num. events: 36096.100 (+- 2.523) Average time per event 2.261 usec # After: # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt # Running 'internals/synthesize' benchmark: Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself: Average synthesis took: 110.125 usec (+- 0.080 usec) Average num. events: 56.000 (+- 0.000) Average time per event 1.967 usec Average data synthesis took: 118.518 usec (+- 0.057 usec) Average num. events: 287.000 (+- 0.000) Average time per event 0.413 usec Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 12 Average synthesis took: 43490.700 usec (+- 284.527 usec) Average num. events: 37028.500 (+- 0.563) Average time per event 1.175 usec # Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e95770af4c |
tools api: Add a lightweight buffered reading api
The synthesize benchmark shows the majority of execution time going to fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered reading library that will be used to replace these calls in a follow-up CL. Add tests for the library to perf test. Committer tests: $ perf test api 63: Test api io : Ok $ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
13edc23720 |
perf bench: Add a multi-threaded synthesize benchmark
By default this isn't run as it reads /proc and may not have access. For consistency, modify the single threaded benchmark to compute an average time per event. Committer testing: $ grep -m1 "model name" /proc/cpuinfo model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz $ grep "model name" /proc/cpuinfo | wc -l 8 $ $ perf bench internals synthesize -h # Running 'internals/synthesize' benchmark: Usage: perf bench internals synthesize <options> -I, --multi-iterations <n> Number of iterations used to compute multi-threaded average -i, --single-iterations <n> Number of iterations used to compute single-threaded average -M, --max-threads <n> Maximum number of threads in multithreaded bench -m, --min-threads <n> Minimum number of threads in multithreaded bench -s, --st Run single threaded benchmark -t, --mt Run multi-threaded benchmark $ $ perf bench internals synthesize -t # Running 'internals/synthesize' benchmark: Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 65449.000 usec (+- 586.442 usec) Average num. events: 9405.400 (+- 0.306) Average time per event 6.959 usec Number of synthesis threads: 2 Average synthesis took: 37838.300 usec (+- 130.259 usec) Average num. events: 9501.800 (+- 20.469) Average time per event 3.982 usec Number of synthesis threads: 3 Average synthesis took: 48551.400 usec (+- 225.686 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 5.087 usec Number of synthesis threads: 4 Average synthesis took: 29632.500 usec (+- 50.808 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.105 usec Number of synthesis threads: 5 Average synthesis took: 33920.400 usec (+- 284.509 usec) Average num. events: 9544.000 (+- 0.000) Average time per event 3.554 usec Number of synthesis threads: 6 Average synthesis took: 27604.100 usec (+- 72.344 usec) Average num. events: 9548.000 (+- 0.000) Average time per event 2.891 usec Number of synthesis threads: 7 Average synthesis took: 25406.300 usec (+- 933.371 usec) Average num. events: 9545.500 (+- 0.167) Average time per event 2.662 usec Number of synthesis threads: 8 Average synthesis took: 24110.400 usec (+- 73.229 usec) Average num. events: 9551.000 (+- 0.000) Average time per event 2.524 usec $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d99c22eabe |
perf record: Add num-synthesize-threads option
To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option. On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable. As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place: https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/ shows: ... - 32.59% __perf_event__synthesize_threads - 32.54% __event__synthesize_thread + 22.13% perf_event__synthesize_mmap_events + 6.68% perf_event__get_comm_ids.constprop.0 + 1.49% process_synthesized_event + 1.29% __GI___readdir64 + 0.60% __opendir ... That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch: https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/ Computing performance of multi threaded perf event synthesis by synthesizing events on CPU 0: Number of synthesis threads: 1 Average synthesis took: 127729.000 usec (+- 3372.880 usec) Average num. events: 21548.600 (+- 0.306) Average time per event 5.927 usec Number of synthesis threads: 2 Average synthesis took: 88863.500 usec (+- 385.168 usec) Average num. events: 21552.800 (+- 0.327) Average time per event 4.123 usec Number of synthesis threads: 3 Average synthesis took: 83257.400 usec (+- 348.617 usec) Average num. events: 21553.200 (+- 0.327) Average time per event 3.863 usec Number of synthesis threads: 4 Average synthesis took: 75093.000 usec (+- 422.978 usec) Average num. events: 21554.200 (+- 0.200) Average time per event 3.484 usec Number of synthesis threads: 5 Average synthesis took: 64896.600 usec (+- 353.348 usec) Average num. events: 21558.000 (+- 0.000) Average time per event 3.010 usec Number of synthesis threads: 6 Average synthesis took: 59210.200 usec (+- 342.890 usec) Average num. events: 21560.000 (+- 0.000) Average time per event 2.746 usec Number of synthesis threads: 7 Average synthesis took: 54093.900 usec (+- 306.247 usec) Average num. events: 21562.000 (+- 0.000) Average time per event 2.509 usec Number of synthesis threads: 8 Average synthesis took: 48938.700 usec (+- 341.732 usec) Average num. events: 21564.000 (+- 0.000) Average time per event 2.269 usec Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Jones <tonyj@suse.de> Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dbd660e6b2 |
perf test session topology: Fix data path
Commit |
|
![]() |
197ba86fdc |
perf stat: Improve runtime stat for interval mode
For interval mode, the metric is printed after the '#' character if it exists. But it's not calculated by the counts generated in this interval. See the following examples: root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2 # time counts unit events 1.000422803 764,809 inst_retired.any # 2.9 CPI 1.000422803 2,234,932 cycles 2.001464585 1,960,061 inst_retired.any # 1.6 CPI 2.001464585 4,022,591 cycles The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1) root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2 # time counts unit events 1.000429493 2,869,311 cycles 1.000429493 816,875 instructions # 0.28 insn per cycle 2.001516426 9,260,973 cycles 2.001516426 5,250,634 instructions # 0.87 insn per cycle The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is 0.57). The current code uses a global variable 'rt_stat' for tracking and updating the std dev of runtime stat. Unlike the counts, 'rt_stat' is not reset for interval. While the counts are reset for interval. perf_stat_process_counter() { if (config->interval) init_stats(ps->res_stats); } So for interval mode, the 'rt_stat' variable should be reset too. This patch resets 'rt_stat' before read_counters(), so the runtime stat is only calculated by the counts generated in this interval. With this patch: root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2 # time counts unit events 1.000420924 2,408,818 inst_retired.any # 2.1 CPI 1.000420924 5,010,111 cycles 2.001448579 2,798,407 inst_retired.any # 1.6 CPI 2.001448579 4,599,861 cycles root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2 # time counts unit events 1.000428555 2,769,714 cycles 1.000428555 774,462 instructions # 0.28 insn per cycle 2.001471562 3,595,904 cycles 2.001471562 1,243,703 instructions # 0.35 insn per cycle Now the second 'insn per cycle' and CPI are calculated by the counts generated in this interval. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-By: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0e0bf1ea11 |
perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode
As the code comments in perf_stat_process_counter() say, we calculate
counter's data every interval, and the display code shows ps->res_stats
avg value. We need to zero the stats for interval mode.
But the current code only zeros the res_stats[0], it doesn't zero the
res_stats[1] and res_stats[2], which are for ena and run of counter.
This patch zeros the whole res_stats[] for interval mode.
Fixes:
|
|
![]() |
1e76b171b7 |
perf script: Avoid NULL dereference on symbol
al->sym may be NULL given current if conditions and may cause a segv.
Fixes:
|
|
![]() |
41e7c32b97 |
perf bench: Fix div-by-zero if runtime is zero
Fix div-by-zero if runtime is zero: $ perf bench futex hash --runtime=0 # Running 'futex/hash' benchmark: Run summary [PID 12090]: 4 threads, each operating on 1024 [private] futexes for 0 secs. Floating point exception (core dumped) Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200417132330.119407-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d2e7d8636f |
perf cgroup: Avoid needless closing of unopened fd
Do not bother with close() if fd is not valid, just to silence valgrind: $ valgrind ./perf script ==59169== Memcheck, a memory error detector ==59169== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==59169== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info ==59169== Command: ./perf script ==59169== ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() ==59169== Warning: invalid file descriptor -1 in syscall close() Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200417132330.119407-1-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
12e89e65f4 |
perf hist: Add fast path for duplicate entries check
Perf checks the duplicate entries in a callchain before adding an entry. However the check is very slow especially with deeper call stack. Almost ~50% elapsed time of perf report is spent on the check when the call stack is always depth of 32. The hist_entry__cmp() is used to compare the new entry with the old entries. It will go through all the available sorts in the sort_list, and call the specific cmp of each sort, which is very slow. Actually, for most cases, there are no duplicate entries in callchain. The symbols are usually different. It's much faster to do a quick check for symbols first. Only do the full cmp when the symbols are exactly the same. The quick check is only to check symbols, not dso. Export _sort__sym_cmp. $ perf record --call-graph lbr ./tchain_edit_64 Without the patch $time perf report --stdio real 0m21.142s user 0m21.110s sys 0m0.033s With the patch $time perf report --stdio real 0m10.977s user 0m10.948s sys 0m0.027s Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d80da766d1 |
perf c2c: Add option to enable the LBR stitching approach
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-17-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
13e0c844fa |
perf top: Add option to enable the LBR stitching approach
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. The option must be used with --call-graph lbr. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-16-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
680d125cd5 |
perf script: Add option to enable the LBR stitching approach
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. Committer testing: Using the same perf.data as with the latest cset committer testing section: $ perf script --stitch-lbr <SNIP> tchain_edit 11131 15164.984292: 437491 cycles:u: 401106 f43+0x0 (/wb/tchain_edit) 40114c f42+0x18 (/wb/tchain_edit) 401172 f41+0xe (/wb/tchain_edit) 401194 f40+0x0 (/wb/tchain_edit) 40119b f39+0x0 (/wb/tchain_edit) 4011a2 f38+0x0 (/wb/tchain_edit) 4011a9 f37+0x0 (/wb/tchain_edit) 4011b0 f36+0x0 (/wb/tchain_edit) 4011b7 f35+0x0 (/wb/tchain_edit) 4011be f34+0x0 (/wb/tchain_edit) 4011c5 f33+0x0 (/wb/tchain_edit) 4011cc f32+0x0 (/wb/tchain_edit) 401207 f31+0x34 (/wb/tchain_edit) 401212 f30+0x0 (/wb/tchain_edit) 401219 f29+0x0 (/wb/tchain_edit) 401220 f28+0x0 (/wb/tchain_edit) 401227 f27+0x0 (/wb/tchain_edit) 40122e f26+0x0 (/wb/tchain_edit) 401235 f25+0x0 (/wb/tchain_edit) 40123c f24+0x0 (/wb/tchain_edit) 401243 f23+0x0 (/wb/tchain_edit) 40124a f22+0x0 (/wb/tchain_edit) 401251 f21+0x0 (/wb/tchain_edit) 401258 f20+0x0 (/wb/tchain_edit) 40125f f19+0x0 (/wb/tchain_edit) 401266 f18+0x0 (/wb/tchain_edit) 40126d f17+0x0 (/wb/tchain_edit) 401274 f16+0x0 (/wb/tchain_edit) 40127b f15+0x0 (/wb/tchain_edit) 401282 f14+0x0 (/wb/tchain_edit) 401289 f13+0x0 (/wb/tchain_edit) 401290 f12+0x0 (/wb/tchain_edit) 401297 f11+0x0 (/wb/tchain_edit) 40129e f10+0x0 (/wb/tchain_edit) 4012a5 f9+0x0 (/wb/tchain_edit) 4012ac f8+0x0 (/wb/tchain_edit) 4012b3 f7+0x0 (/wb/tchain_edit) 4012ba f6+0x0 (/wb/tchain_edit) 4012c1 f5+0x0 (/wb/tchain_edit) 4012c8 f4+0x0 (/wb/tchain_edit) 4012cf f3+0x0 (/wb/tchain_edit) 4012d6 f2+0x0 (/wb/tchain_edit) 4012dd f1+0x0 (/wb/tchain_edit) 4012e4 main+0x0 (/wb/tchain_edit) 7f41a5016f41 __libc_start_main+0xf1 (/usr/lib64/libc-2.29.so) <SNIP> $ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-15-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b1d1429b18 |
perf report: Add option to enable the LBR stitching approach
With the LBR stitching approach, the reconstructed LBR call stack can break the HW limitation. However, it may reconstruct invalid call stacks in some cases, e.g. exception handing such as setjmp/longjmp. Also, it may impact the processing time especially when the number of samples with stitched LBRs are huge. Add an option to enable the approach. # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles' # Event count (approx.): 6492797701 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. # ................................. # 99.99% 99.99% tchain_edit tchain_edit [.] f43 | ---main f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30 f31 | --99.65%--f32 f33 f34 f35 f36 f37 f38 f39 f40 f41 f42 f43 Committer testing: $ perf record --call-graph lbr /wb/tchain_edit [ perf record: Woken up 23 times to write data ] [ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ] $ perf report --header-only | egrep 'cpu(desc|.*capabilities)' # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake $ Before: $ perf report --no-children --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles:u' # Event count (approx.): 6459523879 # # Overhead Command Shared Object Symbol # ........ ........... ................ ....................... # 99.95% tchain_edit tchain_edit [.] f43 | --99.92%--f43 f42 f41 f40 f39 f38 f37 f36 f35 f34 f33 f32 f31 f30 f29 f28 f27 f26 f25 f24 f23 f22 f21 f20 f19 f18 f17 f16 f15 f14 f13 f12 f11 0.03% tchain_edit tchain_edit [.] f42 0.01% tchain_edit tchain_edit [.] f41 0.00% tchain_edit tchain_edit [.] f31 0.00% tchain_edit ld-2.29.so [.] _dl_relocate_object 0.00% tchain_edit ld-2.29.so [.] memmove 0.00% tchain_edit [unknown] [k] 0xffffffff93a00b17 After: $ perf report --stitch-lbr --no-children --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles:u' # Event count (approx.): 6459496645 # # Overhead Command Shared Object Symbol # ........ ........... ................ ........................ # 99.97% tchain_edit tchain_edit [.] f43 | --99.93%--f43 f42 f41 f40 f39 f38 f37 f36 f35 f34 f33 f32 f31 f30 f29 f28 f27 f26 f25 f24 f23 f22 f21 f20 f19 f18 f17 f16 f15 f14 f13 f12 f11 f10 f9 f8 f7 f6 f5 f4 f3 f2 f1 main __libc_start_main 0.02% tchain_edit [unknown] [k] 0xffffffff93a00b17 0.01% tchain_edit tchain_edit [.] f31 0.00% tchain_edit ld-2.29.so [.] _dl_important_hwcaps Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ff165628d7 |
perf callchain: Stitch LBR call stack
In LBR call stack mode, the depth of reconstructed LBR call stack limits to the number of LBR registers. For example, on skylake, the depth of reconstructed LBR call stack is always <= 32. # To display the perf.data header info, please use # --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6K of event 'cycles' # Event count (approx.): 6487119731 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. # ................................ 99.97% 99.97% tchain_edit tchain_edit [.] f43 | --99.64%--f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26 f27 f28 f29 f30 f31 f32 f33 f34 f35 f36 f37 f38 f39 f40 f41 f42 f43 For a call stack which is deeper than LBR limit, HW will overwrite the LBR register with oldest branch. Only partial call stacks can be reconstructed. However, the overwritten LBRs may still be retrieved from previous sample. At that moment, HW hasn't overwritten the LBR registers yet. Perf tools can stitch those overwritten LBRs on current call stacks to get a more complete call stack. To determine if LBRs can be stitched, perf tools need to compare current sample with previous sample. - They should have identical LBR records (Same from, to and flags values, and the same physical index of LBR registers). - The searching starts from the base-of-stack of current sample. Once perf determines to stitch the previous LBRs, the corresponding LBR cursor nodes will be copied to 'lists'. The 'lists' is to track the LBR cursor nodes which are going to be stitched. When the stitching is over, the nodes will not be freed immediately. They will be moved to 'free_lists'. Next stitching may reuse the space. Both 'lists' and 'free_lists' will be freed when all samples are processed. Committer notes: Fix the intel-pt.c initialization of the union with 'struct branch_flags', that breaks the build with its unnamed union on older gcc versions. Uninline thread__free_stitch_list(), as it grew big and started dragging includes to thread.h, so move it to thread.c where what it needs in terms of headers are already there. This fixes the build in several systems such as debian:experimental when cross building to the MIPS32 architecture, i.e. in the other cases what was needed was being included by sheer luck. In file included from builtin-sched.c:11: util/thread.h: In function 'thread__free_stitch_list': util/thread.h:169:3: error: implicit declaration of function 'free' [-Werror=implicit-function-declaration] 169 | free(pos); | ^~~~ util/thread.h:169:3: error: incompatible implicit declaration of built-in function 'free' [-Werror] util/thread.h:19:1: note: include '<stdlib.h>' or provide a declaration of 'free' 18 | #include "callchain.h" +++ |+#include <stdlib.h> 19 | util/thread.h:174:3: error: incompatible implicit declaration of built-in function 'free' [-Werror] 174 | free(pos); | ^~~~ util/thread.h:174:3: note: include '<stdlib.h>' or provide a declaration of 'free' Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-13-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7f1d39317c |
perf callchain: Save previous cursor nodes for LBR stitching approach
The cursor nodes which generates from sample are eventually added into callchain. To avoid generating cursor nodes from previous samples again, the previous cursor nodes are also saved for LBR stitching approach. Some option, e.g. hide-unresolved, may hide some LBRs. Add a variable 'valid' in struct callchain_cursor_node to indicate this case. The LBR stitching approach will only append the valid cursor nodes from previous samples later. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-12-kan.liang@linux.intel.com [ Use zfree() instead of open coded equivalent, and use it when freeing members of structs ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9c6c3f471d |
perf thread: Save previous sample for LBR stitching approach
To retrieve the overwritten LBRs from previous sample for LBR stitching approach, perf has to save the previous sample. Only allocate the struct lbr_stitch once, when LBR stitching approach is enabled and kernel supports hw_idx. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-11-kan.liang@linux.intel.com [ Use zalloc()/zfree() for thread->lbr_stitch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
771fd155df |
perf thread: Add a knob for LBR stitch approach
The LBR stitch approach should be disabled by default. Because - The stitching approach base on LBR call stack technology. The known limitations of LBR call stack technology still apply to the approach, e.g. Exception handing such as setjmp/longjmp will have calls/returns not match. - This approach is not foolproof. There can be cases where it creates incorrect call stacks from incorrect matches. There is no attempt to validate any matches in another way. The 'lbr_stitch_enable' is used to indicate whether enable LBR stitch approach, which is disabled by default. The following patch will introduce a new option for each tools to enable the LBR stitch approach. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-10-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e2b23483eb |
perf machine: Factor out lbr_callchain_add_lbr_ip()
Both caller and callee needs to add ip from LBR to callchain. Factor out lbr_callchain_add_lbr_ip() to improve code readability. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-9-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dd3e249a0c |
perf machine: Factor out lbr_callchain_add_kernel_ip()
Both caller and callee needs to add kernel ip to callchain. Factor out lbr_callchain_add_kernel_ip() to improve code readability. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e48b8311ca |
perf machine: Refine the function for LBR call stack reconstruction
LBR only collect the user call stack. To reconstruct a call stack, both kernel call stack and user call stack are required. The function resolve_lbr_callchain_sample() mix the kernel call stack and user call stack. Now, with the help of HW idx, perf tool can reconstruct a more complete call stack by adding some user call stack from previous sample. However, current implementation is hard to be extended to support it. Current code path for resolve_lbr_callchain_sample() for (j = 0; j < mix_chain_nr; j++) { if (ORDER_CALLEE) { if (kernel callchain) Fill callchain info else if (LBR callchain) Fill callchain info } else { if (LBR callchain) Fill callchain info else if (kernel callchain) Fill callchain info } add_callchain_ip(); } With the patch, if (ORDER_CALLEE) { for (j = 0; j < NUM of kernel callchain) { Fill callchain info add_callchain_ip(); } for (; j < mix_chain_nr) { Fill callchain info add_callchain_ip(); } } else { for (; j < NUM of LBR callchain) { Fill callchain info add_callchain_ip(); } for (j = 0; j < mix_chain_nr) { Fill callchain info add_callchain_ip(); } } No functional changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-7-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f8603267bf |
perf machine: Remove the indent in resolve_lbr_callchain_sample
The indent is unnecessary in resolve_lbr_callchain_sample. Removing it will make the following patch simpler. Current code path for resolve_lbr_callchain_sample() /* LBR only affects the user callchain */ if (i != chain_nr) { body of the function .... return 1; } return 0; With the patch, /* LBR only affects the user callchain */ if (i == chain_nr) return 0; body of the function ... return 1; No functional changes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-6-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6f91ea283a |
perf header: Support CPU PMU capabilities
To stitch LBR call stack, the max LBR information is required. So the CPU PMU capabilities information has to be stored in perf header. Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities. Retrieve all CPU PMU capabilities, not just max LBR information. Add variable max_branches to facilitate future usage. Committer testing: # ls -la /sys/devices/cpu/caps/ total 0 drwxr-xr-x. 2 root root 0 Apr 17 10:53 . drwxr-xr-x. 6 root root 0 Apr 17 07:02 .. -r--r--r--. 1 root root 4096 Apr 17 10:53 max_precise # # cat /sys/devices/cpu/caps/max_precise 0 # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.033 MB perf.data (7 samples) ] # # perf report --header-only | egrep 'cpu(desc|.*capabilities)' # cpudesc : AMD Ryzen 5 3600X 6-Core Processor # cpu pmu capabilities: max_precise=0 # And then on an Intel machine: $ ls -la /sys/devices/cpu/caps/ total 0 drwxr-xr-x. 2 root root 0 Apr 17 10:51 . drwxr-xr-x. 6 root root 0 Apr 17 10:04 .. -r--r--r--. 1 root root 4096 Apr 17 11:37 branches -r--r--r--. 1 root root 4096 Apr 17 10:51 max_precise -r--r--r--. 1 root root 4096 Apr 17 11:37 pmu_name $ cat /sys/devices/cpu/caps/max_precise 3 $ cat /sys/devices/cpu/caps/branches 32 $ cat /sys/devices/cpu/caps/pmu_name skylake $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf report --header-only | egrep 'cpu(desc|.*capabilities)' # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake $ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3a6c51e4d6 |
perf parser: Add support to specify rXXX event with pmu
The current rXXXX event specification creates event under PERF_TYPE_RAW pmu type. This change allows to use rXXXX within pmu syntax, so it's type is used via the following syntax: -e 'cpu/r3c/' -e 'cpum_cf/r0/' The XXXX number goes directly to perf_event_attr::config the same way as in '-e rXXXX' event. The perf_event_attr::type is filled with pmu type. Committer testing: So, lets see what goes in perf_event_attr::config for, say, the 'instructions' PERF_TYPE_HARDWARE (0) event, first we should look at how to encode this event as a PERF_TYPE_RAW event for this specific CPU, an AMD Ryzen 5: # cat /sys/devices/cpu/events/instructions event=0xc0 # Then try with it _and_ the instruction, just to see that they are close enough: # perf stat -e rc0,instructions sleep 1 Performance counter stats for 'sleep 1': 919,794 rc0 919,898 instructions 1.000754579 seconds time elapsed 0.000715000 seconds user 0.000000000 seconds sys # Now we should try, before this patch, the PMU event encoding: # perf stat -e cpu/rc0/ sleep 1 event syntax error: 'cpu/rc0/' \___ unknown term valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,percore # Now with this patch, the three ways of specifying the 'instructions' CPU counter are accepted: # perf stat -e cpu/rc0/,rc0,instructions sleep 1 Performance counter stats for 'sleep 1': 892,948 cpu/rc0/ 893,052 rc0 893,156 instructions 1.000931819 seconds time elapsed 0.000916000 seconds user 0.000000000 seconds sys # Requested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200416221405.437788-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e9cfa47e68 |
perf doc: allow ASCIIDOC_EXTRA to be an argument
This will allow parent makefiles to pass values to asciidoc. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Cc: yuzhoujian <yuzhoujian@didichuxing.com> Link: http://lore.kernel.org/lkml/20200416162058.201954-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9fbc61f832 |
perf pmu: Add support for PMU capabilities
The PMU capabilities information, which is located at /sys/bus/event_source/devices/<dev>/caps, is required by perf tool. For example, the max LBR information is required to stitch LBR call stack. Add perf_pmu__caps_parse() to parse the PMU capabilities information. The information is stored in a list. The following patch will store the capabilities information in perf header. Committer notes: Here's an example of such directories and its files in an i5 7th gen machine: [root@seventh ~]# ls -lad /sys/bus/event_source/devices/*/caps drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/cpu/caps drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/intel_pt/caps [root@seventh ~]# ls -la /sys/bus/event_source/devices/intel_pt/caps total 0 drwxr-xr-x. 2 root root 0 Apr 14 13:33 . drwxr-xr-x. 5 root root 0 Apr 14 13:12 .. -r--r--r--. 1 root root 4096 Apr 16 13:10 cr3_filtering -r--r--r--. 1 root root 4096 Apr 16 11:42 cycle_thresholds -r--r--r--. 1 root root 4096 Apr 16 13:10 ip_filtering -r--r--r--. 1 root root 4096 Apr 16 13:10 max_subleaf -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc_periods -r--r--r--. 1 root root 4096 Apr 16 13:10 num_address_ranges -r--r--r--. 1 root root 4096 Apr 16 13:10 output_subsys -r--r--r--. 1 root root 4096 Apr 16 13:10 payloads_lip -r--r--r--. 1 root root 4096 Apr 16 13:10 power_event_trace -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_cyc -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_periods -r--r--r--. 1 root root 4096 Apr 16 13:10 ptwrite -r--r--r--. 1 root root 4096 Apr 16 13:10 single_range_output -r--r--r--. 1 root root 4096 Apr 16 12:03 topa_multiple_entries -r--r--r--. 1 root root 4096 Apr 16 13:10 topa_output [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_output 1 [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_multiple_entries 1 [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/mtc 1 [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/power_event_trace 0 [root@seventh ~]# [root@seventh ~]# ls -la /sys/bus/event_source/devices/cpu/caps/ total 0 drwxr-xr-x. 2 root root 0 Apr 14 13:33 . drwxr-xr-x. 6 root root 0 Apr 14 13:12 .. -r--r--r--. 1 root root 4096 Apr 16 13:10 branches -r--r--r--. 1 root root 4096 Apr 14 13:33 max_precise -r--r--r--. 1 root root 4096 Apr 16 13:10 pmu_name [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/max_precise 3 [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/branches 32 [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/pmu_name skylake [root@seventh ~]# Wow, first time I've heard about /sys/bus/event_source/devices/cpu/caps/max_precise, I think I'll use it! :-) Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200319202517.23423-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bec49a9e05 |
perf stat: Force error in fallback on :k events
When it is not possible for a non-privilege perf command to monitor at the kernel level (:k), the fallback code forces a :u. That works if the event was previously monitoring both levels. But if the event was already constrained to kernel only, then it does not make sense to restrict it to user only. Given the code works by exclusion, a kernel only event would have: attr->exclude_user = 1 The fallback code would add: attr->exclude_kernel = 1 In the end the end would not monitor in either the user level or kernel level. In other words, it would count nothing. An event programmed to monitor kernel only cannot be switched to user only without seriously warning the user. This patch forces an error in this case to make it clear the request cannot really be satisfied. Behavior with paranoid 1: $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid" $ perf stat -e cycles:k sleep 1 Performance counter stats for 'sleep 1': 1,520,413 cycles:k 1.002361664 seconds time elapsed 0.002480000 seconds user 0.000000000 seconds sys Old behavior with paranoid 2: $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid" $ perf stat -e cycles:k sleep 1 Performance counter stats for 'sleep 1': 0 cycles:ku 1.002358127 seconds time elapsed 0.002384000 seconds user 0.000000000 seconds sys New behavior with paranoid 2: $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid" $ perf stat -e cycles:k sleep 1 Error: You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 v2 of this patch addresses the review feedback from jolsa@redhat.com. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e345997914 |
perf tools: Add support for leader-sampling with AUX area events
When AUX area events are used in sampling mode, they must be the group leader, but the group leader is also used for leader-sampling. However, it is not desirable to use an AUX area event as the leader for leader-sampling, because it doesn't have any samples of its own. To support leader-sampling with AUX area events, use the 2nd event of the group as the "leader" for the purposes of leader-sampling. Example: # perf record --kcore --aux-sample -e '{intel_pt//,cycles,instructions}:S' -c 10000 uname [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.786 MB perf.data ] # perf report Samples: 380 of events 'anon group { cycles, instructions }', Event count (approx.): 3026164 Children Self Command Shared Object Symbol + 38.76% 42.65% 0.00% 0.00% uname [kernel.kallsyms] [k] __x86_indirect_thunk_rax + 35.82% 31.33% 0.00% 0.00% uname ld-2.28.so [.] _dl_start_user + 34.29% 29.74% 0.55% 0.47% uname ld-2.28.so [.] _dl_start + 33.73% 28.62% 1.60% 0.97% uname ld-2.28.so [.] dl_main + 33.19% 29.04% 0.52% 0.32% uname ld-2.28.so [.] _dl_sysdep_start + 27.83% 33.74% 0.00% 0.00% uname [kernel.kallsyms] [k] do_syscall_64 + 26.76% 33.29% 0.00% 0.00% uname [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe + 23.78% 20.33% 5.97% 5.25% uname [kernel.kallsyms] [k] page_fault + 23.18% 24.60% 0.00% 0.00% uname libc-2.28.so [.] __libc_start_main + 22.64% 24.37% 0.00% 0.00% uname uname [.] _start + 21.04% 23.27% 0.00% 0.00% uname uname [.] main + 19.48% 18.08% 3.72% 3.64% uname ld-2.28.so [.] _dl_relocate_object + 19.47% 21.81% 0.00% 0.00% uname libc-2.28.so [.] setlocale + 19.44% 21.56% 0.52% 0.61% uname libc-2.28.so [.] _nl_find_locale + 17.87% 19.66% 0.00% 0.00% uname libc-2.28.so [.] _nl_load_locale_from_archive + 15.71% 13.73% 0.53% 0.52% uname [kernel.kallsyms] [k] do_page_fault + 15.18% 13.21% 1.03% 0.68% uname [kernel.kallsyms] [k] handle_mm_fault + 14.15% 12.53% 1.01% 1.12% uname [kernel.kallsyms] [k] __handle_mm_fault + 12.03% 9.67% 0.54% 0.32% uname ld-2.28.so [.] _dl_map_object + 10.55% 8.48% 0.00% 0.00% uname ld-2.28.so [.] openaux + 10.55% 20.20% 0.52% 0.61% uname libc-2.28.so [.] __run_exit_handlers Comnmitter notes: Fixed up this problem: util/record.c: In function ‘perf_evlist__config’: util/record.c:256:3: error: too few arguments to function ‘perf_evsel__config_leader_sampling’ 256 | perf_evsel__config_leader_sampling(evsel); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ util/record.c:190:13: note: declared here 190 | static void perf_evsel__config_leader_sampling(struct evsel *evsel, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-17-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
94d3820f2e |
perf evlist: Allow multiple read formats
Tools find the correct evsel, and therefore read format, using the event ID, so it isn't necessary for all read formats to be the same. In the case of leader-sampling of AUX area events, dummy tracking events will have a different read format, so relax the validation to become a debug message only. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-16-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3713eb371c |
perf evsel: Rearrange perf_evsel__config_leader_sampling()
In preparation for adding support for leader sampling with AUX area events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-15-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5f34278867 |
perf evlist: Move leader-sampling configuration
Move leader-sampling configuration in preparation for adding support for leader sampling with AUX area events. Committer notes: It only makes sense when configuring an evsel that is part of an evlist, so the only case where it is called outside perf_evlist__config(), in some 'perf test' entry, is safe, and even there we should just use perf_evlist__config(), but since in that case we have just one evsel in the evlist, it is equivalent. Also fixed up this problem: util/record.c: In function ‘perf_evlist__config’: util/record.c:223:3: error: too many arguments to function ‘perf_evsel__config_leader_sampling’ 223 | perf_evsel__config_leader_sampling(evsel, evlist); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ util/record.c:170:13: note: declared here 170 | static void perf_evsel__config_leader_sampling(struct evsel *evsel) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-14-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e12ee9f751 |
perf evsel: Move and globalize perf_evsel__find_pmu() and perf_evsel__is_aux_event()
Move and globalize 2 functions from the auxtrace specific sources so that they can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-13-adrian.hunter@intel.com [ Move to pmu.c, as moving to evsel.h breaks the python binding ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2855c05cf1 |
perf intel-pt: Add support for synthesizing callchains for regular events
Currently, callchains can be synthesized only for synthesized events. Support also synthesizing callchains for regular events. Example: # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname Linux [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.532 MB perf.data ] # perf script --itrace=Ge | head -20 uname 4864 2419025.358181: 10000 cycles: ffffffffbba56965 apparmor_bprm_committing_creds+0x35 ([kernel.kallsyms]) ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms]) ffffffffbba07422 security_bprm_committing_creds+0x22 ([kernel.kallsyms]) ffffffffbb89805d install_exec_creds+0xd ([kernel.kallsyms]) ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms]) uname 4864 2419025.358185: 10000 cycles: ffffffffbba56db0 apparmor_bprm_committed_creds+0x20 ([kernel.kallsyms]) ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms]) ffffffffbba07452 security_bprm_committed_creds+0x22 ([kernel.kallsyms]) ffffffffbb89809a install_exec_creds+0x4a ([kernel.kallsyms]) ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms]) uname 4864 2419025.358189: 10000 cycles: ffffffffbb86fdf6 vma_adjust_trans_huge+0x6 ([kernel.kallsyms]) ffffffffbb821660 __vma_adjust+0x160 ([kernel.kallsyms]) ffffffffbb897be7 shift_arg_pages+0x97 ([kernel.kallsyms]) ffffffffbb897ed9 setup_arg_pages+0x1e9 ([kernel.kallsyms]) ffffffffbb90d9f2 load_elf_binary+0x3f2 ([kernel.kallsyms]) Committer testing: # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.233 MB perf.data ] # Then, before this patch: # perf script --itrace=Ge | head -20 uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms]) uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms]) uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms]) uname 28642 168664.856396: 10000 cycles: ffffffff982fd4ec __mod_memcg_state+0x1c ([kernel.kallsyms]) uname 28642 168664.856400: 10000 cycles: ffffffff9829fddd do_mmap+0xfd ([kernel.kallsyms]) uname 28642 168664.856404: 10000 cycles: ffffffff9829c879 __vma_adjust+0x479 ([kernel.kallsyms]) uname 28642 168664.856408: 10000 cycles: ffffffff98238e94 __perf_addr_filters_adjust+0x34 ([kernel.kallsyms]) uname 28642 168664.856412: 10000 cycles: ffffffff98a38e0b down_write+0x1b ([kernel.kallsyms]) uname 28642 168664.856416: 10000 cycles: ffffffff983006a0 memcg_kmem_get_cache+0x0 ([kernel.kallsyms]) uname 28642 168664.856421: 10000 cycles: ffffffff98396eaf load_elf_binary+0x92f ([kernel.kallsyms]) uname 28642 168664.856425: 10000 cycles: ffffffff982e0222 kfree+0x62 ([kernel.kallsyms]) uname 28642 168664.856428: 10000 cycles: ffffffff9846dfd4 file_has_perm+0x54 ([kernel.kallsyms]) uname 28642 168664.856433: 10000 cycles: ffffffff98288911 vma_interval_tree_insert+0x51 ([kernel.kallsyms]) uname 28642 168664.856437: 10000 cycles: ffffffff9823e577 perf_event_mmap_output+0x27 ([kernel.kallsyms]) uname 28642 168664.856441: 10000 cycles: ffffffff98a26fa0 xas_load+0x40 ([kernel.kallsyms]) uname 28642 168664.856445: 10000 cycles: ffffffff98004f30 arch_setup_additional_pages+0x0 ([kernel.kallsyms]) uname 28642 168664.856448: 10000 cycles: ffffffff98a297c0 copy_user_generic_unrolled+0xa0 ([kernel.kallsyms]) uname 28642 168664.856452: 10000 cycles: ffffffff9853a87a strnlen_user+0x10a ([kernel.kallsyms]) uname 28642 168664.856456: 10000 cycles: ffffffff986638a7 randomize_page+0x27 ([kernel.kallsyms]) uname 28642 168664.856460: 10000 cycles: ffffffff98a3b645 _raw_spin_lock+0x5 ([kernel.kallsyms]) # And after: # perf script --itrace=Ge | head -20 uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms]) ffffffff9831fe87 install_exec_creds+0x17 ([kernel.kallsyms]) ffffffff983968d9 load_elf_binary+0x359 ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms]) ffffffff9831fa83 setup_arg_pages+0x123 ([kernel.kallsyms]) ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms]) ffffffff9831f889 shift_arg_pages+0xa9 ([kernel.kallsyms]) ffffffff9831fb4f setup_arg_pages+0x1ef ([kernel.kallsyms]) ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e11869a065 |
perf evsel: Add support for synthesized sample type
For reporting purposes, an evsel sample can have a callchain synthesized from AUX area data. Add support for keeping track of synthesized sample types. Note, the recorded sample_type cannot be changed because it is needed to continue to parse events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8e94b3243a |
perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are set
Using 'type' variable for checking for callchains is equivalent to using evsel__has_callchain(evsel) and is how the other PERF_SAMPLE_ bits are checked in this function, so use it to be consistent. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-11-adrian.hunter@intel.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4fef41bfb1 |
perf thread-stack: Add thread_stack__sample_late()
Add a thread stack function to create a call chain for hardware events where the sample records get created some time after the event occurred. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1c5c25b3fd |
perf auxtrace: Add an option to synthesize callchains for regular events
Currently, callchains can be synthesized only for synthesized events. Add an itrace option to synthesize callchains for regular events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5c7bec0c9c |
perf auxtrace: For reporting purposes, un-group AUX area event
An AUX area event must be the group leader when recording traces in sample mode, but that does not produce the expected results from 'perf report' because it expects the leader to provide samples. Rather than teach 'perf report' about AUX area sampling, un-group the AUX area event during processing, making the 2nd event the leader. Example: $ perf record -e '{intel_pt//u,branch-misses:u}' -c 1 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.080 MB perf.data ] Before: $ perf report Samples: 800 of events 'anon group { intel_pt//u, branch-misses:u }', Event count (approx.): 800 Children Self Command Shared Object Symbol 0.00% 47.50% 0.00% 47.50% uname libc-2.28.so [.] _dl_addr 0.00% 16.38% 0.00% 16.38% uname ld-2.28.so [.] __GI___tunables_init 0.00% 54.75% 0.00% 4.75% uname ld-2.28.so [.] dl_main 0.00% 3.12% 0.00% 3.12% uname ld-2.28.so [.] _dl_map_object_from_fd 0.00% 2.38% 0.00% 2.38% uname ld-2.28.so [.] strcmp 0.00% 2.25% 0.00% 2.25% uname ld-2.28.so [.] _dl_check_map_versions 0.00% 2.00% 0.00% 2.00% uname ld-2.28.so [.] _dl_important_hwcaps 0.00% 2.00% 0.00% 2.00% uname ld-2.28.so [.] _dl_map_object_deps 0.00% 51.50% 0.00% 1.50% uname ld-2.28.so [.] _dl_sysdep_start 0.00% 1.25% 0.00% 1.25% uname ld-2.28.so [.] _dl_load_cache_lookup 0.00% 51.12% 0.00% 1.12% uname ld-2.28.so [.] _dl_start 0.00% 50.88% 0.00% 1.12% uname ld-2.28.so [.] do_lookup_x 0.00% 50.62% 0.00% 1.00% uname ld-2.28.so [.] _dl_lookup_symbol_x 0.00% 1.00% 0.00% 1.00% uname ld-2.28.so [.] _dl_map_object 0.00% 1.00% 0.00% 1.00% uname ld-2.28.so [.] _dl_next_ld_env_entry 0.00% 0.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_cache_libcmp 0.00% 0.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_new_object 0.00% 50.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_relocate_object 0.00% 0.62% 0.00% 0.62% uname ld-2.28.so [.] _dl_init_paths 0.00% 0.62% 0.00% 0.62% uname ld-2.28.so [.] _dl_name_match_p 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] get_common_indeces.constprop.1 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] memmove 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] memset 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] open_verify.constprop.11 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] _dl_check_all_versions 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] _dl_find_dso_for_object 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] init_tls 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] __tunable_get_val 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_add_to_namespace_list 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_determine_tlsoffset 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_discover_osversion 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] calloc@plt 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] malloc 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] malloc@plt 0.00% 0.25% 0.00% 0.25% uname libc-2.28.so [.] _nl_load_locale_from_archive 0.00% 0.25% 0.00% 0.25% uname [unknown] [k] 0xffffffffa3a00010 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] __libc_scratch_buffer_set_array_size 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_allocate_tls_storage 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_catch_exception 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_setup_hash 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_sort_maps 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_sysdep_read_whole_file 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] access 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] calloc 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] mmap64 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] openaux 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] rtld_lock_default_lock_recursive 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] rtld_lock_default_unlock_recursive 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] strchr 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] strlen 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] 0x0000000000001080 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] __strchrnul_avx2 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] _nl_normalize_codeset 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] malloc 0.00% 0.12% 0.00% 0.12% uname [unknown] [k] 0xffffffffa3a011f0 0.00% 50.00% 0.00% 0.00% uname ld-2.28.so [.] _dl_start_user 0.00% 50.00% 0.00% 0.00% uname [unknown] [.] 0000000000000000 After: Samples: 800 of event 'branch-misses:u', Event count (approx.): 800 Children Self Command Shared Object Symbol 54.75% 4.75% uname ld-2.28.so [.] dl_main 51.50% 1.50% uname ld-2.28.so [.] _dl_sysdep_start 51.12% 1.12% uname ld-2.28.so [.] _dl_start 50.88% 0.88% uname ld-2.28.so [.] _dl_relocate_object 50.88% 1.12% uname ld-2.28.so [.] do_lookup_x 50.62% 1.00% uname ld-2.28.so [.] _dl_lookup_symbol_x 50.00% 0.00% uname ld-2.28.so [.] _dl_start_user 50.00% 0.00% uname [unknown] [.] 0000000000000000 47.50% 47.50% uname libc-2.28.so [.] _dl_addr 16.38% 16.38% uname ld-2.28.so [.] __GI___tunables_init 3.12% 3.12% uname ld-2.28.so [.] _dl_map_object_from_fd 2.38% 2.38% uname ld-2.28.so [.] strcmp 2.25% 2.25% uname ld-2.28.so [.] _dl_check_map_versions 2.00% 2.00% uname ld-2.28.so [.] _dl_important_hwcaps 2.00% 2.00% uname ld-2.28.so [.] _dl_map_object_deps 1.25% 1.25% uname ld-2.28.so [.] _dl_load_cache_lookup 1.00% 1.00% uname ld-2.28.so [.] _dl_map_object 1.00% 1.00% uname ld-2.28.so [.] _dl_next_ld_env_entry 0.88% 0.88% uname ld-2.28.so [.] _dl_cache_libcmp 0.88% 0.88% uname ld-2.28.so [.] _dl_new_object 0.62% 0.62% uname ld-2.28.so [.] _dl_init_paths 0.62% 0.62% uname ld-2.28.so [.] _dl_name_match_p 0.50% 0.50% uname ld-2.28.so [.] get_common_indeces.constprop.1 0.50% 0.50% uname ld-2.28.so [.] memmove 0.50% 0.50% uname ld-2.28.so [.] memset 0.50% 0.50% uname ld-2.28.so [.] open_verify.constprop.11 0.38% 0.38% uname ld-2.28.so [.] _dl_check_all_versions 0.38% 0.38% uname ld-2.28.so [.] _dl_find_dso_for_object 0.38% 0.38% uname ld-2.28.so [.] init_tls 0.25% 0.25% uname ld-2.28.so [.] __tunable_get_val 0.25% 0.25% uname ld-2.28.so [.] _dl_add_to_namespace_list 0.25% 0.25% uname ld-2.28.so [.] _dl_determine_tlsoffset 0.25% 0.25% uname ld-2.28.so [.] _dl_discover_osversion 0.25% 0.25% uname ld-2.28.so [.] calloc@plt 0.25% 0.25% uname ld-2.28.so [.] malloc 0.25% 0.25% uname ld-2.28.so [.] malloc@plt 0.25% 0.25% uname libc-2.28.so [.] _nl_load_locale_from_archive 0.25% 0.25% uname [unknown] [k] 0xffffffffa3a00010 0.12% 0.12% uname ld-2.28.so [.] __libc_scratch_buffer_set_array_size 0.12% 0.12% uname ld-2.28.so [.] _dl_allocate_tls_storage 0.12% 0.12% uname ld-2.28.so [.] _dl_catch_exception 0.12% 0.12% uname ld-2.28.so [.] _dl_setup_hash 0.12% 0.12% uname ld-2.28.so [.] _dl_sort_maps 0.12% 0.12% uname ld-2.28.so [.] _dl_sysdep_read_whole_file 0.12% 0.12% uname ld-2.28.so [.] access 0.12% 0.12% uname ld-2.28.so [.] calloc 0.12% 0.12% uname ld-2.28.so [.] mmap64 0.12% 0.12% uname ld-2.28.so [.] openaux 0.12% 0.12% uname ld-2.28.so [.] rtld_lock_default_lock_recursive 0.12% 0.12% uname ld-2.28.so [.] rtld_lock_default_unlock_recursive 0.12% 0.12% uname ld-2.28.so [.] strchr 0.12% 0.12% uname ld-2.28.so [.] strlen 0.12% 0.12% uname ld-2.28.so [.] 0x0000000000001080 0.12% 0.12% uname libc-2.28.so [.] __strchrnul_avx2 0.12% 0.12% uname libc-2.28.so [.] _nl_normalize_codeset 0.12% 0.12% uname libc-2.28.so [.] malloc 0.12% 0.12% uname [unknown] [k] 0xffffffffa3a011f0 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
113fcb46cf |
perf s390-cpumsf: Implement ->evsel_is_auxtrace() callback
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a58ab57caa |
perf cs-etm: Implement ->evsel_is_auxtrace() callback
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
508c71e3f9 |
perf arm-spe: Implement ->evsel_is_auxtrace() callback
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
966246f597 |
perf intel-bts: Implement ->evsel_is_auxtrace() callback
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6b52bb07c3 |
perf intel-pt: Implement ->evsel_is_auxtrace() callback
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
853f37d75c |
perf auxtrace: Add ->evsel_is_auxtrace() callback
Add ->evsel_is_auxtrace() callback to identify if a selected event is an AUX area event. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200401101613.6201-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5287f92692 |
perf script: Add flamegraph.py script
This script works in tandem with d3-flame-graph to generate flame graphs from perf. It supports two output formats: JSON and HTML (the default). The HTML format will look for a standalone d3-flame-graph template file in /usr/share/d3-flame-graph/d3-flamegraph-base.html and fill in the collected stacks. Usage: perf record -a -g -F 99 sleep 60 perf script report flamegraph Combined: perf script flamegraph -a -F 99 sleep 60 Committer testing: Tested both with "PYTHON=python3" and with the default, that uses python2-devel: Complete set of instructions: $ mkdir /tmp/build/perf $ make PYTHON=python3 -C tools/perf O=/tmp/build/perf install-bin $ export PATH=~/bin:$PATH $ perf record -a -g -F 99 sleep 60 $ perf script report flamegraph Now go and open the generated flamegraph.html file in a browser. At first this required building with PYTHON=python3, but after I reported this Andreas was kind enough to send a patch making it work with both python and python3. Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brendan Gregg <bgregg@netflix.com> Cc: Martin Spier <mspier@netflix.com> Link: http://lore.kernel.org/lkml/20200320151355.66302-1-agerstmayr@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
47352aba40 |
perf metrictroup: Split the metricgroup__add_metric function
This patch refactors metricgroup__add_metric function where some part of it move to function metricgroup__add_metric_param. No logic change. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-4-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
871f9f599d |
perf expr: Add expr_scanner_ctx object
Add the expr_scanner_ctx object to hold user data for the expr scanner. Currently it holds only start_token, Kajol Jain will use it to hold 24x7 runtime param. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-3-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
aecce63e2b |
perf expr: Add expr_ prefix for parse_ctx and parse_id
Adding expr_ prefix for parse_ctx and parse_id, to straighten out the expr* namespace. There's no functional change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/20200401203340.31402-2-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
04ed4ccb9c |
perf synthetic-events: save 4kb from 2 stack frames
Reuse an existing char buffer to avoid two PATH_MAX sized char buffers. Reduces stack frame sizes by 4kb. perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after 'sub $0x35b8,%rsp'. perf_event__get_comm_ids before 'sub $0x2028,%rsp' after 'sub $0x1028,%rsp'. The performance impact of this change is negligible. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2a4b51666a |
perf bench: Add event synthesis benchmark
Event synthesis may occur at the start or end (tail) of a perf command. In system-wide mode it can scan every process in /proc, which may add seconds of latency before event recording. Add a new benchmark that times how long event synthesis takes with and without data synthesis. An example execution looks like: $ perf bench internals synthesize # Running 'internals/synthesize' benchmark: Average synthesis took: 168.253800 usec Average data synthesis took: 208.104700 usec Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1a2725f3ee |
perf script: Simplify auxiliary event printing functions
This simplifies the print functions for the following perf script options: --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events Example: # perf record --switch-events -a -e cycles -c 10000 sleep 1 Before: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-before.txt After: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-after.txt # diff -s out-before.txt out-after.txt Files out-before.txt and out-after.tx are identical Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200402141548.21283-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6b3e0e2e04 |
perf tools: Support CAP_PERFMON capability
Extend error messages to mention CAP_PERFMON capability as an option to substitute CAP_SYS_ADMIN capability for secure system performance monitoring and observability operations. Make perf_event_paranoid_check() and __cmd_ftrace() to be aware of CAP_PERFMON capability. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to perf_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure perf_events monitoring is discouraged with respect to CAP_PERFMON capability. Committer testing: Using a libcap with this patch: diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h index 78b2fd4c8a95..89b5b0279b60 100644 --- a/libcap/include/uapi/linux/capability.h +++ b/libcap/include/uapi/linux/capability.h @@ -366,8 +366,9 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_READ 37 +#define CAP_PERFMON 38 -#define CAP_LAST_CAP CAP_AUDIT_READ +#define CAP_LAST_CAP CAP_PERFMON #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) Note that using '38' in place of 'cap_perfmon' works to some degree with an old libcap, its only when cap_get_flag() is called that libcap performs an error check based on the maximum value known for capabilities that it will fail. This makes determining the default of perf_event_attr.exclude_kernel to fail, as it can't determine if CAP_PERFMON is in place. Using 'perf top -e cycles' avoids the default check and sets perf_event_attr.exclude_kernel to 1. As root, with a libcap supporting CAP_PERFMON: # groupadd perf_users # adduser perf -g perf_users # mkdir ~perf/bin # cp ~acme/bin/perf ~perf/bin/ # chgrp perf_users ~perf/bin/perf # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf # getcap ~perf/bin/perf /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep # ls -la ~perf/bin/perf -rwxr-xr-x. 1 root perf_users 16968552 Apr 9 13:10 /home/perf/bin/perf As the 'perf' user in the 'perf_users' group: $ perf top -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ Either add the cap_ipc_lock capability to the perf binary or reduce the ring buffer size to some smaller value: $ perf top -m10 -a --stdio rounding mmap pages size to 64K (16 pages) Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m4 -a --stdio Error: Failed to mmap with 1 (Operation not permitted) $ perf top -m2 -a --stdio PerfTop: 762 irqs/sec kernel:49.7% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs) ------------------------------------------------------------------------------------------------------ 9.83% perf [.] __symbols__insert 8.58% perf [.] rb_next 5.91% [kernel] [k] module_get_kallsym 5.66% [kernel] [k] kallsyms_expand_symbol.constprop.0 3.98% libc-2.29.so [.] __GI_____strtoull_l_internal 3.66% perf [.] rb_insert_color 2.34% [kernel] [k] vsnprintf 2.30% [kernel] [k] string_nocheck 2.16% libc-2.29.so [.] _IO_getdelim 2.15% [kernel] [k] number 2.13% [kernel] [k] format_decode 1.58% libc-2.29.so [.] _IO_feof 1.52% libc-2.29.so [.] __strcmp_avx2 1.50% perf [.] rb_set_parent_color 1.47% libc-2.29.so [.] __libc_calloc 1.24% [kernel] [k] do_syscall_64 1.17% [kernel] [k] __x86_indirect_thunk_rax $ perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ perf report | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 74 of event 'cycles' # Event count (approx.): 15694834 # # Overhead Command Shared Object Symbol # ........ ............... .......................... ...................................... # 19.62% perf [kernel.vmlinux] [k] strnlen_user 13.88% swapper [kernel.vmlinux] [k] intel_idle 13.83% ksoftirqd/0 [kernel.vmlinux] [k] pfifo_fast_dequeue 13.51% swapper [kernel.vmlinux] [k] kmem_cache_free 6.31% gnome-shell [kernel.vmlinux] [k] kmem_cache_free 5.66% kworker/u8:3+ix [kernel.vmlinux] [k] delay_tsc 4.42% perf [kernel.vmlinux] [k] __set_cpus_allowed_ptr 3.45% kworker/2:1-eve [kernel.vmlinux] [k] shmem_truncate_range 2.29% gnome-shell libgobject-2.0.so.0.6000.7 [.] g_closure_ref $ Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3c29d4483e |
perf annotate: Add basic support for bpf_image
Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF images that carry trampoline or dispatcher. Upcoming patches will add support to read the image data, store it within the BPF feature in perf.data and display it for annotation purposes. Currently we only display following message: # ./perf annotate bpf_trampoline_24456 --stdio Percent | Source code & Disassembly of . for cycles (504 ... --------------------------------------------------------------- ... : to be implemented Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7eddf7e74e |
perf machine: Set ksymbol dso as loaded on arrival
There's no special load action for ksymbol data on map__load/dso__load action, where the kernel is getting loaded. It only gets confused with kernel kallsyms/vmlinux load for bpf object, which fails and could mess up with the map. Disabling any further load of the map for ksymbol related dso/map. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
943930e472 |
perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event
Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol events from /proc/kallsyms. Having this perf can recognize samples from those images and perf report and top shows them correctly. The rest of the ksymbol handling is already in place from for the bpf programs monitoring, so only the initial state was needed. perf report output: # Overhead Command Shared Object Symbol 12.37% test_progs [kernel.vmlinux] [k] entry_SYSCALL_64 11.80% test_progs [kernel.vmlinux] [k] syscall_return_via_sysret 9.63% test_progs bpf_prog_bcf7977d3b93787c_prog2 [k] bpf_prog_bcf7977d3b93787c_prog2 6.90% test_progs bpf_trampoline_24456 [k] bpf_trampoline_24456 6.36% test_progs [kernel.vmlinux] [k] memcpy_erms Committer notes: Use scnprintf() instead of strncpy() to overcome this on fedora:32, rawhide and OpenMandriva Cooker: CC /tmp/build/perf/util/bpf-event.o In file included from /usr/include/string.h:495, from /git/linux/tools/lib/bpf/libbpf_common.h:12, from /git/linux/tools/lib/bpf/bpf.h:31, from util/bpf-event.c:4: In function 'strncpy', inlined from 'process_bpf_image' at util/bpf-event.c:323:2, inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9: /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cfbd41b786 |
perf stat: Honour --timeout for forked workloads
When --timeout is used and a workload is specified to be started by
'perf stat', i.e.
$ perf stat --timeout 1000 sleep 1h
The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in
the above example, should be terminated after 1000ms, but it wasn't,
'perf stat' was waiting for it to finish.
Fix it by sending a SIGTERM when the timeout expires.
Now it works:
# perf stat -e cycles --timeout 1234 sleep 1h
sleep: Terminated
Performance counter stats for 'sleep 1h':
1,066,692 cycles
1.234314838 seconds time elapsed
0.000750000 seconds user
0.000000000 seconds sys
#
Fixes:
|
|
![]() |
e3698b23ec |
tools headers: Synchronize linux/bits.h with the kernel sources
To pick up the changes in these csets: |
|
![]() |
d8ed4d7aeb |
tools headers: Update x86's syscall_64.tbl with the kernel sources
To pick the changes from: |
|
![]() |
f60b3878f4 |
tools headers UAPI: Sync linux/mman.h with the kernel
To get the changes in:
|
|
![]() |
027fa8fb63 |
tools headers UAPI: Sync sched.h with the kernel
To get the changes in:
|
|
![]() |
ca64d84e93 |
tools headers: Update linux/vdso.h and grab a copy of vdso/const.h
To get in line with:
|
|
![]() |
8358f698ec |
perf stat: Fix no metric header if --per-socket and --metric-only set
We received a report that was no metric header displayed if --per-socket and --metric-only were both set. It's hard for script to parse the perf-stat output. This patch fixes this issue. Before: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': S0 8 2.6 2.215270071 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus 1.000411692 S0 8 2.2 2.001547952 S0 8 3.4 3.002446511 S0 8 3.4 4.003346157 S0 8 4.0 5.004245736 S0 8 0.3 After: root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket ^C Performance counter stats for 'system wide': CPI S0 8 2.1 1.813579830 seconds time elapsed root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000 # time socket cpus CPI 1.000415122 S0 8 3.2 2.001630051 S0 8 2.9 3.002612278 S0 8 4.3 4.003523594 S0 8 3.0 5.004504256 S0 8 3.7 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9a00df311b |
perf python: Check if clang supports -fno-semantic-interposition
The set of C compiler options used by distros to build python bindings may include options that are unknown to clang, we check for a variety of such options, add -fno-semantic-interposition to that mix: This fixes the build on, among others, Manjaro Linux: GEN /tmp/build/perf/python/perf.so clang-9: error: unknown argument: '-fno-semantic-interposition' error: command 'clang' failed with exit status 1 make: Leaving directory '/git/perf/tools/perf' [perfbuilder@602aed1c266d ~]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc Thread model: posix gcc version 9.3.0 (Arch Linux 9.3.0-1) [perfbuilder@602aed1c266d ~]$ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c48b07226b |
perf updates all over the place:
core: - Support for cgroup tracking in samples to allow cgroup based analysis tools: - Support for cgroup analysis - Commandline option and hotkey for perf top to change the sort order - A set of fixes all over the place - Various build system related improvements - Updates of the X86 pmu event JSON data - Documentation updates -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR 4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/ giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL 4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7 SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe +lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj eKaaErWDtWAxpA== =2UUD -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Thomas Gleixner: "Perf updates all over the place: core: - Support for cgroup tracking in samples to allow cgroup based analysis tools: - Support for cgroup analysis - Commandline option and hotkey for perf top to change the sort order - A set of fixes all over the place - Various build system related improvements - Updates of the X86 pmu event JSON data - Documentation updates" * tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf python: Fix clang detection to strip out options passed in $CC perf tools: Support Python 3.8+ in Makefile perf script: Fix invalid read of directory entry after closedir() perf script report: Fix SEGFAULT when using DWARF mode perf script: add -S/--symbols documentation perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric perf events parser: Add missing Intel CPU events to parser perf script: Allow --symbol to accept hexadecimal addresses perf report/top TUI: Fix title line formatting perf top: Support hotkey to change sort order perf top: Support --group-sort-idx to change the sort order perf symbols: Fix arm64 gap between kernel start and module end perf build-test: Honour JOBS to override detection of number of cores perf script: Add --show-cgroup-events option perf top: Add --all-cgroups option perf record: Add --all-cgroups option perf record: Support synthesizing cgroup events perf report: Add 'cgroup' sort key perf cgroup: Maintain cgroup hierarchy perf tools: Basic support for CGROUP event ... |
|
![]() |
ff2ae607c6 |
SPDX patches for 5.7-rc1.
Here are 3 SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All 3 of these have been in linux-next for a while, with no reported issues other than the merge conflict. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm m5AuCzO3Azt9KBi7NL+L =2Lm5 -----END PGP SIGNATURE----- Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments |
|
![]() |
9ff76cea4e |
perf python: Fix clang detection to strip out options passed in $CC
The clang check in the python setup.py file expected $CC to be just the
name of the compiler, not the compiler + options, i.e. all options were
expected to be passed in $CFLAGS, this ends up making it fail in systems
where CC is set to, e.g.:
"aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot"
Like this:
$ python3
>>> from subprocess import Popen
>>> a = Popen(["aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot", "-v"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot': 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot'
>>>
Make it more robust, covering this case, by passing cc.split()[0] as the
first arg to popen().
Fixes:
|
|
![]() |
b9c9ce4e59 |
perf tools: Support Python 3.8+ in Makefile
Python 3.8 changed the output of 'python-config --ldflags' to no longer include the '-lpythonX.Y' flag (this apparently fixed an issue loading modules with a statically linked Python executable). The libpython feature check in linux/build/feature fails if the Python library is not included in FEATURE_CHECK_LDFLAGS-libpython variable. This adds a check in the Makefile to determine if PYTHON_CONFIG accepts the '--embed' flag and passes that flag alongside '--ldflags' if so. tools/perf is the only place the libpython feature check is used. Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com> Tested-by: He Zhe <zhe.he@windriver.com> Link: http://lore.kernel.org/lkml/c56be2e1-8111-9dfe-8298-f7d0f9ab7431@windriver.com Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org Cc: stable@kernel.org Link: http://lore.kernel.org/lkml/20200131181123.tmamivhq4b7uqasr@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
27486a85cb |
perf script: Fix invalid read of directory entry after closedir()
closedir(lang_dir) frees the memory of script_dirent->d_name, which gets accessed in the next line in a call to scnprintf(). Valgrind report: Invalid read of size 1 ==413557== at 0x483CBE6: strlen (vg_replace_strmem.c:461) ==413557== by 0x4DD45FD: __vfprintf_internal (vfprintf-internal.c:1688) ==413557== by 0x4DE6679: __vsnprintf_internal (vsnprintf.c:114) ==413557== by 0x53A037: vsnprintf (stdio2.h:80) ==413557== by 0x53A037: scnprintf (vsprintf.c:21) ==413557== by 0x435202: get_script_path (builtin-script.c:3223) ==413557== Address 0x52e7313 is 1,139 bytes inside a block of size 32,816 free'd ==413557== at 0x483AA0C: free (vg_replace_malloc.c:540) ==413557== by 0x4E303C0: closedir (closedir.c:50) ==413557== by 0x4351DC: get_script_path (builtin-script.c:3222) Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200402124337.419456-1-agerstmayr@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1a4025f060 |
perf script report: Fix SEGFAULT when using DWARF mode
When running perf script report with a Python script and a callgraph in DWARF mode, intr_regs->regs can be 0 and therefore crashing the regs_map function. Added a check for this condition (same check as in builtin-script.c:595). Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200402125417.422232-1-agerstmayr@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
628d736d91 |
perf script: add -S/--symbols documentation
Capture both that this option exists and that symbols can be hexadecimal addresses. Signed-off-by: Ian Rogers <irogers@google.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200402174130.140319-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8ed1faf015 |
perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
The kernel utilization metric does multiplexing currently and is somewhat unreliable. The problem is that it uses two instances of the fixed counter, and the kernel has to multipleplex which causes errors. So should use CPU_CLK_UNHALTED.THREAD instead. Before: # perf stat -M Kernel_Utilization -- sleep 1 Performance counter stats for 'sleep 1': 1,419,425 cpu_clk_unhalted.ref_tsc:k <not counted> cpu_clk_unhalted.ref_tsc (0.00%) After: # perf stat -M Kernel_Utilization -- sleep 1 Performance counter stats for 'sleep 1': 746,688 cpu_clk_unhalted.thread:k # 0.7 Kernel_Utilization 1,088,348 cpu_clk_unhalted.thread Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200309013125.7559-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
47327f5667 |
perf events parser: Add missing Intel CPU events to parser
perf list expects CPU events to be parseable by name, e.g. # perf list | grep el-capacity-read el-capacity-read OR cpu/el-capacity-read/ [Kernel PMU event] But the event parser does not recognize them that way, e.g. # perf test -v "Parse event" <SNIP> running test 54 'cycles//u' running test 55 'cycles:k' running test 0 'cpu/config=10,config1,config2=3,period=1000/u' running test 1 'cpu/config=1,name=krava/u,cpu/config=2/u' running test 2 'cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/' running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp' -> cpu/event=0,umask=0x11/ -> cpu/event=0,umask=0x13/ -> cpu/event=0x54,umask=0x1/ failed to parse event 'el-capacity-read:u,cpu/event=el-capacity-read/u', err 1, str 'parser error' event syntax error: 'el-capacity-read:u,cpu/event=el-capacity-read/u' \___ parser error test child finished with 1 ---- end ---- Parse event definition strings: FAILED! This happens because the parser splits names by '-' in order to deal with cache events. For example 'L1-dcache' is a token in parse-events.l which is matched to 'L1-dcache-load-miss' by the following rule: PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config And so there is special handling for 2-part PMU names i.e. PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc but no handling for 3-part names, which are instead added as tokens e.g. topdown-[a-z-]+ While it would be possible to add a rule for 3-part names, that would not work if the first parts were also a valid PMU name e.g. 'el-capacity-read' would be matched to 'el-capacity' before the parser reached the 3rd part. The parser would need significant change to rationalize all this, so instead fix for now by adding missing Intel CPU events with 3-part names to the event parser as tokens. Missing events were found by using: grep -r EVENT_ATTR_STR arch/x86/events/intel/core.c Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/90c7ae07-c568-b6d3-f9c4-d0c1528a0610@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d2bedb7863 |
perf script: Allow --symbol to accept hexadecimal addresses
This patch extends the perf script --symbols option to filter on hexadecimal addresses in addition to symbol names. This makes it easier to handle cases where symbols are aliased. With this patch, it is possible to mix and match symbols and hexadecimal addresses using the --symbols option. $ perf script --symbols=noploop,0x4007a0 Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325220802.15039-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
376c3c22e2 |
perf report/top TUI: Fix title line formatting
In |
|
![]() |
2605af0f32 |
perf top: Support hotkey to change sort order
It would be nice if we can use a hotkey in perf top browser to select a event for sorting. For example: perf top --group -e cycles,instructions,cache-misses Samples Overhead Shared Object Symbol 40.03% 45.71% 0.03% div [.] main 20.46% 14.67% 0.21% libc-2.27.so [.] __random_r 20.01% 19.54% 0.02% libc-2.27.so [.] __random 9.68% 10.68% 0.00% div [.] compute_flag 4.32% 4.70% 0.00% libc-2.27.so [.] rand 3.84% 3.43% 0.00% div [.] rand@plt 0.05% 0.05% 2.33% libc-2.27.so [.] __strcmp_sse2_unaligned 0.04% 0.08% 2.43% perf [.] perf_hpp__is_dynamic_en 0.04% 0.02% 6.64% perf [.] rb_next 0.04% 0.01% 3.87% perf [.] dso__find_symbol 0.04% 0.04% 1.77% perf [.] sort__dso_cmp When user press hotkey '2' (event index, starting from 0), it indicates to sort output by the third event in group (cache-misses). Samples Overhead Shared Object Symbol 4.07% 1.28% 6.68% perf [.] rb_next 3.57% 3.98% 4.11% perf [.] __hists__insert_output 3.67% 11.24% 3.60% perf [.] perf_hpp__is_dynamic_e 3.67% 3.20% 3.20% perf [.] hpp__sort_overhead 0.81% 0.06% 3.01% perf [.] dso__find_symbol 1.62% 5.47% 2.51% perf [.] hists__match 2.70% 1.86% 2.47% libc-2.27.so [.] _int_malloc 0.19% 0.00% 2.29% [kernel] [k] copy_page 0.41% 0.32% 1.98% perf [.] hists__decay_entries 1.84% 3.67% 1.68% perf [.] sort__dso_cmp 0.16% 0.00% 1.63% [kernel] [k] clear_page_erms Now the output is sorted by cache-misses. v2: --- Zero the history if hotkey is pressed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200324220711.6025-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
df7deb2cce |
perf top: Support --group-sort-idx to change the sort order
'perf report' supports the option --group-sort-idx, which sorts the output by the event at the index n in event group. For example: perf record -e cycles,instructions,cache-misses perf report --group --group-sort-idx 2 --stdio The perf-report output is sorted by cache-misses. This patch supports --group-sort-idx in perf-top. For example: perf top --group -e cycles,instructions,cache-misses --group-sort-idx 2 The perf-top output is sorted by cache-misses. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200324220711.6025-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
78886f3ed3 |
perf symbols: Fix arm64 gap between kernel start and module end
During execution of command 'perf report' in my arm64 virtual machine,
this error message is showed:
failed to process sample
__symbol__inc_addr_samples(860): ENOMEM! sym->name=__this_module,
start=0x1477100, addr=0x147dbd8, end=0x80002000, func: 0
The error is caused with path:
cmd_report
__cmd_report
perf_session__process_events
__perf_session__process_events
ordered_events__flush
__ordered_events__flush
oe->deliver (ordered_events__deliver_event)
perf_session__deliver_event
machines__deliver_event
perf_evlist__deliver_sample
tool->sample (process_sample_event)
hist_entry_iter__add
iter->add_entry_cb(hist_iter__report_callback)
hist_entry__inc_addr_samples
symbol__inc_addr_samples
__symbol__inc_addr_samples
h = annotated_source__histogram(src, evidx) (NULL)
annotated_source__histogram failed is caused with path:
...
hist_entry__inc_addr_samples
symbol__inc_addr_samples
symbol__hists
annotated_source__alloc_histograms
src->histograms = calloc(nr_hists, sizeof_sym_hist) (failed)
Calloc failed as the symbol__size(sym) is too huge. As show in error
message: start=0x1477100, end=0x80002000, size of symbol is about 2G.
This is the same problem as 'perf annotate: Fix s390 gap between kernel
end and module start (
|
|
![]() |
7b1642f2fc |
perf build-test: Honour JOBS to override detection of number of cores
When one does: $ make -C tools/perf build-test The makefile in tools/perf/tests/ will, just like the main one, detect how many cores are in the system and use it with -j. Sometimes we may need to override that, for instance, when using icecream or distcc to use multiple machines in the build process, then we need to, as with the main makefile, use: $ make JOBS=N -C tools/perf build-test Fix the tests makefile to honour that. Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200330130301.GA31702@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
160d4af97b |
perf script: Add --show-cgroup-events option
The --show-cgroup-events option is to print CGROUP events in the output like others. Committer testing: [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.039 MB perf.data (487 samples) ] [root@seventh ~]# perf script --show-cgroup-events | grep PERF_RECORD_CGROUP -B2 -A2 swapper 0 0.000000: PERF_RECORD_CGROUP cgroup: 1 / perf 12145 11200.440730: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) perf 12145 11200.440733: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) -- cgtest 12145 11200.440739: 193472 cycles: ffffffffb90f6fbc commit_creds+0x1fc (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12145 11200.440790: 2691608 cycles: 7fa2cb43019b _dl_sysdep_start+0x7cb (/usr/lib64/ld-2.29.so) cgtest 12145 11200.440962: PERF_RECORD_CGROUP cgroup: 83 /sub cgtest 12147 11200.441054: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12147 11200.441057: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) -- cgtest 12148 11200.441103: 10227 cycles: ffffffffb9a0153d end_repeat_nmi+0x48 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12148 11200.441106: 273295 cycles: ffffffffb99ecbc7 copy_page+0x7 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12147 11200.441133: PERF_RECORD_CGROUP cgroup: 88 /sub/cgrp1 cgtest 12147 11200.441143: 2788845 cycles: ffffffffb94676c2 security_genfs_sid+0x102 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12148 11200.441162: PERF_RECORD_CGROUP cgroup: 93 /sub/cgrp2 cgtest 12148 11200.441182: 2669546 cycles: 401020 _init+0x20 (/wb/cgtest) cgtest 12149 11200.441247: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) [root@seventh ~]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f382842fa0 |
perf top: Add --all-cgroups option
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that 'perf top' can identify task/cgroup association later. Committer testing: Use: # perf top --all-cgroups -s cgroup_id,cgroup,pid Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-9-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8fb4b67939 |
perf record: Add --all-cgroups option
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that perf report can identify task/cgroup association later. [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ] [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 558 of event 'cycles' # Event count (approx.): 458017341 # # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 33.15% 4/0xeffffffb /sub 9615:looper0 32.83% 4/0xf00002f5 /sub/cgrp2 9620:looper2 32.79% 4/0xf00002f4 /sub/cgrp1 9619:looper1 0.35% 4/0xf00002f5 /sub/cgrp2 9618:cgtest 0.34% 4/0xf00002f4 /sub/cgrp1 9617:cgtest 0.32% 4/0xeffffffb / 9615:looper0 0.11% 4/0xeffffffb /sub 9617:cgtest 0.10% 4/0xeffffffb /sub 9618:cgtest # # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S') # [root@seventh ~]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ab64069f1a |
perf record: Support synthesizing cgroup events
Synthesize cgroup events by iterating cgroup filesystem directories. The cgroup event only saves the portion of cgroup path after the mount point and the cgroup id (which actually is a file handle). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b629f3e9d0 |
perf report: Add 'cgroup' sort key
The cgroup sort key is to show cgroup membership of each task. Currently it shows full path in the cgroupfs (not relative to the root of cgroup namespace) since it'd be more intuitive IMHO. Otherwise root cgroup in different namespaces will all show same name - "/". The cgroup sort key should come before cgroup_id otherwise sort_dimension__add() will match it to cgroup_id as it only matches with the given substring. For example it will look like following. Note that record patch adding --all-cgroups patch will come later. $ perf record -a --namespace --all-cgroups cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ] $ perf report -s cgroup_id,cgroup,pid ... # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 93.96% 0/0x0 / 0:swapper 1.25% 3/0xeffffffb / 278:looper0 0.86% 3/0xf000015f /sub/cgrp1 280:cgtest 0.37% 3/0xf0000160 /sub/cgrp2 281:cgtest 0.34% 3/0xf0000163 /sub/cgrp3 282:cgtest 0.22% 3/0xeffffffb /sub 278:looper0 0.20% 3/0xeffffffb / 280:cgtest 0.15% 3/0xf0000163 /sub/cgrp3 285:looper3 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d1277aa36b |
perf cgroup: Maintain cgroup hierarchy
Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup id. Hist entries have cgroup id can compare it directly and later it can be used to find a group name using this tree. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ba78c1c546 |
perf tools: Basic support for CGROUP event
Implement basic functionality to support cgroup tracking. Each cgroup can be identified by inode number which can be read from userspace too. The actual cgroup processing will come in the later patch. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> [ fix perf test failure on sampling parsing ] Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
49f550ea87 |
perf tools: Add file-handle feature test
The file handle (FHANDLE) support is configurable so some systems might not have it. So add a config feature item to check it on build time so that we don't add the cgroup tracking feature based on that. Committer notes: Had to make the test use the same construct as its later use in synthetic-events.c, in the next patch in this series. i.e. make it be: struct { struct file_handle fh; uint64_t cgroup_id; } handle; To cope with: CC /tmp/build/perf/util/cloexec.o util/synthetic-events.c:428:22: error: field 'fh' with CC /tmp/build/perf/util/call-path.o variable sized type 'struct file_handle' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end] struct file_handle fh; ^ 1 error generated. Deal with this at some point, i.e. investigate if the right thing is to remove that -Wgnu-variable-sized-type-not-at-end from our CFLAGS, for now do the test the same way as it is used looks more sensible. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ split from a larger patch, removed blank line at EOF ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
460c3ed999 |
perf python: Include rwsem.c in the pythong biding
We'll need it for the cgroup patches, and its better to have it in a separate patch in case we need to later revert the cgroup patches. I.e. without this we have: [root@five ~]# perf test -v python 19: 'import perf' in python : --- start --- test child forked, pid 148447 Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write test child finished with -1 ---- end ---- 'import perf' in python: FAILED! [root@five ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200403123606.GC23243@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7cc7e93519 |
Merge branch 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Ingo Molnar: - extend the decoder maps with CET instructions - fix !vDSO corner cases * 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/tests: Add CET instructions to the new instructions test x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault selftests/x86/vdso: Fix no-vDSO segfaults |
|
![]() |
26567ed79d |
perf script: Introduce --deltatime option
For some kind of analysis a deltatime output is more human friendly and reduce the cognitive load for further analysis. The following output demonstrate the new option "deltatime": calculate the time difference in relation to the previous event. $ perf script --deltatime test 2525 [001] 0.000000: sdt_libev:ev_add: (5635e72a5ebd) test 2525 [001] 0.000091: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000051: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000685: sdt_libev:ev_add: (5635e72a5ebd) test 2525 [001] 0.000048: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000104: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.003895: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.996034: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000058: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000004: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000064: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.999934: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000056: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.999930: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 Committer testing: So go from default output to --reltime and then this new --deltatime, to contrast the various timestamp presentation modes for a random perf.data file I had laying around: [root@five ~]# perf script --reltime | head perf 442394 [000] 0.000000: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000004: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000006: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000009: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000036: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000038: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000040: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000041: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000044: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# perf script --deltatime | head perf 442394 [000] 0.000000: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000001: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000001: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000027: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000001: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000001: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000002: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# perf script | head perf 442394 [000] 7600.157861: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157864: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157866: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157867: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157870: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157897: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157900: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157901: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157903: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157906: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# Andi suggested we better implement it as a new field, i.e. -F deltatime, like: [root@five ~]# perf script -F deltatime Invalid field requested. Usage: perf script [<options>] or: perf script [<options>] record <script> [<record-options>] <command> or: perf script [<options>] report <script> [script-args] or: perf script [<options>] <script> [<record-options>] <command> or: perf script [<options>] <top-script> [script-args] -F, --fields <str> comma separated output fields prepend with 'type:'. +field to add and -field to remove.Valid types: hw,sw,trace,raw,synth. Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,srcline,period,iregs,uregs,brstack,brstacksym,flags,bpf-output,brstackinsn,brstackoff,callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc [root@five ~]# I.e. we have -F for maximum flexibility: [root@five ~]# perf script -F comm,pid,cpu,time | head perf 442394 [000] 7600.157861: perf 442394 [000] 7600.157864: perf 442394 [000] 7600.157866: perf 442394 [000] 7600.157867: perf 442394 [000] 7600.157870: perf 442394 [001] 7600.157897: perf 442394 [001] 7600.157900: perf 442394 [001] 7600.157901: perf 442394 [001] 7600.157903: perf 442394 [001] 7600.157906: [root@five ~]# But since we already have --reltime, having --deltatime, documented one after the other is sensible. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200204173709.489161-1-hagen@jauu.net [ Added 'perf script' man page entry for --deltatime ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
26cec7480e |
perf test x86: Add CET instructions to the new instructions test
Add to the "x86 instruction decoder - new instructions" test the following instructions: incsspd incsspq rdsspd rdsspq saveprevssp rstorssp wrssd wrssq wrussd wrussq setssbsy clrssbsy endbr32 endbr64 And the "notrack" prefix for indirect calls and jumps. For information about the instructions, refer Intel Control-flow Enforcement Technology Specification May 2019 (334525-003). Committer testing: $ perf test instr 67: x86 instruction decoder - new instructions : Ok $ Then use verbose mode and check one of those new instructions: $ perf test -v instr |& grep saveprevssp Decoded ok: f3 0f 01 ea saveprevssp Decoded ok: f3 0f 01 ea saveprevssp $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi v. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200204171425.28073-3-yu-cheng.yu@intel.com Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e4ffd066ff |
perf: Normalize gcc parameter when generating arch errno table
The $(CC) passed to arch_errno_names.sh may include a series of parameters along with gcc itself. To avoid overwriting the following parameters of arch_errno_names.sh and break the build like below, we just pick up the first word of the $(CC). find: unknown predicate `-m64/arch' x86_64-wrs-linux-gcc: warning: '-x c' after last input file has no effect x86_64-wrs-linux-gcc: error: unrecognized command line option '-m64/include/uapi/asm-generic/errno.h' x86_64-wrs-linux-gcc: fatal error: no input files Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1581618066-187262-2-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2a3d252dff |
perf parse-events: Add defensive NULL check
Terms may have a NULL config in which case a strcmp will SEGV. This can be reproduced with: perf stat -e '*/event=?,nr/' sleep 1 Add a NULL check to avoid this. This was caught by LLVM's libfuzzer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200325164022.41385-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1032f32645 |
perf/tests: Add CET instructions to the new instructions test
Add to the "x86 instruction decoder - new instructions" test the following instructions: incsspd incsspq rdsspd rdsspq saveprevssp rstorssp wrssd wrssq wrussd wrussq setssbsy clrssbsy endbr32 endbr64 And the notrack prefix for indirect calls and jumps. For information about the instructions, refer Intel Control-flow Enforcement Technology Specification May 2019 (334525-003). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200204171425.28073-3-yu-cheng.yu@intel.com |
|
![]() |
eadcaa3dfd |
perf callchain: Update docs regarding kernel/user space unwinding
The method of unwinding for kernel space is defined by the kernel config, not by the value of --call-graph. Improve the documentation to reflect this. Signed-off-by: Tony Jones <tonyj@suse.de> Link: http://lore.kernel.org/lkml/20200325164053.10177-1-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d198b34f38 |
.gitignore: add SPDX License Identifier
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
![]() |
0d33b34352 |
perf dso: Fix dso comparison
Perf gets dso details from two different sources. 1st, from builid headers in perf.data and 2nd from MMAP2 samples. Dso from buildid header does not have dso_id detail. And dso from MMAP2 samples does not have buildid information. If detail of the same dso is present at both the places, filename is common. Previously, __dsos__findnew_link_by_longname_id() used to compare only long or short names, but Commit |
|
![]() |
d74b181a02 |
perf cpumap: Fix snprintf overflow check
'snprintf' returns the number of characters which would be generated for the given input. If the returned value is *greater than* or equal to the buffer size, it means that the output has been truncated. Fix the overflow test accordingly. Fixes: |
|
![]() |
956a78356c |
perf test: Test pmu-events aliases
Add creating event aliases to the pmu-events test. So currently we verify that the generated pmu-events.c is as expected for some test events. Now test that we generate aliases as expected for those events during normal operation. For that, we cycle through each HW PMU in the system, and use the test events to create aliases, and verify those against known, expected values. For core PMUs, we should create an alias for every event in test_cpu_events[]. However, for uncore PMUs, they need to be matched by the pmu_event.pmu member, so use test_uncore_events[]; so check the match beforehand with pmu_uncore_alias_match(). A sample run is as follows for my x86 machine: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU events : --- start --- ... testing PMU uncore_arb aliases: no events to match testing PMU cstate_pkg aliases: no events to match skipping testing PMU breakpoint testing aliases PMU uncore_cbox_1: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_1 aliases: pass testing PMU power aliases: no events to match testing aliases PMU cpu: matched event bp_l1_btb_correct testing aliases PMU cpu: matched event bp_l2_btb_correct testing aliases PMU cpu: matched event segment_reg_loads.any testing aliases PMU cpu: matched event dispatch_blocked.any testing aliases PMU cpu: matched event eist_trans testing PMU cpu aliases: pass testing PMU intel_pt aliases: no events to match skipping testing PMU software skipping testing PMU intel_bts testing PMU uncore_imc aliases: no events to match testing aliases PMU uncore_cbox_0: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_0 aliases: pass testing PMU cstate_core aliases: no events to match skipping testing PMU tracepoint testing PMU msr aliases: no events to match test child finished with 0 Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-8-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5b9a50001b |
perf pmu: Make pmu_uncore_alias_match() public
The perf pmu-events test will want to use pmu_uncore_alias_match(), so make it public. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-7-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d504fae93d |
perf pmu: Add is_pmu_core()
Add a function to decide whether a PMU is a core PMU. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a6c925fd3a |
perf test: Add pmu-events test
The initial test will verify that the test tables in generated pmu-events.c match against known, expected values. For known events added in pmu-events/arch/test, we need to add an entry in test_cpu_aliases_events[] or test_uncore_events[]. A sample run is as follows for x86: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU event aliases : --- start --- test child forked, pid 5316 testing event table bp_l1_btb_correct: pass testing event table bp_l2_btb_correct: pass testing event table segment_reg_loads.any: pass testing event table dispatch_blocked.any: pass testing event table eist_trans: pass testing event table uncore_hisi_ddrc.flux_wcmd: pass testing event table unc_cbo_xsnp_response.miss_eviction: pass test child finished with 0 ---- end ---- PMU event aliases: Ok Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com [ Fixup test_cpu_events[] and test_uncore_events[] sentinels to initialize one of its members to NULL, fixing the build in older compilers ] Link: http://lore.kernel.org/lkml/1584442939-8911-5-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e45ad701e7 |
perf pmu: Refactor pmu_add_cpu_aliases()
Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller can pass the map; the pmu-events test would use this since there would be no CPUID matching to a mapfile there. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d844780887 |
perf jevents: Support test events folder
With the goal of supporting pmu-events test case, introduce support for a test events folder. These test events can be used for testing generation of pmu-event tables and alias creation for any arch. When running the pmu-events test case, these test events will be used as the platform-agnostic events, so aliases can be created per-PMU and validated against known expected values. To support the test events, add a "testcpu" entry in pmu_events_map[]. The pmu-events test will be able to lookup the events map for "testcpu", to verify the generated tables against expected values. The resultant generated pmu-events.c will now look like the following: struct pmu_event pme_ampere_emag[] = { { .name = "ldrex_spec", .event = "event=0x6c", .desc = "Exclusive operation spe...", .topic = "intrinsic", .long_desc = "Exclusive operation ...", }, ... }; struct pmu_event pme_test_cpu[] = { { .name = "uncore_hisi_ddrc.flux_wcmd", .event = "event=0x2", .desc = "DDRC write commands. Unit: hisi_sccl,ddrc ", .topic = "uncore", .long_desc = "DDRC write commands", .pmu = "hisi_sccl,ddrc", }, { .name = "unc_cbo_xsnp_response.miss_eviction", .event = "umask=0x81,event=0x22", .desc = "Unit: uncore_cbox A cross-core snoop resulted ...", .topic = "uncore", .long_desc = "A cross-core snoop resulted from L3 ...", .pmu = "uncore_cbox", }, { .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a", .desc = "Number of Enhanced Intel SpeedStep(R) ...", .topic = "other", }, { .name = 0, }, }; struct pmu_events_map pmu_events_map[] = { ... { .cpuid = "0x00000000500f0000", .version = "v1", .type = "core", .table = pme_ampere_emag }, ... { .cpuid = "testcpu", .version = "v1", .type = "core", .table = pme_test_cpu, }, { .cpuid = 0, .version = 0, .type = 0, .table = 0, }, }; Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c52db67a74 |
perf jevents: Add some test events
Add some test PMU events. The events are randomly chosen from x86 and arm64 JSONs. The events include CPU and uncore events. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7cd053d4cf |
perf tools: Unify a bit the build directory output
Removing the extra 'SUBDIR' line from clean and doc build output. Because it's annoying.. ;-) Before: $ make clean ... SUBDIR Documentation CLEAN Documentation After: $ make clean ... CLEAN Documentation Before: $ make doc BUILD: Doing 'make -j8' parallel build SUBDIR Documentation ASCIIDOC perf-stat.html ... After: $ make doc BUILD: Doing 'make -j8' parallel build ASCIIDOC perf-stat.html ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318204522.1200981-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b5b8a7cf14 |
perf vendor events amd: Update Zen1 events to V2
This patch updates the PMCs for AMD Zen1 core based processors (Family 17h; Models 0 through 2F) to be in accordance with PMCs as documented in the latest versions of the AMD Processor Programming Reference [1], [2] and [3]. Note that some events, such as FPU pipe assignment are missing in [1], and therefore [3] is included for full coverage of events. PMCs added: fpu_pipe_assignment.dual{0|1|2|3} fpu_pipe_assignment.total{0|1|2|3} ls_mab_alloc.dc_prefetcher ls_mab_alloc.stores ls_mab_alloc.loads bp_dyn_ind_pred bp_de_redirect PMC removed: ex_ret_cond_misp Cumulative counts, fpu_pipe_assignment.total and fpu_pipe_assignment.dual, existed in v1, but did expose port-level counters. ex_ret_cond_misp has been removed as it has been removed from the latest versions of the PPR, and when tested, always seems to sample zero as tested on a Ryzen 3400G system. [1]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019. [2]: Processor Programming Reference (PPR) for AMD Family 17h Model 18h, Revision B1 Processors, 55570-B1 Rev 3.14 - Sep 26, 2019. [3]: OSRR for AMD Family 17h processors, Models 00h-2Fh, 56255 Rev 3.03 - July, 2018 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: vijay thakkar <vijaythakkar@me.com> Link: http://lore.kernel.org/lkml/20200318190002.307290-4-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2079f7aa0a |
perf vendor events amd: Add Zen2 events
This patch adds PMU events for AMD Zen2 core based processors, namely, Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as documented in the AMD Processor Programming Reference for Matisse [1]. The model number regex has been set to detect all the models under family 17 that do not match those of Zen1, as the range is larger for zen2. Zen2 adds some additional counters that are not present in Zen1 and events for them have been added in this patch. Some counters have also been removed for Zen2 thatwere previously present in Zen1 and have been confirmed to always sample zero on zen2. These added/removed counters have been omitted for brevity but can be found here: https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3 Note that PPR for Zen2 [1] does not include some counters that were documented in the PPR for Zen1 based processors [2]. After having tested these counters, some of them that still work for zen2 systems have been preserved in the events for zen2. The counters that are omitted in [1] but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X system) are the following: PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3} PMC 0x004 fp_num_mov_elim_scal_op.* PMC 0x046 ls_tablewalker.* PMC 0x062 l2_latency.l2_cycles_waiting_on_fills PMC 0x063 l2_wcb_req.* PMC 0x06D l2_fill_pending.l2_fill_busy PMC 0x080 ic_fw32 PMC 0x081 ic_fw32_miss PMC 0x086 bp_snp_re_sync PMC 0x087 ic_fetch_stall.* PMC 0x08C ic_cache_inval.* PMC 0x099 bp_tlb_rel PMC 0x0C7 ex_ret_brn_resync PMC 0x28A ic_oc_mode_switch.* L3PMC 0x001 l3_request_g1.* L3PMC 0x006 l3_comb_clstr_state.* [1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h, Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019 [2]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Here are the results of running "fpu_pipe_assignment.total" events on my Ryzen 3900X family 17h model 71h system: Before this patch: $> perf list *fpu_pipe_assignment* List of pre-defined events (to be used in -e): After: $> perf list *fpu_pipe_assignment* floating point: fpu_pipe_assignment.total [Total number of fp uOps] fpu_pipe_assignment.total0 [Total number uOps assigned to pipe 0] fpu_pipe_assignment.total1 [Total number uOps assigned to pipe 1] fpu_pipe_assignment.total2 [Total number uOps assigned to pipe 2] fpu_pipe_assignment.total3 [Total number uOps assigned to pipe 3] Metric Groups: $> perf stat -e fpu_pipe_assignment.total sleep 1 Performance counter stats for 'sleep 1': 25,883 fpu_pipe_assignment.total 1.004145868 seconds time elapsed 0.001805000 seconds user 0.000000000 seconds sys Usage tests while running Linpackin the background: $> perf stat -I1000 -e fpu_pipe_assignment.total 1.000266796 79,313,191,516 fpu_pipe_assignment.total 2.000809630 68,091,474,430 fpu_pipe_assignment.total 3.001028115 52,925,023,174 fpu_pipe_assignment.total $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1 [ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ] $> perf report --stdio --no-header | head -30 98.33% xhpl xhpl [.] dgemm_kernel 0.28% xhpl xhpl [.] dtrsm_kernel_LT 0.10% xhpl [kernel.kallsyms] [k] entry_SYSCALL_64 0.08% xhpl xhpl [.] idamax_k 0.07% baloo_file_extr liblmdb.so [.] mdb_mid2l_insert 0.06% xhpl xhpl [.] dgemm_itcopy 0.06% xhpl xhpl [.] dgemm_oncopy 0.06% xhpl [kernel.kallsyms] [k] __schedule 0.06% xhpl [kernel.kallsyms] [k] syscall_trace_enter 0.06% xhpl [kernel.kallsyms] [k] native_sched_clock 0.06% xhpl [kernel.kallsyms] [k] pick_next_task_fair 0.05% xhpl xhpl [.] blas_thread_server.llvm.15009391670273914865 0.04% xhpl [kernel.kallsyms] [k] do_syscall_64 0.04% xhpl [kernel.kallsyms] [k] yield_task_fair 0.04% xhpl libpthread-2.31.so [.] __pthread_mutex_unlock_usercnt 0.03% xhpl [kernel.kallsyms] [k] cpuacct_charge 0.03% xhpl [kernel.kallsyms] [k] syscall_return_via_sysret 0.03% xhpl libc-2.31.so [.] __sched_yield 0.03% xhpl [kernel.kallsyms] [k] __calc_delta $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2 sub $0x60,%rsp mov %rbx,(%rsp) 0.00 mov %rbp,0x8(%rsp) mov %r12,0x10(%rsp) 0.00 mov %r13,0x18(%rsp) mov %r14,0x20(%rsp) mov %r15,0x28(%rsp) -- mov %rdi,%r13 mov %rsi,0x28(%rsp) 0.00 mov %rdx,%r12 vmovsd %xmm0,0x30(%rsp) shl $0x3,%r10 mov 0x28(%rsp),%rax 0.00 xor %rdx,%rdx mov $0x18,%rdi div %rdi -- nop a0: mov %r12,%rax 0.00 shl $0x3,%rax mov %r8,%rdi lea (%r8,%rax,8),%r15 -- mov %r12,%rax nop 0.00 c0: vmovups (%rdi),%ymm1 0.09 vmovups 0x20(%rdi),%ymm2 0.02 vmovups (%r15),%ymm3 0.10 vmovups %ymm1,(%rsi) 0.07 vmovups %ymm2,0x20(%rsi) 0.07 vmovups %ymm3,0x40(%rsi) 0.06 add $0x40,%rdi add $0x40,%r15 add $0x60,%rsi 0.00 dec %rax ↑ jne c0 mov %r9,%r15 -- nop 110: lea 0x80(%rsp),%rsi 0.01 add $0x60,%rsi 0.03 mov %r12,%rax 0.00 sar $0x3,%rax cmp $0x2,%rax ↓ jl d26 prefetcht0 0x200(%rdi) 0.01 vmovups -0x60(%rsi),%ymm1 0.02 prefetcht0 0xa0(%rsi) 0.00 vbroadcastsd -0x80(%rdi),%ymm0 0.00 prefetcht0 0xe0(%rsi) 0.03 vmovups -0x40(%rsi),%ymm2 0.00 prefetcht0 0x120(%rsi) vmovups -0x20(%rsi),%ymm3 vmulpd %ymm0,%ymm1,%ymm4 0.01 prefetcht0 0x160(%rsi) vmulpd %ymm0,%ymm2,%ymm8 0.01 vmulpd %ymm0,%ymm3,%ymm12 0.02 prefetcht0 0x1a0(%rsi) 0.01 vbroadcastsd -0x78(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm5 0.01 vmulpd %ymm0,%ymm2,%ymm9 vmulpd %ymm0,%ymm3,%ymm13 0.01 vbroadcastsd -0x70(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm6 0.00 vmulpd %ymm0,%ymm2,%ymm10 0.00 add $0x60,%rsi ... snip ... nop 65e0: vmovddup -0x60(%rsi),%xmm2 0.00 vmovups -0x80(%rdi),%xmm0 vmovups -0x70(%rdi),%xmm1 0.00 vmovddup -0x58(%rsi),%xmm3 vfmadd231pd %xmm0,%xmm2,%xmm4 0.00 vfmadd231pd %xmm1,%xmm2,%xmm5 0.00 vfmadd231pd %xmm0,%xmm3,%xmm6 0.00 vfmadd231pd %xmm1,%xmm3,%xmm7 0.00 add $0x10,%rsi add $0x20,%rdi 0.00 dec %rax ↑ jne 65e0 nop nop 6620: vmovddup 0x30(%rsp),%xmm0 0.00 vmulpd %xmm0,%xmm4,%xmm4 0.00 vmulpd %xmm0,%xmm5,%xmm5 vmulpd %xmm0,%xmm6,%xmm6 vmulpd %xmm0,%xmm7,%xmm7 vaddpd (%r15),%xmm4,%xmm4 vaddpd 0x10(%r15),%xmm5,%xmm5 0.00 vaddpd (%r15,%r10,1),%xmm6,%xmm6 0.00 vaddpd 0x10(%r15,%r10,1),%xmm7,%xmm7 0.00 vmovups %xmm4,(%r15) vmovups %xmm5,0x10(%r15) 0.00 vmovups %xmm6,(%r15,%r10,1) vmovups %xmm7,0x10(%r15,%r10,1) add $0x20,%r15 -- lea (%r8,%rax,8),%r8 69d8: mov 0x20(%rsp),%r14 0.00 test $0x1,%r14 ↓ je 6d84 mov %r9,%r15 -- vbroadcastsd -0x28(%rsi),%ymm3 vfmadd231pd (%rdi),%ymm0,%ymm4 0.00 vfmadd231pd 0x20(%rdi),%ymm1,%ymm5 vfmadd231pd 0x40(%rdi),%ymm2,%ymm6 vfmadd231pd 0x60(%rdi),%ymm3,%ymm7 -- vmulpd %ymm0,%ymm4,%ymm4 vaddpd (%r15),%ymm4,%ymm4 0.00 vmovups %ymm4,(%r15) add $0x20,%r15 dec %r11 -- mov %rbx,%rsp mov (%rsp),%rbx 0.01 mov 0x8(%rsp),%rbp mov 0x10(%rsp),%r12 mov 0x18(%rsp),%r13 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c5f18e9e94 |
perf vendor events amd: Restrict model detection for zen1 based processors
This patch changes the previous blanket detection of AMD Family 17h processors to be more specific to Zen1 core based products only by replacing model detection regex pattern [[:xdigit:]]+ with ([12][0-9A-F]|[0-9A-F]), restricting to models 0 though 2f only. This change is required to allow for the addition of separate PMU events for Zen2 core based models in the following patches as those belong to family 17h but have different PMCs. Current PMU events directory has also been renamed to "amdzen1" from "amdfam17h" to reflect this specificity. Note that although this change does not break PMU counters for existing zen1 based systems, it does disable the current set of counters for zen2 based systems. Counters for zen2 have been added in the following patches in this patchset. Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-2-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
58fc90fda0 |
perf metricgroup: Fix printing event names of metric group with multiple events incase of overlapping events
Commit
|
|
![]() |
d13e9e413e |
perf stat: Align the output for interval aggregation mode
There is a slight misalignment in -A -I output. For example: # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000440863 CPU0 1,068,388 cpu/event=cpu-cycles/ 1.000440863 CPU1 875,954 cpu/event=cpu-cycles/ 1.000440863 CPU2 3,072,538 cpu/event=cpu-cycles/ 1.000440863 CPU3 4,026,870 cpu/event=cpu-cycles/ 1.000440863 CPU4 5,919,630 cpu/event=cpu-cycles/ 1.000440863 CPU5 2,714,260 cpu/event=cpu-cycles/ 1.000440863 CPU6 2,219,240 cpu/event=cpu-cycles/ 1.000440863 CPU7 1,299,232 cpu/event=cpu-cycles/ The value of counts is not aligned with the column "counts" and the event name is not aligned with the column "events". With this patch, the output is, # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000423009 CPU0 997,421 cpu/event=cpu-cycles/ 1.000423009 CPU1 1,422,042 cpu/event=cpu-cycles/ 1.000423009 CPU2 484,651 cpu/event=cpu-cycles/ 1.000423009 CPU3 525,791 cpu/event=cpu-cycles/ 1.000423009 CPU4 1,370,100 cpu/event=cpu-cycles/ 1.000423009 CPU5 442,072 cpu/event=cpu-cycles/ 1.000423009 CPU6 205,643 cpu/event=cpu-cycles/ 1.000423009 CPU7 1,302,250 cpu/event=cpu-cycles/ Now output is aligned. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dbddf17474 |
perf report/top TUI: Support hotkeys to let user select any event for sorting
When performing "perf report --group", it shows the event group information together. In previous patch, we have supported a new option "--group-sort-idx" to sort the output by the event at the index n in event group. It would be nice if we can use a hotkey in browser to select a event to sort. For example, # perf report --group Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] check_preempt_curr When user press hotkey '3' (event index, starting from 0), it indicates to sort output by the forth event in group. Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se v6: --- Jiri provided a good improvement to eliminate unneeded refresh. This improvement is added to v6. v2: --- 1. Report warning at helpline when index is invalid. 2. Report warning at helpline when it's not group event. 3. Use "case '0' ... '9'" to refine the code 4. Split K_RELOAD implementation to another patch. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5e3b810aac |
perf report: Support a new key to reload the browser
Sometimes we may need to reload the browser to update the output since some options are changed. This patch creates a new key K_RELOAD. Once the __cmd_report() returns K_RELOAD, it would repeat the whole process, such as, read samples from data file, sort the data and display in the browser. v5: --- 1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h. 2. Skip setup_sorting() in repeat path if last key is K_RELOAD. v4: --- Need to quit in perf_evsel_menu__run if key is K_RELOAD. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
429a5f9d89 |
perf report: Allow specifying event to be used as sort key in --group output
When performing "perf report --group", it shows the event group information together. By default, the output is sorted by the first event in group. It would be nice for user to select any event for sorting. This patch introduces a new option "--group-sort-idx" to sort the output by the event at the index n in event group. For example, Before: # perf report --group --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt ... After: # perf report --group --stdio --group-sort-idx 3 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] scheduler_tick Now the output is sorted by the fourth event in group. v7: --- Rebase to latest perf/core, no other change. v4: --- 1. Update Documentation/perf-report.txt to mention '--group-sort-idx' support multiple groups with different amount of events and it should be used on grouped events. 2. Update __hpp__group_sort_idx(), just return when the idx is out of limit. 3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups. So now we don't need to use together with --group. v3: --- Refine the code in __hpp__group_sort_idx(). Before: for (i = 1; i < nr_members; i++) { if (i == idx) { ret = field_cmp(fields_a[i], fields_b[i]); if (ret) goto out; } } After: if (idx >= 1 && idx < nr_members) { ret = field_cmp(fields_a[idx], fields_b[idx]); if (ret) goto out; } Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ec0479a63b |
perf report/top TUI: Support hotkey 'a' for annotation of unresolved addresses
In previous patch, we have supported the annotation functionality even without symbols. For this patch, it supports the hotkey 'a' on address in report view. Note that, for branch mode, we only support the annotation for "branch to" address. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7b0a0dcb64 |
perf report: Support interactive annotation of code without symbols
For perf report on stripped binaries it is currently impossible to do annotation. The annotation state is all tied to symbols, but there are either no symbols, or symbols are not covering all the code. We should support the annotation functionality even without symbols. This patch fakes a symbol and the symbol name is the string of address. After that, we just follow current annotation working flow. For example, 1. perf report Overhead Command Shared Object Symbol 20.67% div libc-2.27.so [.] __random_r 17.29% div libc-2.27.so [.] __random 10.59% div div [.] 0x0000000000000628 9.25% div div [.] 0x0000000000000612 6.11% div div [.] 0x0000000000000645 2. Select the line of "10.59% div div [.] 0x0000000000000628" and ENTER. Annotate 0x0000000000000628 Zoom into div thread Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel) Browse map details Run scripts for samples of symbol [0x0000000000000628] Run scripts for all samples Switch to another data file in PWD Exit 3. Select the "Annotate 0x0000000000000628" and ENTER. Percent│ │ │ │ Disassembly of section .text: │ │ 0000000000000628 <.text+0x68>: │ divsd %xmm4,%xmm0 │ divsd %xmm3,%xmm1 │ movsd (%rsp),%xmm2 │ addsd %xmm1,%xmm0 │ addsd %xmm2,%xmm0 │ movsd %xmm0,(%rsp) Now we can see the dump of object starting from 0x628. v5: --- Remove the hotkey 'a' implementation from this patch. It will be moved to a separate patch. v4: --- 1. Support the hotkey 'a'. When we press 'a' on address, now it supports the annotation. 2. Change the patch title from "Support interactive annotation of code without symbols" to "perf report: Support interactive annotation of code without symbols" v3: --- Keep just the ANNOTATION_DUMMY_LEN, and remove the opts->annotate_dummy_len since it's the "maybe in future we will provide" feature. v2: --- Fix a crash issue when annotating an address in "unknown" object. The steps to reproduce this issue: perf record -e cycles:u ls perf report 75.29% ls ld-2.27.so [.] do_lookup_x 23.64% ls ld-2.27.so [.] __GI___tunables_init 1.04% ls [unknown] [k] 0xffffffff85c01210 0.03% ls ld-2.27.so [.] _start When annotating 0xffffffff85c01210, the crash happens. v2 adds checking for ms->map in add_annotate_opt(). If the object is "unknown", ms->map is NULL. Committer notes: Renamed new_annotate_sym() to symbol__new_unresolved(). Use PRIx64 to fix this issue in some 32-bit arches: ui/browsers/hists.c: In function 'symbol__new_unresolved': ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr); ~~~~~~^ ~~~~ %-#.*llx Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
443bc639e5 |
perf report: Print al_addr when symbol is not found
For branch mode, if the symbol is not found, it prints the address. For example, 0x0000555eee0365a0 in below output. Overhead Command Source Shared Object Source Symbol Target Symbol 17.55% div libc-2.27.so [.] __random [.] __random 6.11% div div [.] 0x0000555eee0365a0 [.] rand 6.10% div libc-2.27.so [.] rand [.] 0x0000555eee036769 5.80% div libc-2.27.so [.] __random_r [.] __random 5.72% div libc-2.27.so [.] __random [.] __random_r 5.62% div libc-2.27.so [.] __random_r [.] __random_r 5.38% div libc-2.27.so [.] __random [.] rand 4.56% div libc-2.27.so [.] __random [.] __random 4.49% div div [.] 0x0000555eee036779 [.] 0x0000555eee0365ff 4.25% div div [.] 0x0000555eee0365fa [.] 0x0000555eee036760 But it's not very easy to understand what the instructions are in the binary. So this patch uses the al_addr instead. With this patch, the output is Overhead Command Source Shared Object Source Symbol Target Symbol 17.55% div libc-2.27.so [.] __random [.] __random 6.11% div div [.] 0x00000000000005a0 [.] rand 6.10% div libc-2.27.so [.] rand [.] 0x0000000000000769 5.80% div libc-2.27.so [.] __random_r [.] __random 5.72% div libc-2.27.so [.] __random [.] __random_r 5.62% div libc-2.27.so [.] __random_r [.] __random_r 5.38% div libc-2.27.so [.] __random [.] rand 4.56% div libc-2.27.so [.] __random [.] __random 4.49% div div [.] 0x0000000000000779 [.] 0x00000000000005ff 4.25% div div [.] 0x00000000000005fa [.] 0x0000000000000760 Now we can use objdump to dump the object starting from 0x5a0. For example, objdump -d --start-address 0x5a0 div 00000000000005a0 <rand@plt>: 5a0: ff 25 2a 0a 20 00 jmpq *0x200a2a(%rip) # 200fd0 <__cxa_finalize@plt+0x200a20> 5a6: 68 02 00 00 00 pushq $0x2 5ab: e9 c0 ff ff ff jmpq 570 <srand@plt-0x10> ... Committer testing: [root@seventh ~]# perf record -a -b sleep 1 [root@seventh ~]# perf report --header-only | grep cpudesc # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [root@seventh ~]# perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY [root@seventh ~]# Before: [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles' # Event count (approx.): 2240 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ............... ........................ ...................... ...................... .................. # 0.13% systemd-journal libc-2.29.so [.] cfree@GLIBC_2.2.5 [.] _int_free 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465c82 [.] 0x00007fe406465d80 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465ded [.] 0x00007fe406465c30 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465e4e [.] 0x00007fe406465de0 1 0.09% systemd-journal systemd-journald [.] free@plt [.] cfree@GLIBC_2.2.5 1 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 18 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 2 0.04% systemd libsystemd-shared-241.so [.] bus_resolve@plt [.] bus_resolve 204 0.04% systemd libsystemd-shared-241.so [.] getpid_cached@plt [.] getpid_cached 7 [root@seventh ~]# After: [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles' # Event count (approx.): 2240 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ............... ........................ ...................... ...................... .................. # 0.13% systemd-journal libc-2.29.so [.] cfree@GLIBC_2.2.5 [.] _int_free 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7c82 [.] 0x00000000000f7d80 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7ded [.] 0x00000000000f7c30 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7e4e [.] 0x00000000000f7de0 1 0.09% systemd-journal systemd-journald [.] free@plt [.] cfree@GLIBC_2.2.5 1 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 18 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 2 0.04% systemd libsystemd-shared-241.so [.] bus_resolve@plt [.] bus_resolve 204 0.04% systemd libsystemd-shared-241.so [.] getpid_cached@plt [.] getpid_cached 7 [root@seventh ~]# Lets use -v to get full paths and then try objdump on the unresolved address: [root@seventh ~]# perf report -v --stdio --dso libsystemd-shared-241.so |& grep libsystemd-shared-241.so | tail -1 0.04% systemd-journal /usr/lib/systemd/libsystemd-shared-241.so 0x80c1a B [.] 0x0000000000080c1a 0x80a95 B [.] 0x0000000000080a95 61 [root@seventh ~]# [root@seventh ~]# objdump -d --start-address 0x00000000000f7d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20 /usr/lib/systemd/libsystemd-shared-241.so: file format elf64-x86-64 Disassembly of section .text: 00000000000f7d80 <proc_cmdline_parse_given@@SD_SHARED+0x330>: f7d80: 41 39 11 cmp %edx,(%r9) f7d83: 0f 84 ff fe ff ff je f7c88 <proc_cmdline_parse_given@@SD_SHARED+0x238> f7d89: 4c 8d 05 97 09 0c 00 lea 0xc0997(%rip),%r8 # 1b8727 <utf8_skip_data@@SD_SHARED+0x3147> f7d90: b9 49 00 00 00 mov $0x49,%ecx f7d95: 48 8d 15 c9 f5 0b 00 lea 0xbf5c9(%rip),%rdx # 1b7365 <utf8_skip_data@@SD_SHARED+0x1d85> f7d9c: 31 ff xor %edi,%edi f7d9e: 48 8d 35 9b ff 0b 00 lea 0xbff9b(%rip),%rsi # 1b7d40 <utf8_skip_data@@SD_SHARED+0x2760> f7da5: e8 a6 d6 f4 ff callq 45450 <log_assert_failed_realm@plt> f7daa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) f7db0: 41 56 push %r14 f7db2: 41 55 push %r13 f7db4: 41 54 push %r12 f7db6: 55 push %rbp [root@seventh ~]# If we tried the the reported address before this patch: [root@seventh ~]# objdump -d --start-address 0x00007fe406465d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20 /usr/lib/systemd/libsystemd-shared-241.so: file format elf64-x86-64 [root@seventh ~]# Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7eec00a747 |
perf symbols: Consolidate symbol fixup issue
After copying Arm64's perf archive with object files and perf.data file to x86 laptop, the x86's perf kernel symbol resolution fails. It outputs 'unknown' for all symbols parsing. This issue is root caused by the function elf__needs_adjust_symbols(), x86 perf tool uses one weak version, Arm64 (and powerpc) has rewritten their own version. elf__needs_adjust_symbols() decides if need to parse symbols with the relative offset address; but x86 building uses the weak function which misses to check for the elf type 'ET_DYN', so that it cannot parse symbols in Arm DSOs due to the wrong result from elf__needs_adjust_symbols(). The DSO parsing should not depend on any specific architecture perf building; e.g. x86 perf tool can parse Arm and Arm64 DSOs, vice versa. And confirmed by Naveen N. Rao that powerpc64 kernels are not being built as ET_DYN anymore and change to ET_EXEC. This patch removes the arch specific functions for Arm64 and powerpc and changes elf__needs_adjust_symbols() as a common function. In the common elf__needs_adjust_symbols(), it checks an extra condition 'ET_DYN' for elf header type. With this fixing, the Arm64 DSO can be parsed properly with x86's perf tool. Before: # perf script main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms]) After: # perf script main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms]) v3: Changed to check for ET_DYN across all architectures. v2: Fixed Arm64 and powerpc native building. Reported-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d4953f7ef1 |
perf parse-events: Fix 3 use after frees found with clang ASAN
Reproducible with a clang asan build and then running perf test in particular 'Parse event definition strings'. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d1c9f7d117 |
perf/core improvements and fixes:
perf record: Alexey Budankov: - Fix binding of AIO user space buffers to nodes maps: Dominik b. Czarnota: - Fix off by one in strncpy() size argument. Arnaldo Carvalho de Melo: - Use strstarts() to look for Android libraries. Ian Rogers: - Give synthetic mmap events an inode generation. man pages: Ian Rogers: - Set man page date to last git commit. perf test: Ian Rogers: - Print if shell directory isn't present. perf report: Jin Yao: - Fix no branch type statistics report issue. perf expr: Jiri Olsa: - Fix copy/paste mistake vendor events: Kan Liang: - Support metric constraints. vendor events intel: Kan Liang: - Add NO_NMI_WATCHDOG metric constraint. vendor events s390: Thomas Richter: - Add new deflate counters for IBM z15. ARM cs-etm: Leo Yan: - Last branch improvements. intel-pt: Adrian Hunter: - Update intel-pt.txt file with new location of the documentation. - Add Intel PT man page references. - Rename intel-pt.txt and put it in man page format. perl scripting: Michael Petlan: - Add common_callchain to fix argument order. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXnFBiwAKCRCyPKLppCJ+ J4sOAQDTh5w3GFDOKzFHLqXWOE9mlsXnS7tHdkypuRweBpuQXQEA0Sq125ludwe7 pzZ1MFqZJ85lw0mfDqBV9E1PlgQz8Q8= =1uH9 -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.7-20200317' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Alexey Budankov: - Fix binding of AIO user space buffers to nodes maps: Dominik b. Czarnota: - Fix off by one in strncpy() size argument. Arnaldo Carvalho de Melo: - Use strstarts() to look for Android libraries. Ian Rogers: - Give synthetic mmap events an inode generation. man pages: Ian Rogers: - Set man page date to last git commit. perf test: Ian Rogers: - Print if shell directory isn't present. perf report: Jin Yao: - Fix no branch type statistics report issue. perf expr: Jiri Olsa: - Fix copy/paste mistake vendor events: Kan Liang: - Support metric constraints. vendor events intel: Kan Liang: - Add NO_NMI_WATCHDOG metric constraint. vendor events s390: Thomas Richter: - Add new deflate counters for IBM z15. ARM cs-etm: Leo Yan: - Last branch improvements. intel-pt: Adrian Hunter: - Update intel-pt.txt file with new location of the documentation. - Add Intel PT man page references. - Rename intel-pt.txt and put it in man page format. perl scripting: Michael Petlan: - Add common_callchain to fix argument order. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Conflicts: tools/perf/util/map.c |
|
![]() |
409e1a3140 |
Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
![]() |
59a08b4b3b |
perf expr: Fix copy/paste mistake
Copy/paste leftover from recent refactor.
Fixes:
|
|
![]() |
c3b10649a8 |
perf report: Fix no branch type statistics report issue
Previously we could get the report of branch type statistics. For example: # perf record -j any,save_type ... # t perf report --stdio # # Branch Statistics: # COND_FWD: 40.6% COND_BWD: 4.1% CROSS_4K: 24.7% CROSS_2M: 12.3% COND: 44.7% UNCOND: 0.0% IND: 6.1% CALL: 24.5% RET: 24.7% But now for the recent perf, it can't report the branch type statistics. It's a regression issue caused by commit |
|
![]() |
3b7a15b064 |
perf tools: Give synthetic mmap events an inode generation
When mmap2 events are synthesized the ino_generation field isn't being set leading to uninitialized memory being compared. Caught with clang's -fsanitize=memory: ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6 #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9 #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15 #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12 #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9 #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9 #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20 #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17 #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9 #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10 #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8 #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #28 0x55a96a282223 in main tools/perf/perf.c:538:3 Uninitialized value was stored to memory at #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14 #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20 #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21 #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17 #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9 #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10 #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8 #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 Uninitialized value was stored to memory at #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25 #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #18 0x55a96a282223 in main tools/perf/perf.c:538:3 Uninitialized value was created by a heap allocation #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3 #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15 #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #13 0x55a96a282223 in main tools/perf/perf.c:538:3 SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b2bf666070 |
perf test: Print if shell directory isn't present
If the shell test directory isn't present the exit code will be 255 but with no error messages printed. Add an error message. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200313005602.45236-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
44d462acc0 |
perf record: Fix binding of AIO user space buffers to nodes
Correct maxnode parameter value passed to mbind() syscall to be the
amount of node mask bits to analyze plus 1. Dynamically allocate node
mask memory depending on the index of node of cpu being profiled.
Fixes:
|
|
![]() |
67439d555f |
perf scripting perl: Add common_callchain to fix argument order
Since common_callchain has been added to the argument array, we need to reflect it in perl-based scripts, because otherwise the following args would be shifted and thus incorrect. E.g. rw-by-pid and calculation of read and written bytes: Before: read counts by pid: pid comm # reads bytes_requested bytes_read ------ -------------------- ----------- ---------- ---------- 19301 dd 4 424510450039736 0 After: read counts by pid: pid comm # reads bytes_requested bytes_read ------ -------------------- ----------- ---------- ---------- 19301 dd 4 9536 4341 Committer testing: To see before after first do: # perf script record rw-by-pid ^C Now you'll have a perf.data file to report on, then do before and after using: # perf script report rw-by-pid Anbd notice the bytes_request/bytes_read, as above. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Benjamin Salon <bsalon@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> LPU-Reference: 20200311132836.12693-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ec2eab9deb |
perf intel-pt: Update intel-pt.txt file with new location of the documentation
Make it easy for people looking in intel-pt.txt to find the new file. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
870d325b15 |
perf intel-pt: Add Intel PT man page references
Add references to Intel PT man page in man pages of associated tools. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
97256d1a2a |
perf intel-pt: Rename intel-pt.txt and put it in man page format
Make the Intel PT documentation into a man page. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0c2d041232 |
perf doc: Set man page date to last git commit
Currently the man page dates reflect the date the man pages were built. This patch adjusts the date so that the date is when then man page last had a commit against it. The date is generated using 'git log'. Committer testing: $ git log -1 --pretty="format:%cd" --date=short tools/perf/Documentation/perf-top.txt 2020-01-14 Before: rm -rf /tmp/build/perf mkdir -p /tmp/build/perf make -C tools/perf O=/tmp/build/perf/ install $ date Wed 11 Mar 2020 10:21:19 AM -03 $ man perf-top | tail -1 perf 03/11/2020 PERF-TOP(1) $ After: rm -rf /tmp/build/perf mkdir -p /tmp/build/perf make -C tools/perf O=/tmp/build/perf/ install $ date $ date Wed 11 Mar 2020 10:24:06 AM -03 $ man perf-top | tail -1 perf 2020-01-14 PERF-TOP(1) $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200311052110.23132-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bc010dd657 |
perf cs-etm: Fix unsigned variable comparison to zero
The variable 'offset' in function cs_etm__sample() is u64 type, it's not appropriate to check it with 'while (offset > 0)'; this patch changes to 'while (offset)'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
695378b567 |
perf cs-etm: Optimize copying last branches
If an instruction range packet can generate multiple instruction samples, these samples share the same last branches; it's not necessary to copy the same last branches repeatedly for these samples within the same packet. This patch moves out the last branches copying from function cs_etm__synth_instruction_sample(), and execute it prior to generating instruction samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c9f5baa136 |
perf cs-etm: Correct synthesizing instruction samples
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the function cs_etm__sample() cannot handle this case properly with its logic. Let's see below flow as an example: - If we set itrace option '--itrace=i4', then function cs_etm__sample() has variables with initialized values: tidq->period_instructions = 0 etm->instructions_sample_period = 4 - When the first packet is coming: packet->instr_count = 10; the number of instructions executed in this packet is 10, thus update period_instructions as below: tidq->period_instructions = 0 + 10 = 10 instrs_over = 10 - 4 = 6 offset = 10 - 6 - 1 = 3 tidq->period_instructions = instrs_over = 6 - When the second packet is coming: packet->instr_count = 10; in the second pass, assume 10 instructions in the trace sample again: tidq->period_instructions = 6 + 10 = 16 instrs_over = 16 - 4 = 12 offset = 10 - 12 - 1 = -3 -> the negative value tidq->period_instructions = instrs_over = 12 So after handle these two packets, there have below issues: The first issue is that cs_etm__instr_addr() returns the address within the current trace sample of the instruction related to offset, so the offset is supposed to be always unsigned value. But in fact, function cs_etm__sample() might calculate a negative offset value (in handling the second packet, the offset is -3) and pass to cs_etm__instr_addr() with u64 type with a big positive integer. The second issue is it only synthesizes 2 samples for sample period = 4. In theory, every packet has 10 instructions so the two packets have total 20 instructions, 20 instructions should generate 5 samples (4 x 5 = 20). This is because cs_etm__sample() only calls once cs_etm__synth_instruction_sample() to generate instruction sample per range packet. This patch fixes the logic in function cs_etm__sample(); the basic idea for handling coming packet is: - To synthesize the first instruction sample, it combines the left instructions from the previous packet and the head of the new packet; then generate continuous samples with sample period; - At the tail of the new packet, if it has the rest instructions, these instructions will be left for the sequential sample. Suggested-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f1410028c7 |
perf cs-etm: Continuously record last branch
Every time synthesize instruction sample, the last branch recording will be reset. This is fine if the instruction period is big enough, for example if use the option '--itrace=i100000', the last branch array is reset for every sample with 100000 instructions per period; before generate the next instruction sample, there has the sufficient packets coming to fill the last branch array. On the other hand, if set a very small period, the packets will be significantly reduced between two continuous instruction samples, thus the last branch array is almost empty for new instruction sample by frequently resetting. To allow the last branches to work properly for any instruction periods, this patch avoids to reset the last branch for every instruction sample and only reset it when flush the trace data. The last branches will be reset only for two cases, one is for trace starting, another case is for discontinuous trace; other cases can keep recording last branches for continuous instruction samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d01751563c |
perf cs-etm: Swap packets for instruction samples
If use option '--itrace=iNNN' with Arm CoreSight trace data, perf tool fails inject instruction samples; the root cause is the packets are only swapped for branch samples and last branches but not for instruction samples, so the new coming packets cannot be properly handled for only synthesizing instruction samples. To fix this issue, this patch refactors the code with a new function cs_etm__packet_swap() which is used to swap packets and adds the condition for instruction samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bdadd647cb |
perf map: Use strstarts() to look for Android libraries
And add the '/' to avoid looking at things like "/system/libsomething", when all we want to know if it is like "/system/lib/something", i.e. if it is in that system library dir. Using strstarts() avoids off-by-one errors like recently fixed in this file. Since this adds the '/' I separated this patch, another patch will make this consistent by removing other strncmp(str, prefix, manually calculated prefix length) usage. Reported-by: Dominik Czarnota <dominik.b.czarnota@gmail.com> Acked-by: Dominik Czarnota <dominik.b.czarnota@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: Link: http://lore.kernel.org/lkml/CABEVAa0_q-uC0vrrqpkqRHy_9RLOSXOJxizMLm1n5faHRy2AeA@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b8fdcfb5a1 |
perf map: Fix off by one in strncpy() size argument
This patch fixes an off-by-one error in strncpy size argument in
tools/perf/util/map.c. The issue is that in:
strncmp(filename, "/system/lib/", 11)
the passed string literal: "/system/lib/" has 12 bytes (without the NULL
byte) and the passed size argument is 11. As a result, the logic won't
match the ending "/" byte and will pass filepaths that are stored in
other directories e.g. "/system/libmalicious/bin" or just
"/system/libmalicious".
This functionality seems to be present only on Android. I assume the
/system/ directory is only writable by the root user, so I don't think
this bug has much (or any) security impact.
Fixes:
|
|
![]() |
b95fcd2c1c |
perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint
Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake and Cascade Lake. Committer testing: On a Lenovo T480S, Intel(R) Core(TM) i7-8650U Kaby Lake, that looking at x86's mapfile.csv file is a: $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv GenuineIntel-6-[4589]E,v24,skylake,core $ So uses the constraint added in this patch in this file: tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json Before: # perf stat -a -M Page_Walks_Utilization sleep 2 Performance counter stats for 'system wide': <not counted> itlb_misses.walk_pending (0.00%) <not counted> dtlb_load_misses.walk_pending (0.00%) <not counted> dtlb_store_misses.walk_pending (0.00%) <not counted> ept.walk_pending (0.00%) <not counted> cycles (0.00%) 2.001750514 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog The events in group usually have to be from the same PMU. Try reorganizing the group. # After: # perf stat -a -M Page_Walks_Utilization sleep 2 Splitting metric group Page_Walks_Utilization into standalone metrics. Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog , Performance counter stats for 'system wide': 36,883,102 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization (79.99%) 123,104,146 dtlb_load_misses.walk_pending (80.02%) 13,720,795 dtlb_store_misses.walk_pending (79.99%) 0 ept.walk_pending (79.99%) 1,519,948,400 cycles (80.01%) 2.002170780 seconds time elapsed # Before and after, if we disable the nmi_watchdog we get: # echo 0 > /proc/sys/kernel/nmi_watchdog # perf stat -a -M Page_Walks_Utilization sleep 2 Performance counter stats for 'system wide': 33,721,658 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization 84,070,996 dtlb_load_misses.walk_pending 9,816,071 dtlb_store_misses.walk_pending 0 ept.walk_pending 704,920,899 cycles 2.002331670 seconds time elapsed # More information about the metric expressions: # perf stat -v -a -M Page_Walks_Utilization sleep 2 Using CPUID GenuineIntel-6-8E-A metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization found event itlb_misses.walk_pending found event dtlb_load_misses.walk_pending found event dtlb_store_misses.walk_pending found event ept.walk_pending found event cycles adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W -> cpu/umask=0x10,(null)=0x186a3,event=0x85/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/ itlb_misses.walk_pending: 8085772 16010162799 16010162799 dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799 dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799 ept.walk_pending: 2 16010162799 16010162799 cycles: 315140605 16010162799 16010162799 Performance counter stats for 'system wide': 8,085,772 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization 28,134,579 dtlb_load_misses.walk_pending 7,276,535 dtlb_store_misses.walk_pending 2 ept.walk_pending 315,140,605 cycles 2.002333181 seconds time elapsed # Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ab483d8bc8 |
perf metricgroup: Support metric constraint
Some metric groups have metric constraints. A metric group can be scheduled as a group only when some constraints are applied. For example, Page_Walks_Utilization has a metric constraint, "NO_NMI_WATCHDOG". When NMI watchdog is disabled, the metric group can be scheduled as a group. Otherwise, splitting the metric group into standalone metrics. Add a new function, metricgroup__has_constraint(), to check whether all constraints are applied. If not, splitting the metric group into standalone metrics. Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a warning for the metric group with the constraint, when NMI WATCHDOG is enabled. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2a14c1bf01 |
perf util: Factor out sysctl__nmi_watchdog_enabled()
The NMI watchdog status is required for metric group constraint examination. Factor out sysctl__nmi_watchdog_enabled() to retrieve the NMI watchdog status. Users may count more than one metric group each time. If so, the NMI watchdog status may be retrieved several times. To reduce the overhead, cache the NMI watchdog status. Replace the NMI watchdog status checking in print_footer() by sysctl__nmi_watchdog_enabled(). Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f742634ab4 |
perf metricgroup: Factor out metricgroup__add_metric_weak_group()
Factor out metricgroup__add_metric_weak_group() which add metrics into a weak group. The change can improve code readability. Because following patch will introduce a function which add standalone metrics. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
03fe02b113 |
perf jevents: Support metric constraint
A new field "MetricConstraint" is introduced in JSON event list. Extend jevents to parse the field and save the value in metric_constraint. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e7950166e4 |
perf vendor events s390: Add new deflate counters for IBM z15
Add support for new deflate counters: - Counter 247: cycles CPU spent obtaining access to Deflate unit - Counter 252: cycles CPU is using Deflate unit - Counter 264: Increments by one for every DEFLATE CONVERSION CALL instruction executed. - Counter 265: Increments by one for every DEFLATE CONVERSION CALL instruction executed that ended in Condition Codes 0, 1 or 2. Also adjust the some crypto counter description to latest documentation. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f787feff69 |
perf block-info: Support color ops to print block percents in color
It would be nice to print the block percents with colors. This patch supports the 'Sampled Cycles%' and 'Avg Cycles%' printed in colors. For example, perf record -b ... perf report --total-cycles or perf report --total-cycles --stdio percent > 5%, colored in red percent > 0.5%, colored in green percent < 0.5%, default color Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cca0cc76f5 |
perf block-info: Allow selecting which columns to report and its order
Currently we use a predefined array to set the block info output formats, it's fixed and inflexible. This patch adds two parameters "block_hpps" and "nr_hpps" in block_info__create_report and other static functions, in order to let user decide which columns to report and with specified report ordering. It should be more flexible. Buffers will be allocated to contain the new fmts, of course, we need to release them before perf exits. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a8a9f6dc0d |
perf diff: Use __block_info__cmp() to replace block_pair_cmp()
'perf diff' uses block_pair_cmp() to compare two blocks. But block_info__cmp() has the similar functionality and it's a bit more complete. This patch removes block_pair_cmp() and uses __block_info__cmp() instead. __block_info__cmp() is wrapped by block_info__cmp() and it doesn't receives a perf_hpp_fmt parameter. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3e152aa984 |
perf block-info: Fix wrong block address comparison in block_info__cmp()
Commit |
|
![]() |
d942815a76 |
perf expr: Make expr__parse() return -1 on error
To match the error value of the expr__find_other function, so all exported expr functions return the same values: 0 on success, -1 on error. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0f9b1e124b |
perf expr: Straighten expr__parse()/expr__find_other() interface
Now that we have a flex parser we don't need to update the parsed string pointer, so the interface can just be passed the pointer to the expression instead of a pointer to pointer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
58ca707636 |
perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables
We have metrics that define more than 15 variables, like Branch_Misprediction_Cost. Increasing the allowed variables count to 20. As Andy pointed out, we can't go too high in here, because some of the code has O(n^2) complexity (already_seen) and we might want to do some other changes (like using hash tables) before increasing the maximum even more. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
26226a9772 |
perf expr: Move expr lexer to flex
Adding expr flex code instead of the manual parser code. So it's easily extensible in upcoming changes. The new flex code is in flex.l object and gets compiled like all the other flexers we use. It's defined as flex reentrant parser. It's used by both expr__parse and expr__find_other interfaces by separating the starting point. There's no intended change of functionality ;-) the test expr is passing. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
576a65b697 |
perf expr: Add expr.c object
Add generic expr code into new expr.c object. The expr.c object will be mainly used in following change that will get rid of the manual flex code, Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
277ce1efa7 |
perf header: Add check for unexpected use of reserved membrs in event attr
The perf.data may be generated by a newer version of perf tool, which support new input bits in attr, e.g. new bit for branch_sample_type. The perf.data may be parsed by an older version of perf tool later. The old perf tool may parse the perf.data incorrectly. There is no warning message for this case. Current perf header never check for unknown input bits in attr. When read the event desc from header, check the stored event attr. The reserved bits, sample type, read format and branch sample type will be checked. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d3f85437ad |
perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX
A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced in latest kernel. Enable HW_INDEX by default in LBR call stack mode. If kernel doesn't support the sample type, switching it off. Add HW_INDEX in attr_fprintf as well. User can check whether the branch sample type is set via debug information or header. Committer testing: First collect some samples with LBR callchains, system wide, for a few seconds: # perf record --call-graph lbr -a sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ] # Now lets use 'perf evlist -v' to look at the branch_sample_type: # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES|HW_INDEX # So the machine has the kernel feature, and it was correctly added to perf_event_attr.branch_sample_type, for the default 'cycles' event. If we do it in another machine, where the kernel lacks the HW_INDEX feature, we get: # perf record --call-graph lbr -a sleep 2s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ] # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES # No HW_INDEX in attr.branch_sample_type. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
42bbabed09 |
perf tools: Add hw_idx in struct branch_stack
The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1efde27542 |
perf probe: Do not depend on dwfl_module_addrsym()
Do not depend on dwfl_module_addrsym() because it can fail on user-space shared libraries. Actually, same bug was fixed by commit |
|
![]() |
6b8d68f1ce |
perf probe: Fix to delete multiple probe event
When we put an event with multiple probes, perf-probe fails to delete
with filters. This comes from a failure to list up the event name
because of overwrapping its name.
To fix this issue, skip to list up the event which has same name.
Without this patch:
# perf probe -l \*
probe_perf:map__map_ip (on perf_sample__fprintf_brstackoff:21@
probe_perf:map__map_ip (on perf_sample__fprintf_brstackoff:25@
probe_perf:map__map_ip (on append_inlines:12@util/machine.c in
probe_perf:map__map_ip (on unwind_entry:19@util/machine.c in /
probe_perf:map__map_ip (on map__map_ip@util/map.h in /home/mhi
probe_perf:map__map_ip (on map__map_ip@util/map.h in /home/mhi
# perf probe -d \*
"*" does not hit any event.
Error: Failed to delete events. Reason: No such file or directory (Code: -2)
With it:
# perf probe -d \*
Removed event: probe_perf:map__map_ip
#
Fixes:
|
|
![]() |
05e54e2386 |
perf parse-events: Fix reading of invalid memory in event parsing
ADD_CONFIG_TERM accesses term->weak, however, in get_config_chgs this value is accessed outside of the list_for_each_entry and references invalid memory. Add an argument for ADD_CONFIG_TERM for weak and set it to false in the get_config_chgs case. This bug was cause by clang's address sanitizer and libfuzzer. It can be reproduced with a command line of: perf stat -a -e i/bs,tsc,L2/o Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200307073121.203816-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a7ffd416d8 |
perf python: Fix clang detection when using CC=clang-version
Currently, the setup.py script detects the clang compiler only when invoked with CC=clang. But when using a specific version (e.g. CC=clang-11), this doesn't work correctly and wrong compiler flags are set, leading to build errors. To properly detect clang, invoke the compiler with -v and check the output. The first line should start with "clang version ...". Committer testing: $ make CC=clang-9 O=/tmp/build/perf -C tools/perf install-bin <SNIP> $ readelf -wi /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so | grep DW_AT_producer | head -1 <c> DW_AT_producer : (indirect string, offset: 0x0): clang version 9.0.1 (Fedora 9.0.1-2.fc31) /usr/bin/clang-9 -Wno-unused-result -Wsign-compare -D DYNAMIC_ANNOTATIONS_ENABLED=1 -D NDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-command-line -m64 -mtune=generic -fasynchronous-unwind-tables -fcf-protection=full -D _GNU_SOURCE -fPIC -fwrapv -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Wshadow -D HAVE_ARCH_X86_64_SUPPORT -I /tmp/build/perf/arch/x86/include/generated -D HAVE_SYSCALL_TABLE_SUPPORT -D HAVE_PERF_REGS_SUPPORT -D HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET -Werror -O3 -fno-omit-frame-pointer -ggdb3 -funwind-tables -Wall -Wextra -std=gnu99 -fstack-protector-all -D _FORTIFY_SOURCE=2 -D _LARGEFILE64_SOURCE -D _FILE_OFFSET_BITS=64 -D _GNU_SOURCE -I /home/acme/git/perf/tools/lib/perf/include -I /home/acme/git/perf/tools/perf/util/include -I /home/acme/git/perf/tools/perf/arch/x86/include -I /home/acme/git/perf/tools/include/ -I /home/acme/git/perf/tools/arch/x86/include/uapi -I /home/acme/git/perf/tools/include/uapi -I /home/acme/git/perf/tools/arch/x86/include/ -I /home/acme/git/perf/tools/arch/x86/ -I /tmp/build/perf//util -I /tmp/build/perf/ -I /home/acme/git/perf/tools/perf/util -I /home/acme/git/perf/tools/perf -I /home/acme/git/perf/tools/lib/ -D HAVE_PTHREAD_ATTR_SETAFFINITY_NP -D HAVE_PTHREAD_BARRIER -D HAVE_EVENTFD -D HAVE_GET_CURRENT_DIR_NAME -D HAVE_GETTID -D HAVE_DWARF_GETLOCATIONS_SUPPORT -D HAVE_GLIBC_SUPPORT -D HAVE_AIO_SUPPORT -D HAVE_SCHED_GETCPU_SUPPORT -D HAVE_SETNS_SUPPORT -D HAVE_LIBELF_SUPPORT -D HAVE_LIBELF_MMAP_SUPPORT -D HAVE_ELF_GETPHDRNUM_SUPPORT -D HAVE_GELF_GETNOTE_SUPPORT -D HAVE_ELF_GETSHDRSTRNDX_SUPPORT -D HAVE_DWARF_SUPPORT -D HAVE_LIBBPF_SUPPORT -D HAVE_BPF_PROLOGUE -D HAVE_SDT_EVENT -D HAVE_JITDUMP -D HAVE_DWARF_UNWIND_SUPPORT -D NO_LIBUNWIND_DEBUG_FRAME -D HAVE_LIBUNWIND_SUPPORT -D HAVE_LIBCRYPTO_SUPPORT -D HAVE_SLANG_SUPPORT -D HAVE_GTK2_SUPPORT -D NO_LIBPERL -D HAVE_TIMERFD_SUPPORT -D HAVE_LIBPYTHON_SUPPORT -D HAVE_CPLUS_DEMANGLE_SUPPORT -D HAVE_LIBBFD_SUPPORT -D HAVE_ZLIB_SUPPORT -D HAVE_LZMA_SUPPORT -D HAVE_ZSTD_SUPPORT -D HAVE_LIBCAP_SUPPORT -D HAVE_BACKTRACE_SUPPORT -D HAVE_LIBNUMA_SUPPORT -D HAVE_KVM_STAT_SUPPORT -D DISASM_FOUR_ARGS_SIGNATURE -D HAVE_LIBBABELTRACE_SUPPORT -D HAVE_AUXTRACE_SUPPORT -D HAVE_JVMTI_CMLR -I /tmp/build/perf/ -fPIC -I util/include -I /usr/include/python3.7m -c /home/acme/git/perf/tools/perf/util/python.c -o /tmp/build/perf/python_ext_build/tmp/home/acme/git/perf/tools/perf/util/python.o -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Wshadow -D HAVE_ARCH_X86_64_SUPPORT -I /tmp/build/perf/arch/x86/include/generated -D HAVE_SYSCALL_TABLE_SUPPORT -D HAVE_PERF_REGS_SUPPORT -D HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET -Werror -O3 -fno-omit-frame-pointer -ggdb3 -funwind-tables -Wall -Wextra -std=gnu99 -fstack-protector-all -D _FORTIFY_SOURCE=2 -D _LARGEFILE64_SOURCE -D _FILE_OFFSET_BITS=64 -D _GNU_SOURCE -I /home/acme/git/perf/tools/lib/perf/include -I /home/acme/git/perf/tools/perf/util/include -I /home/acme/git/perf/tools/perf/arch/x86/include -I /home/acme/git/perf/tools/include/ -I /home/acme/git/perf/tools/arch/x86/include/uapi -I /home/acme/git/perf/tools/include/uapi -I /home/acme/git/perf/tools/arch/x86/include/ -I /home/acme/git/perf/tools/arch/x86/ -I /tmp/build/perf//util -I /tmp/build/perf/ -I /home/acme/git/perf/tools/perf/util -I /home/acme/git/perf/tools/perf -I /home/acme/git/perf/tools/lib/ -D HAVE_PTHREAD_ATTR_SETAFFINITY_NP -D HAVE_PTHREAD_BARRIER -D HAVE_EVENTFD -D HAVE_GET_CURRENT_DIR_NAME -D HAVE_GETTID -D HAVE_DWARF_GETLOCATIONS_SUPPORT -D HAVE_GLIBC_SUPPORT -D HAVE_AIO_SUPPORT -D HAVE_SCHED_GETCPU_SUPPORT -D HAVE_SETNS_SUPPORT -D HAVE_LIBELF_SUPPORT -D HAVE_LIBELF_MMAP_SUPPORT -D HAVE_ELF_GETPHDRNUM_SUPPORT -D HAVE_GELF_GETNOTE_SUPPORT -D HAVE_ELF_GETSHDRSTRNDX_SUPPORT -D HAVE_DWARF_SUPPORT -D HAVE_LIBBPF_SUPPORT -D HAVE_BPF_PROLOGUE -D HAVE_SDT_EVENT -D HAVE_JITDUMP -D HAVE_DWARF_UNWIND_SUPPORT -D NO_LIBUNWIND_DEBUG_FRAME -D HAVE_LIBUNWIND_SUPPORT -D HAVE_LIBCRYPTO_SUPPORT -D HAVE_SLANG_SUPPORT -D HAVE_GTK2_SUPPORT -D NO_LIBPERL -D HAVE_TIMERFD_SUPPORT -D HAVE_LIBPYTHON_SUPPORT -D HAVE_CPLUS_DEMANGLE_SUPPORT -D HAVE_LIBBFD_SUPPORT -D HAVE_ZLIB_SUPPORT -D HAVE_LZMA_SUPPORT -D HAVE_ZSTD_SUPPORT -D HAVE_LIBCAP_SUPPORT -D HAVE_BACKTRACE_SUPPORT -D HAVE_LIBNUMA_SUPPORT -D HAVE_KVM_STAT_SUPPORT -D DISASM_FOUR_ARGS_SIGNATURE -D HAVE_LIBBABELTRACE_SUPPORT -D HAVE_AUXTRACE_SUPPORT -D HAVE_JVMTI_CMLR -I /tmp/build/perf/ -fno-strict-aliasing -Wno-write-strings -Wno-unused-parameter -Wno-redundant-decls $ And here is how tools/perf/util/setup.py checks if the used clang has options that the distro specific python extension building compiler defaults: if cc_is_clang: from distutils.sysconfig import get_config_vars vars = get_config_vars() for var in ('CFLAGS', 'OPT'): vars[var] = sub("-specs=[^ ]+", "", vars[var]) if not clang_has_option("-mcet"): vars[var] = sub("-mcet", "", vars[var]) if not clang_has_option("-fcf-protection"): vars[var] = sub("-fcf-protection", "", vars[var]) if not clang_has_option("-fstack-clash-protection"): vars[var] = sub("-fstack-clash-protection", "", vars[var]) if not clang_has_option("-fstack-protector-strong"): vars[var] = sub("-fstack-protector-strong", "", vars[var]) So "-fcf-protection=full" is used, clang-9 has this option and thus it was kept, the perf python extension was built with it and the build completed successfully. Link: https://github.com/ClangBuiltLinux/linux/issues/903 Signed-off-by: Ilie Halip <ilie.halip@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200309085618.14307-1-ilie.halip@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
db2c549407 |
perf map: Fix off by one in strncpy() size argument
This patch fixes an off-by-one error in strncpy size argument in
tools/perf/util/map.c. The issue is that in:
strncmp(filename, "/system/lib/", 11)
the passed string literal: "/system/lib/" has 12 bytes (without the NULL
byte) and the passed size argument is 11. As a result, the logic won't
match the ending "/" byte and will pass filepaths that are stored in
other directories e.g. "/system/libmalicious/bin" or just
"/system/libmalicious".
This functionality seems to be present only on Android. I assume the
/system/ directory is only writable by the root user, so I don't think
this bug has much (or any) security impact.
Fixes:
|
|
![]() |
be40920fbf |
tools: Let O= makes handle a relative path with -C option
When I tried to compile tools/perf from the top directory with the -C
option, the O= option didn't work correctly if I passed a relative path:
$ make O=BUILD -C tools/perf/
make: Entering directory '/home/mhiramat/ksrc/linux/tools/perf'
BUILD: Doing 'make -j8' parallel build
../scripts/Makefile.include:4: *** O=/home/mhiramat/ksrc/linux/tools/perf/BUILD does not exist. Stop.
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/home/mhiramat/ksrc/linux/tools/perf'
The O= directory existence check failed because the check script ran in
the build target directory instead of the directory where I ran the make
command.
To fix that, once change directory to $(PWD) and check O= directory,
since the PWD is set to where the make command runs.
Fixes:
|
|
![]() |
441b62acd9 |
tools: Fix off-by 1 relative directory includes
This is currently working due to extra include paths in the build. Committer testing: $ cd tools/include/uapi/asm/ Before this patch: $ ls -la ../../arch/x86/include/uapi/asm/errno.h ls: cannot access '../../arch/x86/include/uapi/asm/errno.h': No such file or directory $ After this patch; $ ls -la ../../../arch/x86/include/uapi/asm/errno.h -rw-rw-r--. 1 acme acme 31 Feb 20 12:42 ../../../arch/x86/include/uapi/asm/errno.h $ Check that that is still under tools/, i.e. hasn't escaped into the main kernel sources: $ cd ../../../arch/x86/include/uapi/asm/ $ pwd /home/acme/git/perf/tools/arch/x86/include/uapi/asm $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Li <liwei391@huawei.com> Link: http://lore.kernel.org/lkml/20200306071110.130202-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3f5777fbaf |
perf jevents: Fix leak of mapfile memory
The memory for global pointer is never freed during normal program execution, so let's do that in the main function exit as a good programming practice. A stray blank line is also removed. Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1583406486-154841-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7b919a5310 |
perf bench: Clear struct sigaction before sigaction() syscall
Avoid garbage in sigaction structs used in sigaction() syscalls. Valgrind is complaining about it. Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200305083714.9381-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f649bd9dd5 |
perf bench futex-wake: Restore thread count default to online CPU count
Since commit |
|
![]() |
29b4f5f188 |
perf top: Fix stdio interface input handling with glibc 2.28+
Since glibc 2.28 when running 'perf top --stdio', input handling no longer works, but hitting any key always just prints the "Mapped keys" help text. To fix it, call clearerr() in the display_thread() loop to clear any EOF sticky errors, as instructed in the glibc NEWS file (https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS): * All stdio functions now treat end-of-file as a sticky condition. If you read from a file until EOF, and then the file is enlarged by another process, you must call clearerr or another function with the same effect (e.g. fseek, rewind) before you can read the additional data. This corrects a longstanding C99 conformance bug. It is most likely to affect programs that use stdio to read interactive input from a terminal. (Bug #1190.) Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200305083714.9381-2-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cfd3bc752a |
perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
clang warns: util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/map.c:434:15: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if (srcline != SRCLINE_UNKNOWN) ^ ~~~~~~~~~~~~~~~ Reviewer Notes: Looks good to me. Some more context: https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare The spec says: J.1 Unspecified behavior The following are unspecified: .. Whether two string literals result in distinct arrays (6.4.5). Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Keeping <john@metanate.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/900 Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dabce16bd2 |
perf annotate: Get rid of annotation->nr_jumps
The 'nr_jumps' field in 'struct annotation' is not used since it's
inception in commit
|
|
![]() |
357a5d24c4 |
perf llvm: Add debug hint message about missing kernel-devel package
To help in debugging, add this extra message: detect_kbuild_dir: Couldn't find "/lib/modules/5.4.20-200.fc31.x86_64/build/include/generated/autoconf.h", missing kernel-devel package?. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1af62ce61c |
perf stat: Show percore counts in per CPU output
We have supported the event modifier "percore" which sums up the event counts for all hardware threads in a core and show the counts per core. For example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/ S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/ S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/ S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/ This patch provides a new option "--percore-show-thread". It is used with event modifier "percore" together to sum up the event counts for all hardware threads in a core but show the counts per hardware thread. This is essentially a replacement for the any bit (which is gone in Icelake). Per core counts are useful for some formulas, e.g. CoreIPC. The original percore version was inconvenient to post process. This variant matches the output of the any bit. With this patch, for example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1 Performance counter stats for 'system wide': CPU0 2,453,061 cpu/event=cpu-cycles,percore/ CPU1 1,823,921 cpu/event=cpu-cycles,percore/ CPU2 1,383,166 cpu/event=cpu-cycles,percore/ CPU3 1,102,652 cpu/event=cpu-cycles,percore/ CPU4 2,453,061 cpu/event=cpu-cycles,percore/ CPU5 1,823,921 cpu/event=cpu-cycles,percore/ CPU6 1,383,166 cpu/event=cpu-cycles,percore/ CPU7 1,102,652 cpu/event=cpu-cycles,percore/ We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5, CPU2/CPU6, CPU3/CPU7). The interval mode also works. For example, # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000 # time CPU counts unit events 1.000425421 CPU0 925,032 cpu/event=cpu-cycles,percore/ 1.000425421 CPU1 430,202 cpu/event=cpu-cycles,percore/ 1.000425421 CPU2 436,843 cpu/event=cpu-cycles,percore/ 1.000425421 CPU3 1,192,504 cpu/event=cpu-cycles,percore/ 1.000425421 CPU4 925,032 cpu/event=cpu-cycles,percore/ 1.000425421 CPU5 430,202 cpu/event=cpu-cycles,percore/ 1.000425421 CPU6 436,843 cpu/event=cpu-cycles,percore/ 1.000425421 CPU7 1,192,504 cpu/event=cpu-cycles,percore/ If we offline CPU5, the result is: # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1 Performance counter stats for 'system wide': CPU0 2,752,148 cpu/event=cpu-cycles,percore/ CPU1 1,009,312 cpu/event=cpu-cycles,percore/ CPU2 2,784,072 cpu/event=cpu-cycles,percore/ CPU3 2,427,922 cpu/event=cpu-cycles,percore/ CPU4 2,752,148 cpu/event=cpu-cycles,percore/ CPU6 2,784,072 cpu/event=cpu-cycles,percore/ CPU7 2,427,922 cpu/event=cpu-cycles,percore/ 1.001416041 seconds time elapsed v4: --- Ravi Bangoria reports an issue in v3. Once we offline a CPU, the output is not correct. The issue is we should use the cpu idx in print_percore_thread rather than using the cpu value. v3: --- 1. Fix the interval mode output error 2. Use cpu value (not cpu index) in config->aggr_get_id(). 3. Refine the code according to Jiri's comments. v2: --- Add the explanation in change log. This is essentially a replacement for the any bit. No code change. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7982a89851 |
tools lib api fs: Move cgroupsfs_find_mountpoint()
Move it from tools/perf/util/cgroup.c as it can be used by other places. Note that cgroup filesystem is different from others since it's usually mounted separately (in v1) for each subsystem. I just copied the code with a little modification to pass a name of subsystem. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200127100031.1368732-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c395c3553d |
perf diff: Fix undefined string comparison spotted by clang's -Wstring-compare
clang warns: util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:18: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/block-info.c:298:51: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if ((start_line != SRCLINE_UNKNOWN) && (end_line != SRCLINE_UNKNOWN)) { ^ ~~~~~~~~~~~~~~~ util/map.c:434:15: error: result of comparison against a string literal is unspecified (use an explicit string comparison function instead) [-Werror,-Wstring-compare] if (srcline != SRCLINE_UNKNOWN) ^ ~~~~~~~~~~~~~~~ Reviewer Notes: Looks good to me. Some more context: https://clang.llvm.org/docs/DiagnosticsReference.html#wstring-compare The spec says: J.1 Unspecified behavior The following are unspecified: .. Whether two string literals result in distinct arrays (6.4.5). Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Keeping <john@metanate.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/900 Link: http://lore.kernel.org/lkml/20200223193456.25291-1-nick.desaulniers@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b5c0951860 |
perf symbols: Don't try to find a vmlinux file when looking for kernel modules
The dso->kernel value is now set to everything that is in
machine->kmaps, but that was being used to decide if vmlinux lookup is
needed, which ended up making that lookup be made for kernel modules,
that now have dso->kernel set, leading to these kinds of warnings when
running on a machine with compressed kernel modules, like fedora:31:
[root@five ~]# perf record -F 10000 -a sleep 2
[ perf record: Woken up 1 times to write data ]
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
lzma: fopen failed on vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux: 'No such file or directory'
lzma: fopen failed on /boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /usr/lib/debug/boot/vmlinux-5.5.5-200.fc31.x86_64: 'No such file or directory'
lzma: fopen failed on /lib/modules/5.5.5-200.fc31.x86_64/build/vmlinux: 'No such file or directory'
[ perf record: Captured and wrote 1.024 MB perf.data (1366 samples) ]
[root@five ~]#
This happens when collecting the buildid, when we find samples for
kernel modules, fix it by checking if the looked up DSO is a kernel
module by other means.
Fixes:
|
|
![]() |
e4d9b04b97 |
perf bench: Share some global variables to fix build with gcc 10
Noticed with gcc 10 (fedora rawhide) that those variables were not being declared as static, so end up with: ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-wait.o:/git/perf/tools/perf/bench/epoll-wait.c:93: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `end'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `start'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here ld: /tmp/build/perf/bench/epoll-ctl.o:/git/perf/tools/perf/bench/epoll-ctl.c:38: multiple definition of `runtime'; /tmp/build/perf/bench/futex-hash.o:/git/perf/tools/perf/bench/futex-hash.c:40: first defined here make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/bench/perf-in.o] Error 1 Prefix those with bench__ and add them to bench/bench.h, so that we can share those on the tools needing to access those variables from signal handlers. Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200303155811.GD13702@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7125f20450 |
perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
Make the code more compact by using asprintf() instead of malloc()+strncpy() which also uses less memory and avoids these warnings with gcc 10: CC /tmp/build/perf/util/cloexec.o In file included from /usr/include/string.h:495, from util/parse-events.h:12, from util/parse-events.c:18: In function ‘strncpy’, inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:271:5: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘sys_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/dirent.h:61, from util/parse-events.c:5: util/parse-events.c: In function ‘tracepoint_id_to_path’: /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here 33 | char d_name[256]; /* We must not include limits.h! */ | ^~~~~~ In file included from /usr/include/string.h:495, from util/parse-events.h:12, from util/parse-events.c:18: In function ‘strncpy’, inlined from ‘tracepoint_id_to_path’ at util/parse-events.c:273:5: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ offset [275, 511] from the object at ‘evt_dirent’ is out of the bounds of referenced subobject ‘d_name’ with type ‘char[256]’ at offset 19 [-Werror=array-bounds] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/dirent.h:61, from util/parse-events.c:5: util/parse-events.c: In function ‘tracepoint_id_to_path’: /usr/include/bits/dirent.h:33:10: note: subobject ‘d_name’ declared here 33 | char d_name[256]; /* We must not include limits.h! */ | ^~~~~~ CC /tmp/build/perf/util/call-path.o Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200302145535.GA28183@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ebcb9464a2 |
perf env: Do not return pointers to local variables
It is possible to return a pointer to a local variable when looking up the architecture name for the running system and no normalization is done on that value, i.e. we may end up returning the uts.machine local variable. While this doesn't happen on most arches, as normalization takes place, lets fix this by making that a static variable and optimize it a bit by not always running uname(), only the first time. Noticed in fedora rawhide running with: [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8) Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cff20b3151 |
perf tests bp_account: Make global variable static
To fix the build with newer gccs, that without this patch exit with: LD /tmp/build/perf/tests/perf-in.o ld: /tmp/build/perf/tests/bp_account.o:/git/perf/tools/perf/tests/bp_account.c:22: multiple definition of `the_var'; /tmp/build/perf/tests/bp_signal.o:/git/perf/tools/perf/tests/bp_signal.c:38: first defined here make[4]: *** [/git/perf/tools/build/Makefile.build:145: /tmp/build/perf/tests/perf-in.o] Error 1 First noticed in fedora:rawhide/32 with: [perfbuilder@a5ff49d6e6e4 ~]$ gcc --version gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8) Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e0560ba6d9 |
perf annotate: Fix segfault with source toggle
While rendering annotate browser from perf report tui, we keep track of total number of lines(asm + source) in annotation->nr_entries and total number of asm lines in annotation->nr_asm_entries. But we don't reset them before starting. Thus if user annotates same function multiple times, we restart incrementing these fields with old values. This causes a segfault when user tries to toggle source code after annotating same function multiple times. Fix it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200204045233.474937-5-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d3c03147bf |
perf annotate: Align struct annotate_args
Align fields of struct annotate_args. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200204045233.474937-4-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2316f861ae |
perf annotate: Simplify disasm_line allocation and freeing code
We are allocating disasm_line object in annotation_line__new() instead of disasm_line__new(). Similarly annotation_line__delete() is actually freeing disasm_line object as well. This complexity is because of privsize. But we don't need privsize anymore so get rid of privsize and simplify disasm_line allocation and freeing code. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200204045233.474937-3-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e0ad4d6854 |
perf annotate: Remove privsize from symbol__annotate() args
privsize is passed as 0 from all the symbol__annotate() callers. Remove it from argument list. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20200204045233.474937-2-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bd862b1d83 |
perf probe: Check return value of strlist__add() for -ENOMEM
strlist__add() may fail with -ENOMEM. Check it and give debugging hint in advance. Signed-off-by: He Zhe <zhe.he@windriver.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/1582727404-180095-1-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b0aaf4c8f3 |
perf config: Document missing config options
While documenting annotate.show_nr_samples config option, I found many other config options missing in perf-config documentation. Add them. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-9-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cd0a9c518d |
perf annotate: Fix perf config option description
perf config annotate options says it works only with TUI, which is wrong. Most of the TUI options are applicable to stdio2 as well. So remove that generic line and add individual line with each option stating which browsers supports that option. Also, annotate.show_nr_samples config is missing in Documentation. Describe it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-8-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
812b0f5282 |
perf annotate: Prefer cmdline option over default config
For all the perf-config options that can also be set from command line option, the preference is given to command line version in case of any conflict. But that's opposite in case of perf annotate. i.e. the more preference is given to default option rather than command line option. Fix it. Before: $ ./perf config annotate.show_nr_samples=false $ ./perf annotate shash --show-nr-samples Percent│ │24: mov -0xc(%rbp),%eax 49.19 │ imul $0x1003f,%eax,%ecx │ mov -0x18(%rbp),%rax After: Samples│ │24: mov -0xc(%rbp),%eax 1 │ imul $0x1003f,%eax,%ecx │ mov -0x18(%rbp),%rax Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-7-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7384083ba6 |
perf annotate: Make perf config effective
perf default config set by user in [annotate] section is totally ignored by annotate code. Fix it. Before: $ ./perf config annotate.hide_src_code=true annotate.show_nr_jumps=true annotate.show_nr_samples=true $ ./perf annotate shash │ unsigned h = 0; │ movl $0x0,-0xc(%rbp) │ while (*s) │ ↓ jmp 44 │ h = 65599 * h + *s++; 11.33 │24: mov -0xc(%rbp),%eax 43.50 │ imul $0x1003f,%eax,%ecx │ mov -0x18(%rbp),%rax After: │ movl $0x0,-0xc(%rbp) │ ↓ jmp 44 1 │1 24: mov -0xc(%rbp),%eax 4 │ imul $0x1003f,%eax,%ecx │ mov -0x18(%rbp),%rax Note that we have removed show_nr_samples and show_total_period from annotation_options because they are not used. Instead of them we use symbol_conf.show_nr_samples and symbol_conf.show_total_period. Committer testing: Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio: # perf config # # perf config annotate.hide_src_code=true # perf config annotate.hide_src_code=true # # perf config annotate.show_nr_jumps=true # perf config annotate.show_nr_samples=true # perf config annotate.hide_src_code=true annotate.show_nr_jumps=true annotate.show_nr_samples=true # # Before: # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period] ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0 Percent 00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@Base>: endbr64 cmpq $0x0,0x20(%rdi) ↓ je 10 xor %eax,%eax ← retq xchg %ax,%ax 100.00 10: push %rbp cmpq $0x0,0x18(%rdi) mov %rdi,%rbp ↓ jne 20 1b: xor %eax,%eax pop %rbp ← retq nop 20: lea 0x18(%rdi),%rdi → callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject* cmpq $0x0,0x18(%rbp) ↑ jne 1b mov %rbp,%rdi → callq ObjectBase::jsobj_addr() const@plt mov $0x1,%eax pop %rbp ← retq # After: # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period] ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0 Samples endbr64 cmpq $0x0,0x20(%rdi) ↓ je 10 xor %eax,%eax ← retq xchg %ax,%ax 1 1 10: push %rbp cmpq $0x0,0x18(%rdi) mov %rdi,%rbp ↓ jne 20 1 1b: xor %eax,%eax pop %rbp ← retq nop 1 20: lea 0x18(%rdi),%rdi → callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject* cmpq $0x0,0x18(%rbp) ↑ jne 1b mov %rbp,%rdi → callq ObjectBase::jsobj_addr() const@plt mov $0x1,%eax pop %rbp ← retq # # perf config annotate.show_nr_jumps annotate.show_nr_jumps=true # perf config annotate.show_nr_jumps=false # perf config annotate.show_nr_jumps annotate.show_nr_jumps=false # # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null Samples: 1 of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period] ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0 Samples endbr64 cmpq $0x0,0x20(%rdi) ↓ je 10 xor %eax,%eax ← retq xchg %ax,%ax 1 10: push %rbp cmpq $0x0,0x18(%rdi) mov %rdi,%rbp ↓ jne 20 1b: xor %eax,%eax pop %rbp ← retq nop 20: lea 0x18(%rdi),%rdi → callq JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject* cmpq $0x0,0x18(%rbp) ↑ jne 1b mov %rbp,%rdi → callq ObjectBase::jsobj_addr() const@plt mov $0x1,%eax pop %rbp ← retq # Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7b43b69704 |
perf config: Introduce perf_config_u8()
Introduce perf_config_u8() utility function to convert char * input into u8 destination. We will utilize it in followup patch. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-5-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
46ccb44269 |
perf annotate: Fix --show-nr-samples for tui/stdio2
perf annotate --show-nr-samples does not really show number of samples. The reason is we have two separate variables for the same purpose. One is in symbol_conf.show_nr_samples and another is annotation_options.show_nr_samples. We save command line option in symbol_conf.show_nr_samples but uses annotation_option.show_nr_samples while rendering tui/stdio2 browser. Though, we copy symbol_conf.show_nr_samples to annotation__default_options.show_nr_samples but that is not really effective as we don't use annotation__default_options once we copy default options to dynamic variable annotate.opts in cmd_annotate(). Instead of all these complication, keep only one variable and use it all over. symbol_conf.show_nr_samples is used by perf report/top as well. So let's kill annotation_options.show_nr_samples. On a side note, I've kept annotation_options.show_nr_samples definition because it's still used by perf-config code. Follow up patch to fix perf-config for annotate will remove annotation_options.show_nr_samples. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-4-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
68aac855b6 |
perf annotate: Fix --show-total-period for tui/stdio2
perf annotate --show-total-period does not really show total period. The reason is we have two separate variables for the same purpose. One is in symbol_conf.show_total_period and another is annotation_options.show_total_period. We save command line option in symbol_conf.show_total_period but uses annotation_option.show_total_period while rendering tui/stdio2 browser. Though, we copy symbol_conf.show_total_period to annotation__default_options.show_total_period but that is not really effective as we don't use annotation__default_options once we copy default options to dynamic variable annotate.opts in cmd_annotate(). Instead of all these complication, keep only one variable and use it all over. symbol_conf.show_total_period is used by perf report/top as well. So let's kill annotation_options.show_total_period. On a side note, I've kept annotation_options.show_total_period definition because it's still used by perf-config code. Follow up patch to fix perf-config for annotate will remove annotation_options.show_total_period. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-3-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
54cf752cfb |
perf annotate/tui: Re-render title bar after switching back from script browser
The 'perf annotate' TUI browser provides a 'r' hot key to switch to a script browser. But the annotate browser title bar becomes hidden while switching back from script browser. Fix it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Link: http://lore.kernel.org/lkml/20200213064306.160480-2-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b103de53e0 |
perf arch powerpc: Sync powerpc syscall.tbl with the kernel sources
Copy over powerpc syscall.tbl to grab changes from the below commits |
|
![]() |
ad60ba0c2e |
perf auxtrace: Add auxtrace_record__read_finish()
All ->read_finish() implementations are doing the same thing. Add a helper function so that they can share the same implementation. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Wei Li <liwei391@huawei.com> Link: http://lore.kernel.org/lkml/20200217082300.6301-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d6bc34c5ec |
perf arm-spe: Fix endless record after being terminated
In __cmd_record(), when receiving SIGINT(ctrl + c), a 'done' flag will be set and the event list will be disabled by evlist__disable() once. While in auxtrace_record.read_finish(), the related events will be enabled again, if they are continuous, the recording seems to be endless. If the event is disabled, don't enable it again here. Based-on-patch-by: Wei Li <liwei391@huawei.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: stable@vger.kernel.org # 5.4+ Link: http://lore.kernel.org/lkml/20200214132654.20395-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c9f2833cb4 |
perf cs-etm: Fix endless record after being terminated
In __cmd_record(), when receiving SIGINT(ctrl + c), a 'done' flag will be set and the event list will be disabled by evlist__disable() once. While in auxtrace_record.read_finish(), the related events will be enabled again, if they are continuous, the recording seems to be endless. If the cs_etm event is disabled, we don't enable it again here. Note: This patch is NOT tested since i don't have such a machine with coresight feature, but the code seems buggy same as arm-spe and intel-pt. Tester notes: Thanks for looping, Adrian. Applied this patch and tested with CoreSight on juno board, it works well. Signed-off-by: Wei Li <liwei391@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: stable@vger.kernel.org # 5.4+ Link: http://lore.kernel.org/lkml/20200214132654.20395-4-adrian.hunter@intel.com [ahunter: removed redundant 'else' after 'return'] Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
783fed2f35 |
perf intel-bts: Fix endless record after being terminated
In __cmd_record(), when receiving SIGINT(ctrl + c), a 'done' flag will be set and the event list will be disabled by evlist__disable() once. While in auxtrace_record.read_finish(), the related events will be enabled again, if they are continuous, the recording seems to be endless. If the intel_bts event is disabled, we don't enable it again here. Note: This patch is NOT tested since i don't have such a machine with intel_bts feature, but the code seems buggy same as arm-spe and intel-pt. Signed-off-by: Wei Li <liwei391@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: stable@vger.kernel.org # 5.4+ Link: http://lore.kernel.org/lkml/20200214132654.20395-3-adrian.hunter@intel.com [ahunter: removed redundant 'else' after 'return'] Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2da4dd3d69 |
perf intel-pt: Fix endless record after being terminated
In __cmd_record(), when receiving SIGINT(ctrl + c), a 'done' flag will be set and the event list will be disabled by evlist__disable() once. While in auxtrace_record.read_finish(), the related events will be enabled again, if they are continuous, the recording seems to be endless. If the intel_pt event is disabled, we don't enable it again here. Before the patch: huawei@huawei-2288H-V5:~/linux-5.5-rc4/tools/perf$ ./perf record -e \ intel_pt//u -p 46803 ^C^C^C^C^C^C After the patch: huawei@huawei-2288H-V5:~/linux-5.5-rc4/tools/perf$ ./perf record -e \ intel_pt//u -p 48591 ^C[ perf record: Woken up 0 times to write data ] Warning: AUX data lost 504 times out of 4816! [ perf record: Captured and wrote 2024.405 MB perf.data ] Signed-off-by: Wei Li <liwei391@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Tan Xiaojun <tanxiaojun@huawei.com> Cc: stable@vger.kernel.org # 5.4+ Link: http://lore.kernel.org/lkml/20200214132654.20395-2-adrian.hunter@intel.com [ ahunter: removed redundant 'else' after 'return' ] Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2bbc835376 |
perf test: Fix test trace+probe_vfs_getname.sh on s390
This test places a kprobe to function getname_flags() in the kernel
which has the following prototype:
struct filename *getname_flags(const char __user *filename, int flags, int *empty)
The 'filename' argument points to a filename located in user space memory.
Looking at commit
|
|
![]() |
3b573bf318 |
perf bpf: Remove bpf/ subdir from bpf.h headers used to build bpf events
The bpf.h file needed gets installed in /usr/lib/include/perf/bpf/bpf.h, and /usr/lib/include/perf/ is added to the include path passed to clang to build the eBPF bytecode, so just remove "bpf/", its directly in the path passed already. This was working by accident, fix it. I.e. now this is back working: # cat /home/acme/git/perf/tools/perf/examples/bpf/hello.c #include <stdio.h> int syscall_enter(openat)(void *args) { puts("Hello, world\n"); return 0; } license(GPL); # perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/hello.c 0.000 pickup/21493 __bpf_stdout__(Hello, world) 56.462 sh/13539 __bpf_stdout__(Hello, world) 56.536 sh/13539 __bpf_stdout__(Hello, world) 56.673 sh/13539 __bpf_stdout__(Hello, world) 56.781 sh/13539 __bpf_stdout__(Hello, world) 56.707 perf/13182 __bpf_stdout__(Hello, world) 56.849 perf/13182 __bpf_stdout__(Hello, world) ^C # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-d9myswhgo8gfi3vmehdqpxa7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6276594115 |
perf llvm: Fix script used to obtain kernel make directives to work with new kbuild
Before this patch: # ./perf test 39 41 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : FAILED! 39.3: Compile source for BPF prologue generation : Skip 39.4: Compile source for BPF relocation : Skip 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : FAILED! 41.4: BPF relocation checker : Skip # Using 'perf test -v' for these tests shows that it is not finding uapi/linux/fs.h, which ends up being because we don't setup the right header path. Fix it. After this patch: # perf test 39 41 39: LLVM search and compile : 39.1: Basic BPF llvm compile : Ok 39.2: kbuild searching : Ok 39.3: Compile source for BPF prologue generation : Ok 39.4: Compile source for BPF relocation : Ok 41: BPF filter : 41.1: Basic BPF filtering : Ok 41.2: BPF pinning : Ok 41.3: BPF prologue generation : Ok 41.4: BPF relocation checker : Ok # Longer description: In llvm-utils.c we use some techniques to obtain the kbuild make directives and that recently stopped working as now 'ar' gets called and expects to find the dummy.o used to echo these variables: $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) Add the $(CC) line to satisfy that, making sure this works with all kernels, i.e. preserving the temp directory and files in it used for this technique we can see that it works everywhere: # make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ clean # ls -la /tmp/tmp.qgaFHgxjZ4/ total 4 drwx------. 2 root root 80 Feb 14 09:42 . drwxrwxrwt. 47 root root 1200 Feb 14 09:42 .. -rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c -rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile # # cat /tmp/tmp.qgaFHgxjZ4/Makefile obj-y := dummy.o $(obj)/%.o: $(src)/%.c @echo -n "$(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS)" $(CC) -c -o $@ $< # Then build with an old kernel Makefile: # make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ dummy.o -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/9/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h # # ls -la /tmp/tmp.qgaFHgxjZ4/ total 8 drwx------. 2 root root 100 Feb 14 09:43 . drwxrwxrwt. 47 root root 1200 Feb 14 09:43 .. -rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c -rw-r--r--. 1 root root 936 Feb 14 09:43 dummy.o -rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile # And a new one: # make -s -C /lib/modules/5.4.18-100.fc30.x86_64/build M=/tmp/tmp.qgaFHgxjZ4/ clean # ls -la /tmp/tmp.qgaFHgxjZ4/ total 4 drwx------. 2 root root 80 Feb 14 09:43 . drwxrwxrwt. 47 root root 1200 Feb 14 09:43 .. -rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c -rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile # make -s -C /lib/modules/5.6.0-rc1+/build M=/tmp/tmp.qgaFHgxjZ4/ dummy.o -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/9/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h # # ls -la /tmp/tmp.qgaFHgxjZ4/ total 16 drwx------. 2 root root 160 Feb 14 09:44 . drwxrwxrwt. 47 root root 1200 Feb 14 09:44 .. -rw-r--r--. 1 root root 158 Feb 14 09:44 built-in.a -rw-r--r--. 1 root root 149 Feb 14 09:44 .built-in.a.cmd -rw-r--r--. 1 root root 0 Feb 13 17:14 dummy.c -rw-r--r--. 1 root root 936 Feb 14 09:44 dummy.o -rw-r--r--. 1 root root 121 Feb 13 17:14 Makefile -rw-r--r--. 1 root root 0 Feb 14 09:44 modules.order # Reported-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: https://www.spinics.net/lists/linux-perf-users/msg10600.html Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
df5a5f3cf2 |
perf tools: Add arm64 version of get_cpuid()
Add an arm64 version of get_cpuid(), which is used for various annotation and headers - for example, I now get the CPUID in "perf report --header", as shown in this snippet: # hostname : ubuntu # os release : 5.5.0-rc1-dirty # perf version : 5.5.rc1.gbf8a13dc9851 # arch : aarch64 # nrcpus online : 96 # nrcpus avail : 96 # cpuid : 0x00000000480fd010 Since much of the code to read the MIDR is already in get_cpuid_str(), factor out this code. Tester notes: I tested this patch on my new ARM64 Kunpeng 920 server. [root@node1 zsk]# ./perf --version perf version 5.6.rc1.g2cdb955b7252 Both perf list and perf stat can work. Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1576245255-210926-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d7a07b2932 |
perf trace: Resolve prctl's 'option' arg strings to numbers
# perf trace -e syscalls:sys_enter_prctl --filter="option==SET_NAME" 0.000 Socket Thread/3860 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7fc50b9733e8) 0.053 SSL Cert #78/3860 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7fc50b9733e8) ^C # If one uses '-v' with 'perf trace', we can see the filter it puts in place: New filter for syscalls:sys_enter_prctl: (option==0xf) && (common_pid != 3859 && common_pid != 2757) We still need to allow using plain '-e prctl' and have this turn into creating a 'syscalls:sys_enter_prctl' event so that the filter can be applied only to it as right now '-e prctl' ends up using the 'raw_syscalls:sys_enter/sys_exit'. The end goal is to have something like: # perf trace -e prctl/option==SET_NAME/ And have that use tracepoint filters or eBPF ones. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c0134b3366 |
perf beauty prctl: Export the 'options' strarray
So that we can use it with strtoul, allowing string to number conversions in filter expressions. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
484214f49b |
perf maps: Move kmap::kmaps setup to maps__insert()
So the kmaps pointer setup is centralized and we do not need to update it in all those places (2 current places and few more missing) after calling maps__insert(). Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200210143218.24948-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7ce66139a9 |
perf maps: Fix map__clone() for struct kmap
The map__clone() function can be called on kernel maps as well, so it needs to duplicate the whole kmap data. Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200210143218.24948-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4a4eb6154d |
perf maps: Mark ksymbol DSOs with kernel type
We add ksymbol map into machine->kmaps, so it needs to be created as 'struct kmap', which is dependent on its dso having kernel type. Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200210200847.GA36715@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
02213cec64 |
perf maps: Mark module DSOs with kernel type
We add kernel module map into machine->kmaps, so it needs to be created as 'struct kmap', which is dependent on its dso having kernel type. Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Kim Phillips <kim.phillips@amd.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200210143218.24948-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c452833387 |
tools include UAPI: Sync x86's syscalls_64.tbl, generic unistd.h and fcntl.h to pick up openat2 and pidfd_getfd
|
|
![]() |
bc5f15be2c |
perf symbols: Convert symbol__is_idle() to use strlist
Use the more optimized strlist implementation to do the idle function lookup. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Acked-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200210163147.25358-1-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0e71459afc |
perf symbols: Update the list of kernel idle symbols
The "acpi_idle_do_entry", "acpi_processor_ffh_cstate_enter", and "idle_cpu" symbols appear in 'perf top' output, at least on AMD systems. Add them to perf's idle_symbols list, so they don't dominate 'perf top' output. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200207230613.26709-2-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
80cc7bb6c1 |
perf stat: Don't report a null stalled cycles per insn metric
For data collected on machines with front end stalled cycles supported, such as found on modern AMD CPU families, commit |
|
![]() |
45f035748b |
perf/core improvements and fixes:
perf maps: Cengiz Can: - Add missing unlock to maps__insert() error case. srcline: Changbin Du: - Make perf able to build with latest libbfd. perf parse: Leo Yan: - Keep copy of string in perf_evsel_config_term() to fix sink terms processing in ARM CoreSight. perf test: Thomas Richter: - Fix test case Merge cpu map, removing extra reference count drop that causes a segfault on s/390. perf probe: Thomas Richter: - Add ustring support for perf probe command Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXjUwAgAKCRCyPKLppCJ+ J163AQCvl98mgSHfAVq9roP3ip1T1TwLXT+zPA9aOEc95d7NwwD9EuHVfs1kIjfV +nv8/i3T3YVq1Dlx9MkvKEVQwJdiCws= =f/tq -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.6-20200201' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf maps: Cengiz Can: - Add missing unlock to maps__insert() error case. srcline: Changbin Du: - Make perf able to build with latest libbfd. perf parse: Leo Yan: - Keep copy of string in perf_evsel_config_term() to fix sink terms processing in ARM CoreSight. perf test: Thomas Richter: - Fix test case Merge cpu map, removing extra reference count drop that causes a segfault on s/390. perf probe: Thomas Richter: - Add ustring support for perf probe command Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
![]() |
85fc95d759 |
perf maps: Add missing unlock to maps__insert() error case
`tools/perf/util/map.c` has a function named `maps__insert` that
acquires a write lock if its in multithread context.
Even though this lock is released when function successfully completes,
there's a branch that is executed when `maps_by_name == NULL` that
returns from this function without releasing the write lock.
Added an `up_write` to release the lock when this happens.
Fixes:
|
|
![]() |
1873f1547d |
perf probe: Add ustring support for perf probe command
Kernel commit
|
|
![]() |
0ada120c88 |
perf: Make perf able to build with latest libbfd
libbfd has changed the bfd_section_* macros to inline functions bfd_section_<field> since 2019-09-18. See below two commits: o http://www.sourceware.org/ml/gdb-cvs/2019-09/msg00064.html o https://www.sourceware.org/ml/gdb-cvs/2019-09/msg00072.html This fix make perf able to build with both old and new libbfd. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200128152938.31413-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0dd1979f7f |
perf test: Fix test case Merge cpu map
Commit
|
|
![]() |
3220fb8d5e |
perf parse: Copy string to perf_evsel_config_term
perf with CoreSight fails to record trace data with command: perf record -e cs_etm/@tmc_etr0/u --per-thread ls failed to set sink "" on event cs_etm/@tmc_etr0/u with 21 (Is a directory)/perf/ This failure is root caused with the commit |
|
![]() |
e884602b57 |
perf parse: Refactor 'struct perf_evsel_config_term'
The struct perf_evsel_config_term::val is a union which contains fields 'callgraph', 'drv_cfg' and 'branch' as string pointers. This leads to the complex code logic for handling every type's string separately, and it's hard to release string as a general way. This patch refactors the structure to add a common field 'str' in the 'val' union as string pointer and remove the other three fields 'callgraph', 'drv_cfg' and 'branch'. Without passing field name, the patch simplifies the string handling with macro ADD_CONFIG_TERM_STR() for string pointer assignment. This patch fixes multiple warnings of line over 80 characters detected by checkpatch tool. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200117055251.24058-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bd2463ac7d |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ... |
|
![]() |
c0e809e244 |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ... |
|
![]() |
e279160f49 |
The timekeeping and timers departement provides:
- Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vbTcTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXT2D/96iJ3G9Snn2khEQP3XS2rYmtDGw7NO m1n96falwWeGe6zreU80R2Jge5nLxQtNhRoMPLLee1GpHwRC6lvqEqgdZ4LMBrD2 JqV7Gzg8Urmdh+hpDsyTCpeEWEzoMKxiFOX8PxwctqUhM4szEe5iQg2YQsg85Jw2 vG6M93N2xwDILh4rhEMbKjo+5ZmYn7c1RQvpGOSmpKOj940W/N7H2HBsFhdaJ1Kw FW5pFv1211PaU5RV2YNb2dMeeMTT1N3e2VN4Dkadoxp47pb+725gNHEBEjmV9poG Lp4IhzGAPnj8zVD88icQZSTaK3gUHMClxprJ0Pf84WEtiH7SeGu8BPYyu77+oNDe yzcctDJNyCWXkzmaP/fe/HLc0TStbvNAJ5Tagp4BC75gzebeb4/n8RtRT0fKeDYL pxpDPKDAPU7p1JSjxiWAtshqjBycWNY3Z49bA7/VhKBhnv8BDyBPGlYd7/4xrbGr RK7DQNXJwaJaiNJ7p5PiaFxGzNyB0B9sThD/slSlEInIKb4h9YzWr0TV+NB62VnB sDcN+tpLbRPz5/5cHGGfxR0+zKWpfyai8pzbmmaXEaKssjRYwyvcac5EZdgbWpbK k7CqAjoWLA2P+tGeePNJOf5JYK6Vmdyh4clmuwM0zOiRJ9NlWUyMf3z7QYILs4RO UAI+6opYlZEPAw== =x3qT -----END PGP SIGNATURE----- Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ... |
|
![]() |
954b3c4397 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-01-22 The following pull-request contains BPF updates for your *net-next* tree. We've added 92 non-merge commits during the last 16 day(s) which contain a total of 320 files changed, 7532 insertions(+), 1448 deletions(-). The main changes are: 1) function by function verification and program extensions from Alexei. 2) massive cleanup of selftests/bpf from Toke and Andrii. 3) batched bpf map operations from Brian and Yonghong. 4) tcp congestion control in bpf from Martin. 5) bulking for non-map xdp_redirect form Toke. 6) bpf_send_signal_thread helper from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
|
![]() |
521fe8bb58 |
perf: Use consistent include paths for libbpf
Fix perf to include libbpf header files with the bpf/ prefix, to be consistent with external users of the library. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157952560797.1683545.7685921032671386301.stgit@toke.dk |
|
![]() |
8af19d66b9 |
perf header: Use last modification time for timestamp
Using .st_ctime clobbers the timestamp information in perf report header whenever any operation is done with the file. Even tar-ing and untar-ing the perf.data file (which preserves the file last modification timestamp) doesn't prevent that: [Michael@Diego tmp]$ ls -l perf.data -> -rw-------. 1 Michael Michael 169888 Dec 2 15:23 perf.data [Michael@Diego tmp]$ perf report --header-only # ======== -> # captured on : Mon Dec 2 15:23:42 2019 [...] [Michael@Diego tmp]$ tar c perf.data | xz > perf.data.tar.xz [Michael@Diego tmp]$ mkdir aaa [Michael@Diego tmp]$ cd aaa [Michael@Diego aaa]$ xzcat ../perf.data.tar.xz | tar x [Michael@Diego aaa]$ ls -l -a total 172 drwxrwxr-x. 2 Michael Michael 23 Jan 14 11:26 . drwxrwxr-x. 6 Michael Michael 4096 Jan 14 11:26 .. -> -rw-------. 1 Michael Michael 169888 Dec 2 15:23 perf.data [Michael@Diego aaa]$ perf report --header-only # ======== -> # captured on : Tue Jan 14 11:26:16 2020 [...] When using .st_mtime instead, correct information is printed: [Michael@Diego aaa]$ ~/acme/tools/perf/perf report --header-only # ======== -> # captured on : Mon Dec 2 15:23:42 2019 [...] Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20200114104236.31555-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c1c8013ec3 |
perf c2c: Fix return type for histogram sorting comparision functions
Commit |
|
![]() |
49e0b6f4e9 |
perf beauty sockaddr: Fix augmented syscall format warning
The sockaddr related examples given in `tools/perf/examples/bpf/augmented_syscalls.c` almost always use `long`s to represent most of their fields. However, `size_t syscall_arg__scnprintf_sockaddr(..)` has a `scnprintf` call that uses `"%#x"` as format string. This throws a warning (whenever the syscall argument is `unsigned long`). Added `l` identifier to indicate that the `arg->value` is an unsigned long. Not sure about the complications of this with x86 though. Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200113174438.102975-1-cengiz@kernel.wtf Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
93e843f95f |
perf/ui/gtk: Fix gtk2 build
Ravi Bangoria reported an issue when doing the gtk2 feature detection on Fedora 31, where some types got deprecated: /usr/include/gtk-2.0/gtk/gtktypeutils.h:236:1: error: ‘GTypeDebugFlags’ is deprecated [-Werror=deprecated-declarations] 236 | void gtk_type_init (GTypeDebugFlags debug_flags); Fix this for perf by allowing the compile to pass with deprecated symbols via the -Wno-deprecated-declarations compiler directive. Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jelle van der Waa <jelle@vdwaa.nl> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200113104358.123511-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
604e2139a1 |
perf ui gtk: Add missing zalloc object
When we moved zalloc.o to the library we missed gtk library which needs
it compiled in, otherwise the missing __zfree symbol will cause the
library to fail to load.
Adding the zalloc object to the gtk library build.
Fixes:
|
|
![]() |
fc8c0a9922 |
perf tools: Use %define api.pure full instead of %pure-parser
bison deprecated the "%pure-parser" directive in favor of "%define api.pure full". The api.pure got introduced in bison 2.3 (Oct 2007), so it seems safe to use it without any version check. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200112192259.GA35080@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c3314a74f8 |
perf report: Fix no libunwind compiled warning break s390 issue
Commit |
|
![]() |
3b0b16bf8c |
perf tools: Support --prefix/--prefix-strip
The objdump utility has useful --prefix / --prefix-strip options to allow changing source code file names hardcoded into executables' debug info. Add options to 'perf report', 'perf top' and 'perf annotate', which are then passed to objdump. $ mkdir foo $ echo 'main() { for (;;); }' > foo/foo.c $ gcc -g foo/foo.c foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int] 1 | main() { for (;;); } | ^~~~ $ perf record ./a.out ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ] $ mv foo bar $ perf annotate <does not show source code> $ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5 <does show source code> Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
aa9d1f8334 |
perf report: Clarify in help that --children is default
Refer to --no-children, which is what most people probably want. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20200103183643.149150-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
411c0ec0b8 |
perf clang: Fix build with Clang 9
LLVM D59377 (included in Clang 9) refactored Clang VFS construction a bit, which broke perf clang build. Let's fix it. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Reviewed-by: Dennis Schridde <devurandom@gmx.net> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Denis Pronin <dannftk@yandex.ru> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naohiro Aota <naota@elisp.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191228171314.946469-2-mail@maciej.szmigiero.name Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ea2d1f7fce |
hrtimers: Prepare hrtimer_nanosleep() for time namespaces
clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com |
|
![]() |
e6d6abfc44 |
perf report/top: Make 'e' visible in the help and make it toggle showing callchains
The 'e' and 'c' hotkeys were present for a long time, but not documented in the help window, change 'e' to be a toggle so that it gets consistent with other toggles like '+' and document it in the help window. Keep 'c' as is for people used to it but don't document, as it is easier to just use 'e' to show/hide all the callchains for a top level histogram entry. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pmyi5x34stlqmyu81rci94x9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ea537f22f6 |
perf report/top: Do not offer annotation for symbols without samples
This can happen in the --children mode, i.e. the default mode when callchains are present, where one of the main entries may be a callchain entry with no samples. So far we were not providing any information about why an annotation couldn't be provided even offering the Annotation option in the popup menu. Work is needed to allow for no-samples "annotation', i.e. to show the disassembly anyway and allow for navigation, etc. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-0hhzj2de15o88cguy7h66zre@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4c8b9c0f42 |
perf report/top: Allow pressing hotkeys in the options popup menu
When the users presses ENTER in the main 'perf report/top' screen a popup menu is presented, in it some hotkeys are suggested as alternatives to using the menu, or for additional features. At that point the user may try those hotkeys, so allow for that by recording the key used and exiting, the caller then can check for that possibility and process the hotkey. I.e. try pressing ENTER, and then 'k' to exit and zoom into the kernel map, using ESC then zooms out, etc. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ujfq3fw44kf6qrtfajl5dcsp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d07126560c |
tools ui popup: Allow returning hotkeys
With this patch if an optional pointer is passed to ui__popup_menu() then when any key that is not being handled (ENTER, ESC, etc) is typed, it'll record that key in the pointer and return, allowing for hotkey processing on the caller. If NULL is passed, no change in logic, unhandled keys continue to be ignored. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-6ojn19mqzgmrdm8kdoigic0m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d10ec006dc |
perf hists browser: Allow passing an initial hotkey
Sometimes we're in an outer code, like the main hists browser popup menu and the user follows a suggestion about using some hotkey, and that hotkey is really handled by hists_browser__run(), so allow for calling it with that hotkey, making it handle it instead of waiting for the user to press one. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xv2l7i6o4urn37nv1h40ryfs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
209f4e70a2 |
perf report/top: Add 'k' hotkey to zoom directly into the kernel map
As a convenience, equivalent to pressing Enter in a line with a kernel symbol and then selecting "Zoom" into the kernel DSO. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-vbnlnrpyfvz9deqoobtc3dz7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
632003f400 |
perf hists browser: Generalize the do_zoom_dso() function
We'll use it to provide a top level hotkey to zoom into the kernel dso directly. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ae9cjel6v05wjnz9r6z77b6x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bdc633fec5 |
perf report/top: Improve toggle callchain menu option
Taking into account the current status of the callchain, i.e. if folded, show "Expand", otherwise "Collapse", also show the name of the entry that will be affected and mention the hotkeys for expanding/collapsing all callchains below the main entry, the one that appears with/without callchains. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-03arm6poo8463k5tfcfp7gkk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d5a599d989 |
perf report/top: Add menu entry for toggling callchain expansion
Since previously pressing ENTER toggled expansion/collapse of callchain entries and now brings up the same menu used when callchains are not present, add an entry so that users can quickly figure out the change in behaviour. Its worth mentioning that we also always had 'e'/'c' to expand/collapse all entries in a hist entry and 'E'/'C' for all hist entries. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-f9o03jo29fypvd8ly3j49d36@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9218a9132f |
perf report/top: Make ENTER consistently bring up menu
When callchains are present the ENTER key switches from bringing up the menu that offers Annotation, Zoom by DSO, etc to expanding/collapsing one callchain level, causing confusion, fix it by making it consistently bring up the menu and use '+' to expand/collapse one callchain level. Next patch will also add an entry to the menu to allow expanding/collapsing, so that people used to ENTER expanding one callchain level can quickly find it and use it instead. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bjz35omktig8cwn6lbj1ifns@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3f7774033e |
perf hists browser: Restore ESC as "Zoom out" of DSO/thread/etc
We need to set actions->ms.map since |
|
![]() |
3ce311afb5 |
libperf: Move to tools/lib/perf
Move libperf from its current location under tools/perf to a separate directory under tools/lib/. Also change various paths (mainly includes) to reflect the libperf move to a separate directory and add a new directory under MANIFEST. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191206210612.8676-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6ae9c10b7c |
perf tests bp_signal: Show expected versus obtained values
To help understand failures. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c951j3gvrgnrsyg7ki7pwkiz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c30d630d1b |
perf sched timehist: Add support for filtering on CPU
Allow user to limit output to one or more CPUs. Really helpful on systems with a large number of cpus. Committer testing: # perf sched record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.765 MB perf.data (1412 samples) ] [root@quaco ~]# perf sched timehist | head Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 66307.802686 [0000] perf[13086] 0.000 0.000 0.000 66307.802700 [0000] migration/0[12] 0.000 0.001 0.014 66307.802766 [0001] perf[13086] 0.000 0.000 0.000 66307.802774 [0001] migration/1[15] 0.000 0.001 0.007 66307.802841 [0002] perf[13086] 0.000 0.000 0.000 66307.802849 [0002] migration/2[20] 0.000 0.001 0.008 66307.802913 [0003] perf[13086] 0.000 0.000 0.000 # # perf sched timehist --cpu 2 | head Samples do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 66307.802841 [0002] perf[13086] 0.000 0.000 0.000 66307.802849 [0002] migration/2[20] 0.000 0.001 0.008 66307.964485 [0002] <idle> 0.000 0.000 161.635 66307.964811 [0002] CPU 0/KVM[3589/3561] 0.000 0.056 0.325 66307.965477 [0002] <idle> 0.325 0.000 0.666 66307.965553 [0002] CPU 0/KVM[3589/3561] 0.666 0.024 0.076 66307.966456 [0002] <idle> 0.076 0.000 0.903 # Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191204173925.66976-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8384a2600c |
perf record: Adapt affinity to machines with #CPUs > 1K
Use struct mmap_cpu_mask type for the tool's thread and mmap data buffers to overcome current 1024 CPUs mask size limitation of cpu_set_t type. Currently glibc's cpu_set_t type has an internal mask size limit of 1024 CPUs. Moving to the 'struct mmap_cpu_mask' type allows overcoming that limit. The tools bitmap API is used to manipulate objects of 'struct mmap_cpu_mask' type. Committer notes: To print the 'nbits' struct member we must use %zd, since it is a size_t, this fixes the build in some toolchains/arches. Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/96d7e2ff-ce8b-c1e0-d52c-aa59ea96f0ea@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9c080c0279 |
perf mmap: Declare type for cpu mask of arbitrary length
Declare a dedicated struct map_cpu_mask type for cpu masks of arbitrary length. The mask is available thru bits pointer and the mask length is kept in nbits field. MMAP_CPU_MASK_BYTES() macro returns mask storage size in bytes. The mmap_cpu_mask__scnprintf() function can be used to log text representation of the mask. Committer notes: To print the 'nbits' struct member we must use %zd, since it is a size_t, this fixes the build in some toolchains/arches. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/0fd2454f-477f-d15a-f4ee-79bcbd2585ff@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
55347ec340 |
perf hists: Fix variable name's inconsistency in hists__for_each() macro
Variable names are inconsistent in hists__for_each macro(). Due to this inconsistency, the macro replaces its second argument with "fmt" regardless of its original name. So far it works because only "fmt" is passed to the second argument. However, this behavior is not expected and should be fixed. Fixes: |
|
![]() |
a75af86b6f |
perf map: Set kmap->kmaps backpointer for main kernel map chunks
When a map is create to represent the main kernel area (vmlinux) with map__new2() we allocate an extra area to store a pointer to the 'struct maps' for the kernel maps, so that we can access that struct when loading ELF files or kallsyms, as we will need to split it in multiple maps, one per kernel module or ELF section (such as ".init.text"). So when map->dso->kernel is non-zero, it is expected that map__kmap(map)->kmaps to be set to the tree of kernel maps (modules, chunks of the main kernel, bpf progs put in place via PERF_RECORD_KSYMBOL, the main kernel). This was not the case when we were splitting the main kernel into chunks for its ELF sections, which ended up making 'perf report --children' processing a perf.data file with callchains to trip on __map__is_kernel(), when we press ENTER to see the popup menu for main histogram entries that starts at a symbol in the ".init.text" ELF section, e.g.: - 8.83% 0.00% swapper [kernel.vmlinux].init.text [k] start_kernel start_kernel cpu_startup_entry do_idle cpuidle_enter cpuidle_enter_state intel_idle Fix it. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20191218190120.GB13282@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0feba17bd7 |
perf report: Fix incorrectly added dimensions as switch perf data file
We observed an issue that was some extra columns displayed after switching
perf data file in browser. The steps to reproduce:
1. perf record -a -e cycles,instructions -- sleep 3
2. perf report --group
3. In browser, we use hotkey 's' to switch to another perf.data
4. Now in browser, the extra columns 'Self' and 'Children' are displayed.
The issue is setup_sorting() executed again after repeat path, so dimensions
are added again.
This patch checks the last key returned from __cmd_report(). If it's
K_SWITCH_INPUT_DATA, skips the setup_sorting().
Fixes:
|
|
![]() |
58b3bafff8 |
perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
In |
|
![]() |
28396b7df0 |
perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES
The cf_z13 counter DTLB1_GPAGE_WRITES included a prefix
'Counter:132\tName:'.
This is incorrect; remove the prefix as with
|
|
![]() |
2870782687 |
perf header: Fix false warning when there are no duplicate cache entries
Before this patch, perf expected that there might be NPROC*4 unique cache entries at max, however, it also expected that some of them would be shared and/or of the same size, thus the final number of entries would be reduced to be lower than NPROC*4. In case the number of entries hadn't been reduced (was NPROC*4), the warning was printed. However, some systems might have unusual cache topology, such as the following two-processor KVM guest: cpu level shared_cpu_list size 0 1 0 32K 0 1 0 64K 0 2 0 512K 0 3 0 8192K 1 1 1 32K 1 1 1 64K 1 2 1 512K 1 3 1 8192K This KVM guest has 8 (NPROC*4) unique cache entries, which used to make perf printing the message, although there actually aren't "way too many cpu caches". v2: Removing unused argument. v3: Unifying the way we obtain number of cpus. v4: Removed '& UINT_MAX' construct which is redundant. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> LPU-Reference: 20191208162056.20772-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
eb573e746b |
perf metricgroup: Fix printing event names of metric group with multiple events
Commit |
|
![]() |
0dd674efaf |
perf/x86/pmu-events: Fix Kernel_Utilization metric
Kernel Utilization should divide ref cycles spent in kernel with total ref cycles. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Haiyan Song <haiyanx.song@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191204162121.29998-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
61208e6e10 |
perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS
'perf top' stopped working on hw architectures that do not provide a
get_cpuid() implementation and thus fallback to the weak get_cpuid()
default function.
This is done because at annotation time we may need it in the arch
specific annotation init routine, but that is only being used by arches
that do provide a get_cpuid() implementation:
$ find tools/ -name "*.[ch]" | xargs grep 'evlist->env'
tools/perf/builtin-top.c: top.evlist->env = &perf_env;
tools/perf/util/evsel.c: return evsel->evlist->env;
tools/perf/util/s390-cpumsf.c: sf->machine_type = s390_cpumsf_get_type(session->evlist->env->cpuid);
tools/perf/util/header.c: session->evlist->env = &header->env;
tools/perf/util/sample-raw.c: const char *arch_pf = perf_env__arch(evlist->env);
$
$ find tools/perf/arch -name "*.[ch]" | xargs grep -w get_cpuid
tools/perf/arch/x86/util/auxtrace.c: ret = get_cpuid(buffer, sizeof(buffer));
tools/perf/arch/x86/util/header.c:get_cpuid(char *buffer, size_t sz)
tools/perf/arch/powerpc/util/header.c:get_cpuid(char *buffer, size_t sz)
tools/perf/arch/s390/util/header.c: * Implementation of get_cpuid().
tools/perf/arch/s390/util/header.c:int get_cpuid(char *buffer, size_t sz)
tools/perf/arch/s390/util/header.c: if (buf && get_cpuid(buf, 128))
$
For 'report' or 'script', i.e. tools working on perf.data files, that is
setup while reading the header, its just top that needs to explicitely
read it at tool start.
Fixes:
|
|
![]() |
05267c7eac |
perf arch: Make the default get_cpuid() return compatible error
Some of the functions calling get_cpuid() propagate back the error it returns, and all are using errno (positive) values, make the weak default get_cpuid() function return ENOSYS to be consistent and to allow checking if this is an arch not providing this function or if a provided one is having trouble getting the cpuid, to decide if the warning should be provided to the user or just a debug message should be emitted. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: John Garry <john.garry@huawei.com> # arm64 Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/n/tip-lxwjr0cd2eggzx04a780ffrv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
761bfc33dd |
Merge remote-tracking branch 'torvalds/master' into perf/urgent
To pick up BPF fixes to allow a clean 'make -C tools/perf build-test': |
|
![]() |
29f6eeca0e |
perf inject: Fix processing of ID index for injected instruction tracing
The ID index event is used when decoding, but can result in the
following error:
$ perf record --aux-sample -e '{intel_pt//,branch-misses}:u' ls
$ perf inject -i perf.data -o perf.data.inj --itrace=be
$ perf script -i perf.data.inj
0x1020 [0x410]: failed to process type: 69 [No such file or directory]
Fix by having 'perf inject' drop the ID index event.
Fixes:
|
|
![]() |
bb30acae4c |
perf report: Bail out --mem-mode if mem info is not available
If perf.data is recorded without -d, don't allow user to use --mem-mode with 'perf report'. symbol_daddr and phys_daddr can be recorded separately and may be present in the perf.data but at the report time they are associated with mem-mode fields and thus this restriction applies to them as well. Before: $ perf record ls $ perf report --mem-mode --stdio # Overhead Local Weight Memory access Symbol # ........ ............ ............. ....................... 55.56% 0 N/A [k] 0xffffffff81a00ae7 After: $ perf report --mem-mode --stdio Error: Selected --mem-mode but no mem data. Did you call perf record without -d? Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-4-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
aa6b3c9923 |
perf report: Make -F more strict like -s
Currently -F allows branch-mode / mem-mode fields with -F even when perf report is not running in that mode. Don't allow that. Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-3-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ae87405fb5 |
perf report/top TUI: Replace pr_err() with ui__error()
pr_err() in TUI mode does not print anyting on the screen and just quits. Replace such pr_err() with ui__error(). Before: $ perf report -s + $ After: $ perf report -s + ┌─Error:────────────────┐ │Invalid --sort key: `+'│ │ │ │Press any key... │ └───────────────────────┘ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-2-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
734c7022ad |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-02 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 6 day(s) which contain a total of 10 files changed, 60 insertions(+), 51 deletions(-). The main changes are: 1) Fix vmlinux BTF generation for binutils pre v2.25, from Stanislav Fomichev. 2) Fix libbpf global variable relocation to take symbol's st_value offset into account, from Andrii Nakryiko. 3) Fix libbpf build on powerpc where check_abi target fails due to different readelf output format, from Aurelien Jarno. 4) Don't set BPF insns RO for the case when they are JITed in order to avoid fragmenting the direct map, from Daniel Borkmann. 5) Fix static checker warning in btf_distill_func_proto() as well as a build error due to empty enum when BPF is compiled out, from Alexei Starovoitov. 6) Fix up generation of bpf_helper_defs.h for perf, from Arnaldo Carvalho de Melo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
|
![]() |
9974406884 |
perf kvm: Clarify the 'perf kvm' -i and -o command line options
The 'perf kvm' subcommand has options that it in turn passes to other perf subcommands such as 'report' and 'record', particularly -i and -o end up setting the same variable that will then be used for 'record's -o and report '-i', which ends up being confusing, leading some to think that both -i and -o can be used with 'report'. Improve the man page to state that -i is used with the post-processing subcommands while -o is used just with 'record' and that to save the output of 'report' one should simply redirect its output to a file. Noticed while reading the https://www.linux-kvm.org/page/Perf_events page. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steve Dickson <steved@redhat.com> Cc: William Cohen <wcohen@redhat.com> Link: https://lkml.kernel.org/n/tip-tclbttvmgtm525fvmh85f7d9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f6661125ff |
perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
Add support for the recently added CLONE_CLEAR_SIGHAND flag. This takes advantage of the copy of the uapi/linux/sched.h we have in tools/include, which allows us to build tools/perf in older systems and have the binary support printing that flag whenever that system gets its kernel updated to one where this feature is present. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Reber <areber@redhat.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org Link: https://lkml.kernel.org/n/tip-1vnz497ubtu5oz16ygdcul0e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bd5c6b81dd |
perf bench: Update the copies of x86's mem{cpy,set}_64.S
And update linux/linkage.h, which requires in turn that we make these files switch from ENTRY()/ENDPROC() to SYM_FUNC_START()/SYM_FUNC_END(): tools/perf/arch/arm64/tests/regs_load.S tools/perf/arch/arm/tests/regs_load.S tools/perf/arch/powerpc/tests/regs_load.S tools/perf/arch/x86/tests/regs_load.S We also need to switch SYM_FUNC_START_LOCAL() to SYM_FUNC_START() for the functions used directly by 'perf bench', and update tools/perf/check_headers.sh to ignore those changes when checking if the kernel original files drifted from the copies we carry. This is to get the changes from: |
|
![]() |
77b91c1a52 |
perf machine: Fill map_symbol->maps in append_inlines() to fix segfault
I forgot to fill in the map_symbol->maps field in append_inlines() which
then makes code down the line segfault when trying to deref it.
It doesn't make any sense to have an addr_location with its 'map' member
not NULL while its 'maps' is NULL, after all al->maps is where al->map
is in.
It is done that way so that we don't have to have in each 'struct map' a
pointer to the 'struct maps' it is in, as we had in the past when we
would have 'map->mg', before 'struct maps' was combined with 'struct
map_groups', because there was always a one-to-one relationship for
these structs.
This fixes a segfault when processing DWARF callgraphs in 'perf report'.
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes:
|
|
![]() |
fa7f7e7354 |
perf jit: Move test functionality in to a test
Adds a test for minimal jit_write_elf functionality. Committer testing: # perf test jit 61: Test jit_write_elf : Ok # # perf test -v jit 61: Test jit_write_elf : --- start --- test child forked, pid 10460 Writing jit code to: /tmp/perf-test-KqxURR test child finished with 0 ---- end ---- Test jit_write_elf: Ok # Committer notes: Fix up the case where HAVE_JITDUMP is no defined. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20191126235913.41855-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
704e2f5b70 |
perf stat: Use affinity for enabling/disabling events
Restructure event enabling/disabling to use affinity, which minimizes the number of IPIs needed. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 54.65 1.899986 22 84812 660 ioctl after: 39.21 0.930451 10 84796 644 ioctl Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-13-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
363fb12189 |
perf evsel: Add functions to enable/disable for a specific CPU
Refactor the existing functions to use these functions internally. Used in the next patch. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-12-andi@firstfloor.org Link: http://lore.kernel.org/lkml/20191127232657.GL84886@tassilo.jf.intel.com # Fix Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4b49ab708d |
perf stat: Use affinity for reading
Restructure event reading to use affinity to minimize the number of IPIs needed. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 3.16 0.106079 4 22082 read After: 3.43 0.081295 3 22082 read Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-11-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4804e01116 |
perf stat: Use affinity for opening events
Restructure the event opening in perf stat to cycle through the events by CPU after setting affinity to that CPU. This eliminates IPI overhead in the perf API. We have to loop through the CPU in the outter builtin-stat code instead of leaving that to low level functions. It has to change the weak group fallback strategy slightly. Since we cannot easily undo the opens for other CPUs move the weak group retry to a separate loop. Before with a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.75 4.050910 67 60046 110 perf_event_open After: 26.86 0.944396 16 58069 110 perf_event_open (the number changes slightly because the weak group retries work differently and the test case relies on weak groups) Committer notes: Added one of the hunks in a patch provided by Andi after I noticed that the "event times" 'perf test' entry was segfaulting. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-10-andi@firstfloor.org Link: http://lore.kernel.org/lkml/20191127232657.GL84886@tassilo.jf.intel.com # Fix Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e0e6a6ca3a |
perf stat: Factor out open error handling
Factor out the open error handling into a separate function. This is useful for followon patches who need to duplicate this. No behavior change intended. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-9-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7736627b86 |
perf stat: Use affinity for closing file descriptors
Closing a perf fd can also trigger an IPI to the target CPU. Use the same affinity technique as we use for reading/enabling events to closing to optimize the CPU transitions. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 32.56 3.085463 50 61483 close After: 10.54 0.735704 11 61485 close Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-8-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
99d6141d67 |
perf evsel: Add functions to close evsel on a CPU
Refactor the existing all CPU function to use the per CPU close internally. Export APIs to close per CPU. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-7-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a8cbe40fe9 |
perf evsel: Add iterator to iterate over events ordered by CPU
Add some common code that is needed to iterate over all events in CPU order. Used in followon patches Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-6-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a2408a7036 |
perf evlist: Maintain evlist->all_cpus
Maintain a cpumap in the evlist that is the union of all the cpus of the events. This needs a cpumap merge operation, which is added together with tests. v2: Add tests for cpu map merge Fix handling of duplicates Rename _update to _merge Factor out sorting. Fix handling of NULL maps in merge v3: Add comments and empty lines to _merge Committer testing: # perf test "Merge cpu map" 52: Merge cpu map : Ok # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/20191121001522.180827-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7074674e73 |
perf cpumap: Maintain cpumaps ordered and without dups
Enforce this in _trim() Needed for followon change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5172672da0 |
perf script: Fix invalid LBR/binary mismatch error
The 'len' returned by grab_bb() includes an extra MAXINSN bytes to allow
for the last instruction, so the the final 'offs' will not be 'len'.
Fix the error condition logic accordingly.
Before:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
bmexec+2485:
00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED
00005641d5806bd0 movzxb (%r13,%rdx,1), %eax
00005641d5806bd6 add %rdi, %rax
00005641d5806bd9 movzxb -0x1(%rax), %edx
00005641d5806bdd cmp %rax, %r14
00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
mismatch of LBR data and executable
00005641d58069c0 movzxb (%r13,%rdx,1), %edi
After:
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
bmexec+2485:
00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED
00005641d5806bd0 movzxb (%r13,%rdx,1), %eax
00005641d5806bd6 add %rdi, %rax
00005641d5806bd9 movzxb -0x1(%rax), %edx
00005641d5806bdd cmp %rax, %r14
00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
00005641d58069c0 movzxb (%r13,%rdx,1), %edi
00005641d58069c6 add %rax, %rdi
Fixes:
|
|
![]() |
0cd032d3b5 |
perf script: Fix brstackinsn for AUXTRACE
brstackinsn must be allowed to be set by the user when AUX area data has
been captured because, in that case, the branch stack might be
synthesized on the fly. This fixes the following error:
Before:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
Display of branch stack assembler requested, but non all-branch filter set
Hint: run 'perf record -b ...'
After:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
bmexec+2485:
00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED
00005641d5806bd0 movzxb (%r13,%rdx,1), %eax
00005641d5806bd6 add %rdi, %rax
00005641d5806bd9 movzxb -0x1(%rax), %edx
00005641d5806bdd cmp %rax, %r14
00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
mismatch of LBR data and executable
00005641d58069c0 movzxb (%r13,%rdx,1), %edi
Fixes:
|
|
![]() |
267ed5d859 |
perf affinity: Add infrastructure to save/restore affinity
The kernel perf subsystem has to IPI to the target CPU for many operations. On systems with many CPUs and when managing many events the overhead can be dominated by lots of IPIs. An alternative is to set up CPU affinity in the perf tool, then set up all the events for that CPU, and then move on to the next CPU. Add some affinity management infrastructure to enable such a model. Used in followon patches. Committer notes: Use zfree() in some places, add missing stdbool.h header, some minor coding style changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d96645821e |
perf pmu: Use file system cache to optimize sysfs access
pmu.c does a lot of redundant /sys accesses while parsing aliases and probing for PMUs. On large systems with a lot of PMUs this can get expensive (>2s): % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 27.25 1.227847 8 160888 16976 openat 26.42 1.190481 7 164224 164077 stat Add a cache to remember if specific file names exist or don't exist, which eliminates most of this overhead. Also optimize some stat() calls to be slightly cheaper access() Resulting in: 0.18 0.004166 2 1851 305 open 0.08 0.001970 2 829 622 access Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5b596e0ff0 |
perf regs: Make perf_reg_name() return "unknown" instead of NULL
To avoid breaking the build on arches where this is not wired up, at
least all the other features should be made available and when using
this specific routine, the "unknown" should point the user/developer to
the need to wire this up on this particular hardware architecture.
Detected in a container mipsel debian cross build environment, where it
shows up as:
In file included from /usr/mipsel-linux-gnu/include/stdio.h:867,
from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
from util/session.c:13:
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2:
/usr/mipsel-linux-gnu/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cross compiler details:
mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909
Also on mips64:
In file included from /usr/mips64-linux-gnuabi64/include/stdio.h:867,
from /git/linux/tools/perf/lib/include/perf/cpumap.h:6,
from util/session.c:13:
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2,
inlined from 'regs_user__printf' at util/session.c:1139:3,
inlined from 'dump_sample' at util/session.c:1246:3,
inlined from 'machines__deliver_event' at util/session.c:1421:3:
/usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'printf',
inlined from 'regs_dump__printf' at util/session.c:1103:3,
inlined from 'regs__printf' at util/session.c:1131:2,
inlined from 'regs_intr__printf' at util/session.c:1147:3,
inlined from 'dump_sample' at util/session.c:1249:3,
inlined from 'machines__deliver_event' at util/session.c:1421:3:
/usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=]
107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cross compiler details:
mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909
Fixes:
|
|
![]() |
2b1ac6403f |
perf diff: Use llabs() with 64-bit values
To fix this build error on a debian mipsel cross build environment:
builtin-diff.c: In function 'compute_cycles_diff':
builtin-diff.c:649:10: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
649 | val = labs(pair->block_info->cycles_spark[i] -
| ^~~~
Fixes:
|
|
![]() |
98e9324511 |
perf diff: Use llabs() with 64-bit values
To fix these build errors on a debian mipsel cross build environment:
builtin-diff.c: In function 'block_cycles_diff_cmp':
builtin-diff.c:550:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
550 | l = labs(left->diff.cycles);
| ^~~~
builtin-diff.c:551:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value]
551 | r = labs(right->diff.cycles);
| ^~~~
Fixes:
|
|
![]() |
1fd450f992 |
libbpf: Fix up generation of bpf_helper_defs.h
$ make -C tools/perf build-test
does, ends up with these two problems:
make[3]: *** No rule to make target '/tmp/tmp.zq13cHILGB/perf-5.3.0/include/uapi/linux/bpf.h', needed by 'bpf_helper_defs.h'. Stop.
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [Makefile.perf:757: /tmp/tmp.zq13cHILGB/perf-5.3.0/tools/lib/bpf/libbpf.a] Error 2
make[2]: *** Waiting for unfinished jobs....
Because $(srcdir) points to the /tmp/tmp.zq13cHILGB/perf-5.3.0 directory
and we need '/tools/ after that variable, and after fixing this then we
get to another problem:
/bin/sh: /home/acme/git/perf/tools/scripts/bpf_helpers_doc.py: No such file or directory
make[3]: *** [Makefile:184: bpf_helper_defs.h] Error 127
make[3]: *** Deleting file 'bpf_helper_defs.h'
LD /tmp/build/perf/libapi-in.o
make[2]: *** [Makefile.perf:778: /tmp/build/perf/libbpf.a] Error 2
make[2]: *** Waiting for unfinished jobs....
Because this requires something outside the tools/ directories that gets
collected into perf's detached tarballs, to fix it just add it to
tools/perf/MANIFEST, which this patch does, now it works for that case
and also for all these other cases.
Fixes:
|
|
![]() |
7b65e2034f |
perf tools: Allow to link with libbpf dynamicaly
Currently we support only static linking with kernel's libbpf (tools/lib/bpf). This patch adds libbpf package detection and support to link perf with it dynamically. The libbpf package status is displayed with: $ make VF=1 Auto-detecting system features: ... ... libbpf: [ on ] It's not checked by default, because it's quite new. Once it's on most distros we can switch it on. For the same reason it's not added to the test-all check. Perf does not need advanced version of libbpf, so we can check just for the base bpf_object__open function. Adding new compile variable to detect libbpf package and link bpf dynamically: $ make LIBBPF_DYNAMIC=1 ... LINK perf $ ldd perf | grep bpf libbpf.so.0 => /lib64/libbpf.so.0 (0x00007f46818bc000) If libbpf is not installed, build stops with: Makefile.config:486: *** Error: No libbpf devel library found,\ please install libbpf-devel. Stop. Committer testing: $ make LIBBPF_DYNAMIC=1 -C tools/perf O=/tmp/build/perf make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j8' parallel build Makefile.config:493: *** Error: No libbpf devel library found, please install libbpf-devel. Stop. make[1]: *** [Makefile.perf:225: sub-make] Error 2 make: *** [Makefile:70: all] Error 2 make: Leaving directory '/home/acme/git/perf/tools/perf' $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191126121253.28253-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a5732681e0 |
perf tests: Rename tests/map_groups.c to tests/maps.c
One more step in mergint the maps and map_groups structs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bw6aagubqxc47m54k2maezfu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6d38267cf9 |
perf tests: Rename thread-mg-share to thread-maps-share
One more step in merging 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-naxsl3g4ou3fyxb8l8e0pn5e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c54d241b35 |
perf maps: Rename map_groups.h to maps.h
One more step in the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9ibtn3vua76f934t7woyf26w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9a29ceee6b |
perf maps: Rename 'mg' variables to 'maps'
Continuing the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-z8d14wrw393a0fbvmnk1bqd9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f2eaea09d6 |
perf map_symbol: Rename ms->mg to ms->maps
One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
694520dfeb |
perf addr_location: Rename al->mg to al->maps
One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-foo95pyyp3bhocbt7yd8qrvq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
fe87797dea |
perf thread: Rename thread->mg to thread->maps
One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-69vcr8pubpym90skxhmbwhiw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
79b6bb73f8 |
perf maps: Merge 'struct maps' with 'struct map_groups'
And pick the shortest name: 'struct maps'. The split existed because we used to have two groups of maps, one for functions and one for variables, but that only complicated things, sometimes we needed to figure out what was at some address and then had to first try it on the functions group and if that failed, fall back to the variables one. That split is long gone, so for quite a while we had only one struct maps per struct map_groups, simplify things by combining those structs. First patch is the minimum needed to merge both, follow up patches will rename 'thread->mg' to 'thread->maps', etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9adab03488 |
x86/insn: perf tools: Add some more instructions to the new instructions test
Add to the "x86 instruction decoder - new instructions" test the following instructions: v4fmaddps v4fmaddss v4fnmaddps v4fnmaddss vaesdec vaesdeclast vaesenc vaesenclast vcvtne2ps2bf16 vcvtneps2bf16 vdpbf16ps gf2p8affineinvqb vgf2p8affineinvqb gf2p8affineqb vgf2p8affineqb gf2p8mulb vgf2p8mulb vp2intersectd vp2intersectq vp4dpwssd vp4dpwssds vpclmulqdq vpcompressb vpcompressw vpdpbusd vpdpbusds vpdpwssd vpdpwssds vpexpandb vpexpandw vpopcntb vpopcntd vpopcntq vpopcntw vpshldd vpshldq vpshldvd vpshldvq vpshldvw vpshldw vpshrdd vpshrdq vpshrdvd vpshrdvq vpshrdvw vpshrdw vpshufbitqmb For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). Committer testing: $ perf test x86 61: x86 rdpmc : Ok 64: x86 instruction decoder - new instructions : Ok 66: x86 bp modify : Ok $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191125125044.31879-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a82f15e39a |
perf map: Remove unused functions
At some point those stopped being used, prune them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-p2k98mj3ff2uk1z95sbl5r6e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
805fcbc4fb |
perf map: Remove needless struct forward declarations
At some point we may have needed that, not anymore. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hnao13231bsl7xml5wn8h4iu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
40df3897f0 |
perf map: Ditch leftover map__reloc_vmlinux() prototype
In
|
|
![]() |
540a63ea30 |
perf script: Move map__fprintf_srccode() to near its only user
No need to have it elsewhere. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-8cw846pudpxo0xdkvi9qnvrh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ceb9e77324 |
Merge branch 'x86/core' into perf/core, to resolve conflicts and to pick up completed topic tree
Conflicts: tools/perf/check-headers.sh Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
![]() |
4584f084aa |
perf parse: Fix potential memory leak when handling tracepoint errors
An error may be in place when tracepoint_error is called, use parse_events__handle_error to avoid a memory leak and to capture the first and last error. Error detected by LLVM's libFuzzer using the following event: $ perf stat -e 'msr/event/,f:e' event syntax error: 'msr/event/,f:e' \___ can't access trace events Error: No permissions to read /sys/kernel/debug/tracing/events/f/e Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing/' Initial error: event syntax error: 'msr/event/,f:e' \___ no value assigned for term Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20191120180925.21787-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
358f98ee8a |
perf probe: Fix spelling mistake "addrees" -> "address"
There is a spelling mistake in a pr_warning message. Fix it. Signed-off-by: Colin King <colin.king@canonical.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lore.kernel.org/lkml/20191121092623.374896-1-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
32a1ece4bd |
perf intel-bts: Does not support AUX area sampling
Add an error message because Intel BTS does not support AUX area sampling. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-16-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dbd134322e |
perf intel-pt: Add support for decoding AUX area samples
Add support for dumping, queuing and decoding AUX area samples. Decoding samples is the same as regular decoding, except in the case where there are no timestamps, in which case buffers are decoded immediately before the sample event. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-15-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c4ab2f0f76 |
perf intel-pt: Add support for recording AUX area samples
Set up the default number of mmap pages, default sample size and default psb_period for AUX area sampling. Add documentation also. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-14-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a1ac7de690 |
perf pmu: When using default config, record which bits of config were changed by the user
Default config for a PMU is defined before selected events are parsed. That allows the user-entered config to override the default config. However that does not allow for changing the default config based on other options. For example, if the user chooses AUX area sampling mode, in the case of Intel PT, the psb_period needs to be small for sampling, so there is a need to set the default psb_period to 0 (2 KiB) in that case. However that should not override a value set by the user. To allow for that, when using default config, record which bits of config were changed by the user. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-13-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ac2f445fc8 |
perf auxtrace: Add support for queuing AUX area samples
Add functions to queue AUX area samples in advance (auxtrace_queue_data()) or individually (auxtrace_queues__add_sample()) or find out what queue a sample belongs on (auxtrace_queues__sample_queue()). auxtrace_queue_data() can also queue snapshot data which keeps snapshots and samples ordered with respect to each other in case support for that is desired. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
103ed40e4b |
perf session: Add facility to peek at all events
AUX area samples are not limited in how far back in time the sample could start. Consequently samples must be queued in advance to allow for time-ordered processing. To achieve that, add perf_session__peek_events() that walks and peeks at all the events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b04b8dd1e4 |
perf auxtrace: Add support for dumping AUX area samples
Add support for dumping AUX area samples i.e. via the perf script/report -D (--dump-raw-trace) option. Committer notes: Add __maybe_unused to the two args for auxtrace__dump_auxtrace_sample() for when we don't HAVE_AUXTRACE_SUPPORT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ba2675bf15 |
perf inject: Cut AUX area samples
After decoding AUX area samples, the AUX area data is no longer needed (having been replaced by synthesized events) so cut it out. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
eb7a52d46c |
perf record: Add aux-sample-size config term
To allow individual events to be selected for AUX area sampling, add aux-sample-size config term. attr.aux_sample_size is updated by auxtrace_parse_sample_options() so that the existing validation will see the value. Any event that has a non-zero aux_sample_size will cause AUX area sampling to be configured, irrespective of the --aux-sample option. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c0a6de06c4 |
perf record: Add support for AUX area sampling
Add a 'perf record' option '--aux-sample' to request AUX area sampling. AUX area sampling uses an overwriting buffer much like snapshot mode, so adjust the AUX buffer mmapping accordingly. To make it easy to queue samples for decoding, synthesize an ID index. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f0bb7ee853 |
perf auxtrace: Add support for AUX area sample recording
Add support for parsing and validating AUX area sample options. At present, the only option is the sample size, but it is also necessary to ensure that events are in a group with an AUX area event as the leader. Committer note: Add missing 'static inline' in front of auxtrace_parse_sample_options() for when we don't HAVE_AUXTRACE_SUPPORT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f306de275b |
perf auxtrace: Move perf_evsel__find_pmu()
Move perf_evsel__find_pmu() so it can be used without forward declaration. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9bca1a4ef5 |
perf record: Add a function to test for kernel support for AUX area sampling
Architectures are expected to know if AUX area sampling is supported by the hardware. Add a function perf_can_aux_sample() which will determine whether the kernel supports it. Committer notes: I reported that this message was taking place on a kernel without the required bits: # perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' Error: The sys_perf_event_open() syscall returned with 7 (Argument list too long) for event (branch-misses:u). /bin/dmesg | grep -i perf may provide additional information. Adrian sent a patch addressing it, with this explanation: ---- perf_can_aux_sample_size() always returned true because it did not pass the attribute size to sys_perf_event_open, nor correctly check the return value and errno. ---- After applying it I get, later in the series, when --aux-sample is added: # perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' AUX area sampling is not supported by kernel Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
98dcf14d7f |
perf tools: Add kernel AUX area sampling definitions
Add kernel AUX area sampling definitions, which brings perf_event.h into line with the kernel version. New sample type PERF_SAMPLE_AUX requests a sample of the AUX area buffer. New perf_event_attr member 'aux_sample_size' specifies the desired size of the sample. Also add support for parsing samples containing AUX area data i.e. PERF_SAMPLE_AUX. Committer notes: I squashed the first two patches in this series to avoid breaking automatic bisection, i.e. after applying only the original first patch in this series we would have: # perf test -v parsing 26: Sample parsing : --- start --- test child forked, pid 17018 sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating test child finished with -1 ---- end ---- Sample parsing: FAILED! # With the two paches combined: # perf test parsing 26: Sample parsing : Ok # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
848a5e507e |
perf report: Jump to symbol source view from total cycles view
This patch supports jumping from tui total cycles view to symbol source view. For example, perf record -b ./div perf report --total-cycles In total cycles view, we can select one entry and press 'a' or press ENTER key to jump to symbol source view. This patch also sets sort_order to NULL in cmd_report() which will use the default branch sort order. The percent value in new annotate view will be consistent with the percent in annotate view switched from perf report (we observed the original percent gap with previous patches). v2: --- Fix the 'make NO_SLANG=1' error. (set __maybe_unused to annotation_opts in block_hists_tui_browse()). Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5cb456af99 |
perf util: Move block TUI function to ui browsers
It would be nice if we could jump to the assembler/source view (like the normal perf report) from total cycles view. This patch moves the block_hists_tui_browse from block-info.c to ui/browsers/hists.c in order to reuse some browser codes (i.e do_annotate) for implementing new annotation view. v2: --- Fix the 'make NO_SLANG=1' error. (Change 'int block_hists_tui_browse()' to 'static inline int block_hists_tui_browse()') Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191118140849.20714-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bb1835a3b8 |
perf session: Fix decompression of PERF_RECORD_COMPRESSED records
Avoid termination of trace loading in case the last record in the
decompressed buffer partly resides in the following mmaped
PERF_RECORD_COMPRESSED record.
In this case NULL value returned by fetch_mmaped_event() means to
proceed to the next mmaped record then decompress it and load compressed
events.
The issue can be reproduced like this:
$ perf record -z -- some_long_running_workload
$ perf report --stdio -vv
decomp (B): 44519 to 163000
decomp (B): 48119 to 174800
decomp (B): 65527 to 131072
fetch_mmaped_event: head=0x1ffe0 event->header_size=0x28, mmap_size=0x20000: fuzzed perf.data?
Error:
failed to process sample
...
Testing:
71: Zstd perf.data compression/decompression : Ok
$ tools/perf/perf report -vv --stdio
decomp (B): 59593 to 262160
decomp (B): 4438 to 16512
decomp (B): 285 to 880
Looking at the vmlinux_path (8 entries long)
Using vmlinux for symbols
decomp (B): 57474 to 261248
prefetch_event: head=0x3fc78 event->header_size=0x28, mmap_size=0x3fc80: fuzzed or compressed perf.data?
decomp (B): 25 to 32
decomp (B): 52 to 120
...
Fixes:
|
|
![]() |
0e3149f86b |
perf dso: Move dso_id from 'struct map' to 'struct dso'
And take it into account when looking up DSOs when we have the dso_id fields obtained from somewhere, like from PERF_RECORD_MMAP2 records. Instances of struct map pointing to the same DSO pathname but with anything in dso_id different are in fact different DSOs, so better have different 'struct dso' instances to reflect that. At some point we may want to get copies of the contents of the different objects if we want to do correct annotation or other analysis. With this we get 'struct map' 24 bytes leaner: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u64 pgoff; /* 48 8 */ u64 reloc; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 (*map_ip)(struct map *, u64); /* 64 8 */ u64 (*unmap_ip)(struct map *, u64); /* 72 8 */ struct dso * dso; /* 80 8 */ refcount_t refcnt; /* 88 4 */ u32 flags; /* 92 4 */ /* size: 96, cachelines: 2, members: 13 */ /* sum members: 92, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1f74b100c9 |
perf dsos: Remove unused dsos__find() method
Not used anywhere, nuke it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-teqz0eqcw43mnt7i3me44esw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7b59a82493 |
perf map: Move comparision of map's dso_id to a separate function
We'll use it when doing DSO lookups using dso_ids. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-u2nr1oq03o0i29w2ay9jx03s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
4a7380a52e |
perf map: Pass a dso_id to map__new()
Instead of the 4 fields, a step in the direction of moving this to struct dso. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gp5s1xgxacurmih5d1l94ymy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
99459a84d5 |
perf map: Move maj/min/ino/ino_generation to separate struct
And this patch highlights where these fields are being used: in the sort order where it uses it to compare maps and classify samples taking into account not just the DSO, but those DSO id fields. I think these should be used to differentiate DSOs with the same name but different 'struct dso_id' fields, i.e. these fields should move to 'struct dso' and then be used as part of the key when doing lookups for DSOs, in addition to the DSO name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a910e4666d |
perf parse: Report initial event parsing error
Record the first event parsing error and report. Implementing feedback from Jiri Olsa: https://lkml.org/lkml/2019/10/28/680 An example error is: $ tools/perf/perf stat -e c/c/ WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Initial error: event syntax error: 'c/c/' \___ Cannot find PMU `c'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20191116074652.9960-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cb40273085 |
perf probe: Trace a magic number if variable is not found
Trace a magic number as immediate value if the target variable is not found at some probe points which is based on one probe event. This feature is good for the case if you trace a source code line with some local variables, which is compiled into several instructions and some of the variables are optimized out on some instructions. Even if so, with this feature, perf probe trace a magic number instead of such disappeared variables and fold those probes on one event. E.g. without this patch: # perf probe -D "pud_page_vaddr pud" Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64 p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64 p:probe/pud_page_vaddr _text+23558082 pud=%ax:x64 p:probe/pud_page_vaddr _text+328373 pud=%r8:x64 p:probe/pud_page_vaddr _text+348448 pud=%bx:x64 p:probe/pud_page_vaddr _text+23816818 pud=%bx:x64 With this patch: # perf probe -D "pud_page_vaddr pud" | head spurious_kernel_fault is blacklisted function, skip it. vmalloc_fault is blacklisted function, skip it. p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64 p:probe/pud_page_vaddr _text+149051 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64 p:probe/pud_page_vaddr _text+315926 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23807209 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23557365 pud=%ax:x64 p:probe/pud_page_vaddr _text+314097 pud=%di:x64 p:probe/pud_page_vaddr _text+314015 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+313893 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+324083 pud=\deade12d:x64 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406476931.24476.6261475888681844285.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
66f69b2197 |
perf probe: Support DW_AT_const_value constant value
Support DW_AT_const_value for variable assignment instead of location. Note that this requires ftrace supporting immediate value. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406476012.24476.16096289871757175775.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
72363540c0 |
perf probe: Support multiprobe event
Support multiprobe event if the event is based on function and lines and kernel supports it. In this case, perf probe creates the first probe with an event, and tries to append following probes on that event, since those probes must be on the same source code line. Before this patch; # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18_1 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18_1 -aR sleep 1 # After this patch (on multiprobe supported kernel) # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18 -aR sleep 1 # Committer testing: On a kernel that doesn't support multiprobe events, after this patch: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # grep append /sys/kernel/debug/tracing/README be modified by appending '.descending' or '.ascending' to a can be modified by appending any of the following modifiers # # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18_1 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18_1 -aR sleep 1 # perf probe -l probe:vfs_read_L18 (on vfs_read:18@fs/read_write.c) probe:vfs_read_L18_1 (on vfs_read:18@fs/read_write.c) # Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406475010.24476.586290752591512351.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
15354d5469 |
perf probe: Generate event name with line number
Generate event name from function name with line number as <function>_L<line_number>. Note that this is only for the new event which is defined by the line number of function (except for line 0). If there is another event on same line, you have to use "-f" option. In that case, the new event has "_1" suffix. e.g. # perf probe -a kernel_read:2 Added new event: probe:kernel_read_L2 (on kernel_read:2) You can now use it in all perf tools, such as: perf record -e probe:kernel_read_L2 -aR sleep 1 But if we omit the line number or 0th line, it will have no suffix. # perf probe -a kernel_read:0 Added new event: probe:kernel_read (on kernel_read) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 probe:kernel_read (on kernel_read@linux-5.0.0/fs/read_write.c) probe:kernel_read_L2 (on kernel_read:2@linux-5.0.0/fs/read_write.c) Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406474026.24476.2828897745502059569.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
499144c83d |
perf probe: Do not show non representive lines by perf-probe -L
Since perf probe -L shows non representive lines, it can be mislead users where user can put probes. This prevents to show such non representive lines so that user can understand which lines user can probe. # perf probe -L kernel_read <kernel_read@/build/linux-pvZVvI/linux-5.0.0/fs/read_write.c:0> 0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos) { 2 mm_segment_t old_fs; ssize_t result; old_fs = get_fs(); 6 set_fs(get_ds()); /* The cast to a user pointer is valid due to the set_fs() */ 8 result = vfs_read(file, (void __user *)buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); Committer testing: Before: # perf probe -L kernel_read <kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0> 0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos) 1 { 2 mm_segment_t old_fs; 3 ssize_t result; 5 old_fs = get_fs(); 6 set_fs(KERNEL_DS); /* The cast to a user pointer is valid due to the set_fs() */ 8 result = vfs_read(file, (void __user *)buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); # See the 1, 3, 5 lines? They shouldn't be there, after this patch: # perf probe -L kernel_read <kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0> 0 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos) { 2 mm_segment_t old_fs; ssize_t result; old_fs = get_fs(); 6 set_fs(KERNEL_DS); /* The cast to a user pointer is valid due to the set_fs() */ 8 result = vfs_read(file, (void __user *)buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); # Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406473064.24476.2913278267727587314.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1ae5d88a4e |
perf probe: Verify given line is a representive line
Verify user given probe line is a representive line (which doesn't share the address with other lines or the line is the least line among the lines which shares same address), and if not, it shows what is the representive line. Without this fix, user can put a probe on the lines which is not a a representive line. But since this is not a representive line, perf probe -l shows a representive line number instead of user given line number. e.g. (put kernel_read:3, but listed as kernel_read:2) # perf probe -a kernel_read:3 Added new event: probe:kernel_read (on kernel_read:3) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 # perf probe -l probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c) With this fix, perf probe doesn't allow user to put a probe on a representive line, and tell what is the representive line. # perf probe -a kernel_read:3 This line is sharing the addrees with other lines. Please try to probe at kernel_read:2 instead. Error: Failed to add events. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406472071.24476.14915451439785001021.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
57f95bf5f8 |
perf probe: Show correct statement line number by perf probe -l
The dwarf_getsrc_die() can return the line which is not a statement nor the least line number among the lines which shares same address. This can lead perf probe --list shows incorrect line number for probed address. To fix this, this introduces cu_getsrc_die() which returns only a statement line and which is the least line number (we call it the representive line for an address), and use it in cu_find_lineinfo(). Also, if the given address is the entry address of a real function, cu_find_lineinfo() returns the function declared line number instead of the start line number of the function body. For example, without this change perf probe -l shows incorrect line as below. # perf probe -a kernel_read:2 Added new event: probe:kernel_read (on kernel_read:2) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 # perf probe -l probe:kernel_read (on kernel_read:1@linux-5.0.0/fs/read_write.c) With this fix, it shows correct line number as below; # perf probe -l probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c) Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406471067.24476.17463149618465494448.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1e5f015442 |
x86/insn: perf tools: Add some instructions to the new instructions test
Add to the "x86 instruction decoder - new instructions" test the following instructions: cldemote tpause umonitor umwait movdiri movdir64b enqcmd enqcmds encls enclu enclv pconfig wbnoinvd For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191115135447.6519-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7624e69465 |
perf map: Move seldom used ->flags field to second cacheline
So we start with: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u32 flags; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ u64 pgoff; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 reloc; /* 64 8 */ u32 maj; /* 72 4 */ u32 min; /* 76 4 */ u64 ino; /* 80 8 */ u64 ino_generation; /* 88 8 */ u64 (*map_ip)(struct map *, u64); /* 96 8 */ u64 (*unmap_ip)(struct map *, u64); /* 104 8 */ struct dso * dso; /* 112 8 */ refcount_t refcnt; /* 120 4 */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 116, holes: 2, sum holes: 7 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* padding: 4 */ /* forced alignments: 1 */ } __attribute__((__aligned__(8))); $ and 'flags' is seldom used when printing details about the map or with the "cacheline" sort order, we can move them it to the second cacheline, that will allow combining it with 'refcnt', that is only four bytes: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u64 pgoff; /* 48 8 */ u64 reloc; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 maj; /* 64 4 */ u32 min; /* 68 4 */ u64 ino; /* 72 8 */ u64 ino_generation; /* 80 8 */ u64 (*map_ip)(struct map *, u64); /* 88 8 */ u64 (*unmap_ip)(struct map *, u64); /* 96 8 */ struct dso * dso; /* 104 8 */ refcount_t refcnt; /* 112 4 */ u32 flags; /* 116 4 */ /* size: 120, cachelines: 2, members: 17 */ /* sum members: 116, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* forced alignments: 1 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-2cdw3zlw1mkamaf7nqtdlxfi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dbc984c961 |
perf map: Use bitmap for booleans
The map->priv and map->erange_warned are seldom used, the first only in tests/vmlinux-kallsyms.c, the later only when hist_entry__inc_addr_samples() returns -ERANGE in 'perf top', which are really rare occasions, so make them a bool bitfield. This will open up space for other members on the first cacheline. $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u32 flags; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ u64 pgoff; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 reloc; /* 64 8 */ u32 maj; /* 72 4 */ u32 min; /* 76 4 */ u64 ino; /* 80 8 */ u64 ino_generation; /* 88 8 */ u64 (*map_ip)(struct map *, u64); /* 96 8 */ u64 (*unmap_ip)(struct map *, u64); /* 104 8 */ struct dso * dso; /* 112 8 */ refcount_t refcnt; /* 120 4 */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 116, holes: 2, sum holes: 7 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* padding: 4 */ /* forced alignments: 1 */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g5545pcq4ff0wr17tfb1piqt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
aceb98261e |
perf callchain: Fix segfault in thread__resolve_callchain_sample()
Do not dereference 'chain' when it is NULL.
$ perf record -e intel_pt//u -e branch-misses:u uname
$ perf report --itrace=l --branch-history
perf: Segmentation fault
Fixes:
|
|
![]() |
a7c2b572e2 |
perf map_groups: Auto sort maps by name, if needed
There are still lots of lookups by name, even if just when loading vmlinux, till that code is studied to figure out if its possible to do away with those map lookup by names, provide a way to sort it using libc's qsort/bsearch. Doing it at the first lookup defers the sorting a bit, and as the code stands now, is never done for user maps, just for the kernel ones. # perf probe -l # perf probe -x ~/bin/perf -L __map_groups__find_by_name <__map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 static struct map *__map_groups__find_by_name(struct map_groups *mg, const char *name) 1 { struct map **mapp; 4 if (mg->maps_by_name == NULL && 5 map__groups__sort_by_name_from_rbtree(mg)) 6 return NULL; 8 mapp = bsearch(name, mg->maps_by_name, mg->nr_maps, sizeof(*mapp), map__strcmp_name); 9 if (mapp) 10 return *mapp; 11 return NULL; 12 } struct map *map_groups__find_by_name(struct map_groups *mg, const char *name) { # perf probe -x ~/bin/perf 'found=__map_groups__find_by_name:10 name:string' Added new event: probe_perf:found (on __map_groups__find_by_name:10 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:found -aR sleep 1 # # perf probe -x ~/bin/perf -L map_groups__find_by_name <map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 struct map *map_groups__find_by_name(struct map_groups *mg, const char *name) 1 { 2 struct maps *maps = &mg->maps; struct map *map; 5 down_read(&maps->lock); 7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) { 8 map = mg->last_search_by_name; 9 goto out_unlock; } /* * If we have mg->maps_by_name, then the name isn't in the rbtree, * as mg->maps_by_name mirrors the rbtree when lookups by name are * made. */ 16 map = __map_groups__find_by_name(mg, name); 17 if (map || mg->maps_by_name != NULL) 18 goto out_unlock; /* Fallback to traversing the rbtree... */ 21 maps__for_each_entry(maps, map) 22 if (strcmp(map->dso->short_name, name) == 0) { 23 mg->last_search_by_name = map; 24 goto out_unlock; } 27 map = NULL; out_unlock: 30 up_read(&maps->lock); 31 return map; 32 } int dso__load_vmlinux(struct dso *dso, struct map *map, const char *vmlinux, bool vmlinux_allocated) # perf probe -x ~/bin/perf 'fallback=map_groups__find_by_name:21 name:string' Added new events: probe_perf:fallback (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) probe_perf:fallback_1 (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:fallback_1 -aR sleep 1 # # perf probe -l probe_perf:fallback (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:fallback_1 (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:found (on __map_groups__find_by_name:10@util/symbol.c in /home/acme/bin/perf with name_string) # # perf stat -e probe_perf:* Now run 'perf top' in another term and then, after a while, stop 'perf stat': Furthermore, if we ask for interval printing, we can see that that is done just at the start of the workload: # perf stat -I1000 -e probe_perf:* # time counts unit events 1.000319513 0 probe_perf:found 1.000319513 0 probe_perf:fallback_1 1.000319513 0 probe_perf:fallback 2.001868092 23,251 probe_perf:found 2.001868092 0 probe_perf:fallback_1 2.001868092 0 probe_perf:fallback 3.002901597 0 probe_perf:found 3.002901597 0 probe_perf:fallback_1 3.002901597 0 probe_perf:fallback 4.003358591 0 probe_perf:found 4.003358591 0 probe_perf:fallback_1 4.003358591 0 probe_perf:fallback ^C # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c5lmbyr14x448rcfii7y6t3k@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a94ab91a54 |
perf machine: No need to check if kernel module maps pre-exist
We'only populating maps for kernel modules either from perf.data file PERF_RECORD_MMAP records or when parsing /proc/modules, so there is no need to first look if we already have those module maps in the list, that would mean the kernel has duplicate entries. So ditch one use of looking up maps by name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gnzjg2hhuz6jnrw91m35059y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6e0a9b3dfa |
perf record: No need to process the synthesized MMAP events twice
At the end of a 'perf record' session, by default, we'll process all samples and populate the threads, maps, etc so as to find out which of the DSOs got samples, to reduce the size of the build-id table we'll add to the perf.data headers. But we don't need to process the PERF_RECORD_MMAP events synthesized for the kernel modules, as we have those already via perf_session__create_kernel_maps(), so add mmap/mmap2 handlers that first look at event->header.misc to see if the event is for a user map, bailing out if not. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mofoxvcx2dryppcw3o689jdd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f068435d9b |
perf map: No need to adjust the long name of modules
At some point in the past we needed to make sure we would get the long
name of modules and not just what we get from /proc/modules, but that
need, as described in the cset that introduced the adjustment function:
Fixes:
|
|
![]() |
1ae14516cb |
perf map_groups: Add a front end cache for map lookups by name
Lets see if it helps: First look at the probeable lines for the function that does lookups by name in a map_groups struct: # perf probe -x ~/bin/perf -L map_groups__find_by_name <map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 struct map *map_groups__find_by_name(struct map_groups *mg, const char *name) 1 { 2 struct maps *maps = &mg->maps; struct map *map; 5 down_read(&maps->lock); 7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) { 8 map = mg->last_search_by_name; 9 goto out_unlock; } 12 maps__for_each_entry(maps, map) 13 if (strcmp(map->dso->short_name, name) == 0) { 14 mg->last_search_by_name = map; 15 goto out_unlock; } 18 map = NULL; out_unlock: 21 up_read(&maps->lock); 22 return map; 23 } int dso__load_vmlinux(struct dso *dso, struct map *map, const char *vmlinux, bool vmlinux_allocated) # Now add a probe to the place where we reuse the last search: # perf probe -x ~/bin/perf map_groups__find_by_name:8 Added new event: probe_perf:map_groups__find_by_name (on map_groups__find_by_name:8 in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:map_groups__find_by_name -aR sleep 1 # Now lets do a system wide 'perf stat' counting those events: # perf stat -e probe_perf:* Leave it running and lets do a 'perf top', then, after a while, stop the 'perf stat': # perf stat -e probe_perf:* ^C Performance counter stats for 'system wide': 3,603 probe_perf:map_groups__find_by_name 44.565253139 seconds time elapsed # yeah, good to have. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-tcz37g3nxv3tvxw3q90vga3p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c5c584d2db |
perf maps: Do not use an rbtree to sort by map name
This is only used for the kernel maps, shave 24 bytes out 'struct map' and just traverse the existing per ip rbtree to look for maps by name, use a front end cache to reuse the last search if its the same name. After this 'struct map' is down to just two cachelines: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned; /* 40 1 */ /* XXX 3 bytes hole, try to pack */ u32 priv; /* 44 4 */ u32 prot; /* 48 4 */ u32 flags; /* 52 4 */ u64 pgoff; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 reloc; /* 64 8 */ u32 maj; /* 72 4 */ u32 min; /* 76 4 */ u64 ino; /* 80 8 */ u64 ino_generation; /* 88 8 */ u64 (*map_ip)(struct map *, u64); /* 96 8 */ u64 (*unmap_ip)(struct map *, u64); /* 104 8 */ struct dso * dso; /* 112 8 */ refcount_t refcnt; /* 120 4 */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 121, holes: 1, sum holes: 3 */ /* padding: 4 */ /* forced alignments: 1 */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bvr8fqfgzxtgnhnwt5sssx5g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
bcb8af5c46 |
perf maps: Purge the entries from maps->names in __maps__purge()
No need to iterate via the ->names rbtree, as all the entries there as in maps->entries as well, reuse __maps__purge() for that. Doing it this way we can kill maps__for_each_entry_by_name(), maps__for_each_entry_by_name_safe(), maps__{first,next}_by_name(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ps0nrio8pydyo23rr2s696ue@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
af833988c0 |
perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite
Prior to version 3.23 SQLite does not support TRUE or FALSE, so always
use 1 and 0 for SQLite.
Fixes:
|
|
![]() |
da3ef7f6cd |
perf vendor events power9: Fix commas so PMU event files are valid JSON
No functional change. Remove extra commas in the power9 JSON files so that the files can be parsed and validated by other utilities such as Python that fail to parse invalid JSON. Before: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/powerpc/power9/cache.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x300 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/floating-point.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x141 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/frontend.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x250 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/marked.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x301 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/memory.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x300 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/other.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x308 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/pipeline.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x4D0 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/pmc.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x200 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/translation.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x1E" (right here) ------^ JSON is invalid $ After: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/powerpc/power9/cache.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/floating-point.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/frontend.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/marked.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/memory.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/other.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/pipeline.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/pmc.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/translation.json JSON is valid $ Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kevin Mooney <kevin.mooney@arm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: nd@arm.com Link: http://lore.kernel.org/lkml/20191112160342.26470-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
835e5bd909 |
perf vendor events power8: Fix commas so PMU event files are valid JSON
No functional change. Remove extra commas in the power8 JSON files so that the files can be parsed and validated by other utilities such as Python that fail to parse invalid JSON. Committer testing: Before: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/powerpc/power8/cache.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x4c0 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/floating-point.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x200 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/frontend.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x250 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/marked.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x351 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/memory.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x100 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/other.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x1f0 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/pipeline.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x100 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/pmc.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x200 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power8/translation.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x4c0 (right here) ------^ JSON is invalid $ After: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/powerpc/power8/cache.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/floating-point.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/frontend.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/marked.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/memory.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/other.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/pipeline.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/pmc.json JSON is valid tools/perf/pmu-events/arch/powerpc/power8/translation.json JSON is valid $ Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kevin Mooney <kevin.mooney@arm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: nd@arm.com Link: http://lore.kernel.org/lkml/20191112160342.26470-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a44e4f3ab1 |
perf vendor events arm64: Fix commas so PMU event files are valid JSON
No functional change. Add and remove extra commas in the arm64 JSON files so that the files can be parsed and validated by other utilities such as Python that fail to parse invalid JSON. Committer testing: Before: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json parse error: unallowed token at this point in JSON text [ { "PublicDescrip (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/intrinsic.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/ampere/emag/pipeline.json parse error: unallowed token at this point in JSON text [ { "PublicDescrip (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/branch.json parse error: invalid object key (must be a string) [ { "ArchStdEvent": "BR (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/bus.json parse error: invalid object key (must be a string) [ { "ArchStdEvent": (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/other.json parse error: invalid object key (must be a string) [ { "ArchStdEvent": (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/armv8-recommended.json parse error: after array element, I expect ',' or ']' [ { "PublicDescrip (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/core-imp-def.json parse error: invalid object key (must be a string) [ { "ArchStdEvent" (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json parse error: invalid object key (must be a string) [ { "EventCode": "0x00 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json parse error: invalid object key (must be a string) [ { "EventCode": "0x00 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json parse error: invalid object key (must be a string) [ { "EventCode": "0x00 (right here) ------^ JSON is invalid $ After: $ diffstat -l -p1 /wb/1.patch | while read filename ; do echo $filename ; cat $filename | json_verify ; done tools/perf/pmu-events/arch/arm64/ampere/emag/branch.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/bus.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/cache.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/clock.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/exception.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/instruction.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/intrinsic.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/memory.json JSON is valid tools/perf/pmu-events/arch/arm64/ampere/emag/pipeline.json JSON is valid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/branch.json JSON is valid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/bus.json JSON is valid tools/perf/pmu-events/arch/arm64/arm/cortex-a53/other.json JSON is valid tools/perf/pmu-events/arch/arm64/arm/cortex-a57-a72/core-imp-def.json JSON is valid tools/perf/pmu-events/arch/arm64/armv8-recommended.json JSON is valid tools/perf/pmu-events/arch/arm64/cavium/thunderx2/core-imp-def.json JSON is valid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/core-imp-def.json JSON is valid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-ddrc.json JSON is valid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-hha.json JSON is valid tools/perf/pmu-events/arch/arm64/hisilicon/hip08/uncore-l3c.json JSON is valid $ Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kevin Mooney <kevin.mooney@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: nd@arm.com Link: http://lore.kernel.org/lkml/20191112160342.26470-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
e1e9b78d39 |
perf parse: Use YYABORT to clear stack after failure, plugging leaks
Using return rather than YYABORT means that the stack isn't cleared up following a failure. The change to YYABORT means the return value is 1 rather than -1, but the callers just check for a result of 0 (success). Add missing free of a list when an error occurs in event_pmu. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191109075840.181231-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ccd26741f5 |
perf tool: Provide an option to print perf_event_open args and return value
Perf record with verbose=2 already prints this information along with whole lot of other traces which requires lot of scrolling. Introduce an option to print only perf_event_open() arguments and return value. Sample o/p: $ perf --debug perf-event-open=1 record -- ls > /dev/null ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 exclude_kernel 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 4308 cpu 0 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid 4308 cpu 1 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid 4308 cpu 2 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid 4308 cpu 3 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid 4308 cpu 4 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid 4308 cpu 5 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid 4308 cpu 6 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid 4308 cpu 7 group_fd -1 flags 0x8 = 12 ------------------------------------------------------------ perf_event_attr: type 1 size 112 config 0x9 watermark 1 sample_id_all 1 bpf_event 1 { wakeup_events, wakeup_watermark } 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -13 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] Committer notes: Just like the 'verbose' variable this new 'debug_peo_args' needs to be added to util/python.c, since we don't link the debug.o file in the python binding, which ended up making 'perf test python' fail with: # perf test -v python 18: 'import perf' in python : --- start --- test child forked, pid 19237 Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.so: undefined symbol: debug_peo_args test child finished with -1 ---- end ---- 'import perf' in python: FAILED! # After adding that new variable to util/python.c: # perf test -v python 18: 'import perf' in python : --- start --- test child forked, pid 22364 test child finished with 0 ---- end ---- 'import perf' in python: Ok # Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191108094128.28769-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7b018e2987 |
perf map: Remove ->groups from 'struct map'
With this 'struct map' uses a bit over 3 cachelines: $ pahole -C map ~/bin/perf <SNIP> /* --- cacheline 2 boundary (128 bytes) --- */ u64 (*unmap_ip)(struct map *, u64); /* 128 8 */ struct dso * dso; /* 136 8 */ refcount_t refcnt; /* 144 4 */ /* size: 152, cachelines: 3, members: 18 */ /* sum members: 145, holes: 1, sum holes: 3 */ /* padding: 4 */ /* forced alignments: 2 */ /* last cacheline: 24 bytes */ } __attribute__((__aligned__(8))); $ We probably can move map->map/unmap_ip() moved to 'struct map_groups', that will shave more 16 bytes, getting this almost to two cachelines. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ymlv3nzpofv2fugnjnizkrwy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
3f662fc08d |
perf map: Combine maps__fixup_overlappings with its only use
In the process we can kill some of the struct map->groups usage, trying to get rid of this per-full struct map fields getting in the way of sharing a map across father/parent processes. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-e50eqtqw3za24vmbjnqmmcs6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
94e44b9ca5 |
perf annotate: Stop using map->groups, use map_symbol->mg instead
These were the last uses of map->groups, next cset will nuke it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-n3g0foos7l7uxq9nar0zo0vj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
08f6680e62 |
perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol'
And fill it whenever we setup a a 'struct map_symbol', now we need to use it, next cset. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fzwfcnddenz1o7uj1fzw3g46@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
93fcce96c7 |
perf symbols: Use kmaps(map)->machine when we know its a kernel map
And then stop using map->groups to achieve that. To test that that branch is being taken, probe the function that is only called from there and then run something like 'perf top' in another xterm: # perf probe -x ~/bin/perf machine__map_x86_64_entry_trampolines Added new event: probe_perf:machine__map_x86_64_entry_trampolines (on machine__map_x86_64_entry_trampolines in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:machine__map_x86_64_entry_trampolines -aR sleep 1 # perf trace -e probe_perf:* 0.000 bash/10614 probe_perf:machine__map_x86_64_entry_trampolines(__probe_ip: 5224944) ^C# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-lgrrzdxo2p9liq2keivcg887@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d46a4cdf49 |
pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol'
So that we pass that substructure around and with it consolidate lots of functions that receive a (map, symbol) pair and now can receive just a 'struct map_symbol' pointer. This further paves the way to add 'struct map_groups' to 'struct map_symbol' so that we can have all we need for annotation so that we can ditch 'struct map'->groups, i.e. have the map_groups pointer in a more central place, avoiding the pointer in the 'struct map' that have tons of instances. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5f0fef8ac3 |
perf callchain: Use 'struct map_symbol' in 'struct callchain_cursor_node'
To ease passing around map+symbol, just like done for other parts of the tree recently. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
c1529738f5 |
perf unwind: Use 'struct map_symbol' in 'struct unwind_entry'
To help in passing that info around to callchain routines that, for the same reason, are moving to use 'struct map_symbol'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-epsiibeprpxa8qpwji47uskc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2975489458 |
perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers
We are already passing things like: symbol__annotate(ms->sym, ms->map, ...) So shorten the signature of such functions to receive the 'map_symbol' pointer. This also paves the way to having the 'struct map_groups' pointer in the 'struct map_symbol' so that we can get rid of 'struct map'->groups. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
d3a022cbdc |
perf tools: Add map_groups to 'struct addr_location'
From there we can get al->mg->machine, so replace that field with the more useful 'struct map_groups' that for now we're obtaining from al->map->groups, and that is one thing getting into the way of maps being fully shareable. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-4qdducrm32tgrjupcp0kjh1e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9d355b381b |
perf map_groups: Pass the object to map_groups__find_ams()
We were just passing a map to look for and reuse its map->groups member, but the idea is that this is going away, as a map can be in multiple rb_trees when being reused via a map_node, so do as all the other map_groups methods and pass as its first arg the object being operated on. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-nmi2pbggqloogwl6vxrvex5a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f2baa060cd |
perf symbols: Stop using map->groups, we can use kmaps instead
To test that that function is being called I just added a probe on that place, enabled it via 'perf trace' asking for at most 16 levels of backtraces, system wide, and then ran 'perf top' on another xterm, voilà: # perf probe -x ~/bin/perf dso__process_kernel_symbol Added new event: probe_perf:dso__process_kernel_symbol (on dso__process_kernel_symbol in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:dso__process_kernel_symbol -aR sleep 1 # perf trace -e probe_perf:dso__process_kernel_symbol/max-stack=16/ --max-events=2 # perf trace -e probe_perf:dso__process_kernel_symbol/max-stack=16/ --max-events=2 0.000 :17345/17345 probe_perf:dso__process_kernel_symbol(__probe_ip: 5680224) dso__process_kernel_symbol (/home/acme/bin/perf) dso__load_vmlinux (/home/acme/bin/perf) dso__load_vmlinux_path (/home/acme/bin/perf) dso__load (/home/acme/bin/perf) map__load (/home/acme/bin/perf) thread__find_map (/home/acme/bin/perf) machine__resolve (/home/acme/bin/perf) deliver_event (/home/acme/bin/perf) __ordered_events__flush.part.0 (/home/acme/bin/perf) process_thread (/home/acme/bin/perf) start_thread (/usr/lib64/libpthread-2.29.so) 0.064 :17345/17345 probe_perf:dso__process_kernel_symbol(__probe_ip: 5680224) dso__process_kernel_symbol (/home/acme/bin/perf) dso__load_vmlinux (/home/acme/bin/perf) dso__load_vmlinux_path (/home/acme/bin/perf) dso__load (/home/acme/bin/perf) map__load (/home/acme/bin/perf) thread__find_map (/home/acme/bin/perf) machine__resolve (/home/acme/bin/perf) deliver_event (/home/acme/bin/perf) __ordered_events__flush.part.0 (/home/acme/bin/perf) process_thread (/home/acme/bin/perf) start_thread (/usr/lib64/libpthread-2.29.so) # # perf stat -e probe_perf:dso__process_kernel_symbol ^C Performance counter stats for 'system wide': 107,308 probe_perf:dso__process_kernel_symbol 8.215399813 seconds time elapsed # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-5fy66x5hr5ct9pmw84jkiwvm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
de90d513b2 |
perf map: Use map->dso->kernel + map__kmaps() in map__kmaps()
Its equivalent to using map->groups to obtain the machine struct. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bdbazuj4ggrmzxdviaqdrdwh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
56b2147f34 |
perf/core improvements and fixes:
perf report: Jin Yao: - Introduce --total-cycles, for basic block profiling, further using data obtained from LBR, an example should suffice: # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Sampled Avg Avg # Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object # ....... ...... ....... ..... .................................... ................ # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] perf record: Adrian Hunter: - Allow storing perf.data in a directory together with a copy of /proc/kcore. Jiwei Sun: - Add support for limit perf output file size, i.e.: # perf record --all-cpus -F 10000 --max-size=4M sleep 10h [ perf record: perf size limit reached (4097 KB), stopping session ] [ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ] Terminated # ls -lah perf.data -rw-------. 1 root root 4.1M Nov 7 15:27 perf.data # perf stat: Jiri Olsa: - Add --per-node agregation support: In live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles ... perf probe: Masami Hiramatsu: Various fixes related to recent additions to the DWARF format: - Fix to find range-only function instance - Walk function lines in lexical blocks - Fix to show function entry line as probe-able - Fix wrong address verification - Fix to probe a function which has no entry pc - Fix to probe an inline function which has no entry pc - Fix to list probe event with correct line number - Fix to show inlined function callsite without entry_pc - Fix to show ranges of variables in functions without entry_pc - Return a better scope DIE if there is no best scope - Skip end-of-sequence and non statement lines - Filter out instances except for inlined subroutine and subprogram - Fix to show calling lines of inlined functions - Skip overlapped location on searching variables perf inject: Adrian Hunter: - Do not strip evsels with --strip, as they are needed for create_gcov (see the autofdo example in tools/perf/Documentation/intel-pt.txt). Intel PT: Adrian Hunter: - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid repeated decoding. Add an auxtrace_cache__remove to handle text poke events. core: Andi Kleen: - Always preserve errno while cleaning up perf_event_open failures. llvm: Arnaldo Carvalho de Melo: - No need to tell that the request for saving a .o file for BPF events, as expressed in ~/.perfconfig was satisfied, make that a debug message. perf vendor events: Intel: Haiyan Song: - Update CascadelakeX events to v1.05. - Update all the Intel JSON metrics from TMAM 3.6. Treewide: Ian Rogers: - Improve error paths, plugging leaks found using LLVM tools such as libFuzzer. jevents: Yunfeng Ye: - Fix resource leak in process_mapfile() and main() perf kvm: Igor Lubashev: - Use evlist layer api when possible. libsubcmd: James Clark: - Move EXTRA_FLAGS to the end to allow overriding existing flags. - Use -O0 with DEBUG=1 perf diff: Jin Yao: - Don't use hack to skip column length calculation CoreSight ETM: Leo yan: - Fix definition of macro TO_CS_QUEUE_NR ARM64: John Garry: - Do not try to include libelf header files when its feature detection failed, fixing the cross build for ARM64. perf tests: Leo Yan: - Fix out of bounds memory access in the backward ring buffer test. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXcRowQAKCRCyPKLppCJ+ JxHcAQCTtl9N3zkNjLWif1i6AGKNU9TzYpup+jDR5J83ggLqgQD+O931nR9wXUOe 9bDUr45cNw3ZkRbc1558hKPWIsceJgU= =Rko+ -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf report: Jin Yao: - Introduce --total-cycles, for basic block profiling, further using data obtained from LBR, an example should suffice: # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Sampled Avg Avg # Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object # ....... ...... ....... ..... .................................... ................ # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] perf record: Adrian Hunter: - Allow storing perf.data in a directory together with a copy of /proc/kcore. Jiwei Sun: - Add support for limit perf output file size, i.e.: # perf record --all-cpus -F 10000 --max-size=4M sleep 10h [ perf record: perf size limit reached (4097 KB), stopping session ] [ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ] Terminated # ls -lah perf.data -rw-------. 1 root root 4.1M Nov 7 15:27 perf.data # perf stat: Jiri Olsa: - Add --per-node agregation support: In live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles ... perf probe: Masami Hiramatsu: Various fixes related to recent additions to the DWARF format: - Fix to find range-only function instance - Walk function lines in lexical blocks - Fix to show function entry line as probe-able - Fix wrong address verification - Fix to probe a function which has no entry pc - Fix to probe an inline function which has no entry pc - Fix to list probe event with correct line number - Fix to show inlined function callsite without entry_pc - Fix to show ranges of variables in functions without entry_pc - Return a better scope DIE if there is no best scope - Skip end-of-sequence and non statement lines - Filter out instances except for inlined subroutine and subprogram - Fix to show calling lines of inlined functions - Skip overlapped location on searching variables perf inject: Adrian Hunter: - Do not strip evsels with --strip, as they are needed for create_gcov (see the autofdo example in tools/perf/Documentation/intel-pt.txt). Intel PT: Adrian Hunter: - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid repeated decoding. Add an auxtrace_cache__remove to handle text poke events. core: Andi Kleen: - Always preserve errno while cleaning up perf_event_open failures. llvm: Arnaldo Carvalho de Melo: - No need to tell that the request for saving a .o file for BPF events, as expressed in ~/.perfconfig was satisfied, make that a debug message. perf vendor events: Intel: Haiyan Song: - Update CascadelakeX events to v1.05. - Update all the Intel JSON metrics from TMAM 3.6. Treewide: Ian Rogers: - Improve error paths, plugging leaks found using LLVM tools such as libFuzzer. jevents: Yunfeng Ye: - Fix resource leak in process_mapfile() and main() perf kvm: Igor Lubashev: - Use evlist layer api when possible. libsubcmd: James Clark: - Move EXTRA_FLAGS to the end to allow overriding existing flags. - Use -O0 with DEBUG=1 perf diff: Jin Yao: - Don't use hack to skip column length calculation CoreSight ETM: Leo yan: - Fix definition of macro TO_CS_QUEUE_NR ARM64: John Garry: - Do not try to include libelf header files when its feature detection failed, fixing the cross build for ARM64. perf tests: Leo Yan: - Fix out of bounds memory access in the backward ring buffer test. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
![]() |
1ca7feb590 |
Linux 5.4-rc7
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3IqJQeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOiUH+gOEDwid5OODaFAd CggXugdFIlBZefKqGVNW5sjgX8pxFWHXuEMC8iNb6QXtQZdFrI6LFf9hhUDmzQtm 6y1LPxxEiTZjObMEsBNylb7tyzgujFHcAlp0Zro3w/HLCqmYTSP3FF46i2u6KZfL XhkpM4X7R7qxlfpdhlfESv/ElRGocZe6SwXfC7pcPo5flFcmkdu9ijqhNd/6CZ/h Nf9rTsD/wEDVUelFbgVN+LJzlaB0tsyc4Zbof07n8OsFZjhdEOop8gfM/kTBLcyY 6bh66SfDScdsNnC/l8csbPjSZRx+i+nQs67DyhGNnsSAFgHBZdC4Tb/2mDCwhCLR dUvuYZc= =1N6F -----END PGP SIGNATURE----- Merge tag 'v5.4-rc7' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
![]() |
b584a17628 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner: - Fix the time sorting algorithm which was broken due to truncation of big numbers - Fix the python script generator fail caused by a broken tracepoint array iterator * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix time sorting perf tools: Remove unused trace_find_next_event() perf scripting engines: Iterate on tep event arrays directly |
|
![]() |
7fa46cbf20 |
perf report: Sort by sampled cycles percent per block for tui
Previous patch has implemented a new option "--total-cycles". But only stdio mode is supported. This patch supports the tui mode and support '--percent-limit'. For example, perf record -b ./div perf report --total-cycles --percent-limit 1 # Samples: 2753248 of event 'cycles' Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so -------------------------------------------------- v7: --- 1. Since we have used use_browser in report__browse_block_hists to support stdio mode, now we also add supporting for tui. 2. Move block tui browser code from ui/browsers/hists.c to block-info.c. v6: --- Create report__tui_browse_block_hists in block-info.c (codes are moved from builtin-report.c). v5: --- Fix a crash issue when running perf report without '--total-cycles'. The issue is because the internal flag is renamed from 'total_cycles' to 'total_cycles_mode' in previous patch but this patch still uses 'total_cycles' to check if the '--total-cycles' option is enabled, which causes the code to be inconsistent. v4: --- Since the block collection is moved out of printing in previous patch, this patch is updated accordingly for tui supporting. v3: --- Minor change since the function name is changed: block_total_cycles_percent -> block_info__total_cycles_percent Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0b49f83657 |
perf report: Support --percent-limit for --total-cycles
We have already supported the '--total-cycles' option in previous patch. It's also useful to show entries only above a threshold percent. This patch enables '--percent-limit' for not showing entries under that percent. For example: perf report --total-cycles --stdio --percent-limit 1 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................................. .................... # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so Committer testing: From second exapmple onwards slightly edited for brevity: # perf report --total-cycles --percent-limit 2 --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] # # (Tip: Create an archive with symtabs to analyse on other machine: perf archive) # # perf report --total-cycles --percent-limit 1 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so # # perf report --total-cycles --percent-limit 0.7 --stdio # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] # ------------------------------------------- It only shows the entries which 'Sampled Cycles%' > 1%. v7: --- No functional change. Only fix the conflict issue because previous patches are changed. v6: --- No functional change. Only fix the conflict issue because previous patches are changed. v5: --- No functional change. Only fix the conflict issue because previous patches are changed. v4: --- No functional change. Only fix the build issue because previous patches are changed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6f7164fa23 |
perf report: Sort by sampled cycles percent per block for stdio
It would be useful to support sorting for all blocks by the sampled cycles percent per block. This is useful to concentrate on the globally hottest blocks. This patch implements a new option "--total-cycles" which sorts all blocks by 'Sampled Cycles%'. The 'Sampled Cycles%' is the percent: percent = block sampled cycles aggregation / total sampled cycles Note that, this patch only supports "--stdio" mode. For example, # perf record -b ./div # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 2M of event 'cycles' # Event count (approx.): 2753248 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ................................................ ................. # 26.04% 2.8M 0.40% 18 [div.c:42 -> div.c:39] div 15.17% 1.2M 0.16% 7 [random_r.c:357 -> random_r.c:380] libc-2.27.so 5.11% 402.0K 0.04% 2 [div.c:27 -> div.c:28] div 4.87% 381.6K 0.04% 2 [random.c:288 -> random.c:291] libc-2.27.so 4.53% 381.0K 0.04% 2 [div.c:40 -> div.c:40] div 3.85% 300.9K 0.02% 1 [div.c:22 -> div.c:25] div 3.08% 241.1K 0.02% 1 [rand.c:26 -> rand.c:27] libc-2.27.so 3.06% 240.0K 0.02% 1 [random.c:291 -> random.c:291] libc-2.27.so 2.78% 215.7K 0.02% 1 [random.c:298 -> random.c:298] libc-2.27.so 2.52% 198.3K 0.02% 1 [random.c:293 -> random.c:293] libc-2.27.so 2.36% 184.8K 0.02% 1 [rand.c:28 -> rand.c:28] libc-2.27.so 2.33% 180.5K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.28% 176.7K 0.02% 1 [random.c:295 -> random.c:295] libc-2.27.so 2.20% 168.8K 0.02% 1 [rand@plt+0 -> rand@plt+0] div 1.98% 158.2K 0.02% 1 [random_r.c:388 -> random_r.c:388] libc-2.27.so 1.57% 123.3K 0.02% 1 [div.c:42 -> div.c:44] div 1.44% 116.0K 0.42% 19 [random_r.c:357 -> random_r.c:394] libc-2.27.so 0.25% 182.5K 0.02% 1 [random_r.c:388 -> random_r.c:391] libc-2.27.so 0.00% 48 1.07% 48 [x86_pmu_enable+284 -> x86_pmu_enable+298] [kernel.kallsyms] 0.00% 74 1.64% 74 [vm_mmap_pgoff+0 -> vm_mmap_pgoff+92] [kernel.kallsyms] 0.00% 73 1.62% 73 [vm_mmap+0 -> vm_mmap+48] [kernel.kallsyms] 0.00% 63 0.69% 31 [up_write+0 -> up_write+34] [kernel.kallsyms] 0.00% 13 0.29% 13 [setup_arg_pages+396 -> setup_arg_pages+413] [kernel.kallsyms] 0.00% 3 0.07% 3 [setup_arg_pages+418 -> setup_arg_pages+450] [kernel.kallsyms] 0.00% 616 6.84% 308 [security_mmap_file+0 -> security_mmap_file+72] [kernel.kallsyms] 0.00% 23 0.51% 23 [security_mmap_file+77 -> security_mmap_file+87] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+0 -> sched_clock+4] [kernel.kallsyms] 0.00% 4 0.02% 1 [sched_clock+9 -> sched_clock+12] [kernel.kallsyms] 0.00% 1 0.02% 1 [rcu_nmi_exit+0 -> rcu_nmi_exit+9] [kernel.kallsyms] Committer testing: This should provide material for hours of endless joy, both from looking for suspicious things in the implementation of this patch, such as the top one: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] As well from things that look legit: # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] :-) Very short system wide taken branches session: # perf record -h -b Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -b, --branch-any sample any taken branches # # perf record -b ^C[ perf record: Woken up 595 times to write data ] [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ] # # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 6M of event 'cycles' # Event count (approx.): 6299936 # # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object # ............... .............. ........... .......... ...................................................................... .................... # 2.17% 1.7M 0.08% 607 [compiler.h:199 -> common.c:221] [kernel.vmlinux] 1.75% 1.3M 8.34% 65.5K [memset-vec-unaligned-erms.S:147 -> memset-vec-unaligned-erms.S:151] libc-2.29.so 0.72% 544.5K 0.03% 230 [entry_64.S:657 -> entry_64.S:662] [kernel.vmlinux] 0.56% 541.8K 0.09% 672 [compiler.h:199 -> common.c:300] [kernel.vmlinux] 0.39% 293.2K 0.01% 104 [list_debug.c:43 -> list_debug.c:61] [kernel.vmlinux] 0.36% 278.6K 0.03% 272 [entry_64.S:1289 -> entry_64.S:1308] [kernel.vmlinux] 0.30% 260.8K 0.07% 564 [clear_page_64.S:47 -> clear_page_64.S:50] [kernel.vmlinux] 0.28% 215.3K 0.05% 369 [traps.c:623 -> traps.c:628] [kernel.vmlinux] 0.23% 178.1K 0.04% 278 [entry_64.S:271 -> entry_64.S:275] [kernel.vmlinux] 0.20% 152.6K 0.09% 706 [paravirt.c:177 -> paravirt.c:179] [kernel.vmlinux] 0.20% 155.8K 0.05% 373 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.18% 136.6K 0.03% 222 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 0.16% 123.0K 0.60% 4.7K [nospec-branch.h:265 -> nospec-branch.h:278] [kernel.vmlinux] 0.16% 118.3K 0.01% 44 [entry_64.S:632 -> entry_64.S:657] [kernel.vmlinux] 0.14% 104.5K 0.00% 28 [rwsem.c:1541 -> rwsem.c:1544] [kernel.vmlinux] 0.13% 99.2K 0.01% 53 [spinlock.c:150 -> spinlock.c:152] [kernel.vmlinux] 0.13% 95.5K 0.00% 35 [swap.c:456 -> swap.c:471] [kernel.vmlinux] 0.12% 96.2K 0.05% 407 [copy_user_64.S:175 -> copy_user_64.S:209] [kernel.vmlinux] 0.11% 85.9K 0.00% 31 [swap.c:400 -> page-flags.h:188] [kernel.vmlinux] 0.10% 73.0K 0.01% 52 [paravirt.h:763 -> list.h:131] [kernel.vmlinux] 0.07% 56.2K 0.03% 214 [filemap.c:1524 -> filemap.c:1557] [kernel.vmlinux] 0.07% 54.2K 0.02% 145 [memory.c:1032 -> memory.c:1049] [kernel.vmlinux] 0.07% 50.3K 0.00% 39 [mmzone.c:49 -> mmzone.c:69] [kernel.vmlinux] 0.06% 48.3K 0.01% 40 [paravirt.h:768 -> page_alloc.c:3304] [kernel.vmlinux] 0.06% 46.7K 0.02% 155 [memory.c:1032 -> memory.c:1056] [kernel.vmlinux] 0.06% 46.9K 0.01% 103 [swap.c:867 -> swap.c:902] [kernel.vmlinux] 0.06% 47.8K 0.00% 34 [entry_64.S:1201 -> entry_64.S:1202] [kernel.vmlinux] ----------------------------------------------------------- v7: --- Use use_browser in report__browse_block_hists for supporting stdio and potential tui mode. v6: --- Create report__browse_block_hists in block-info.c (codes are moved from builtin-report.c). It's called from perf_evlist__tty_browse_hists. v5: --- 1. Move all block functions to block-info.c 2. Move the code of setting ms in block hist_entry to other patch. v4: --- 1. Use new option '--total-cycles' to replace '-s total_cycles' in v3. 2. Move block info collection out of block info printing. v3: --- 1. Use common function block_info__process_sym to process the blocks per symbol. 2. Remove the nasty hack for skipping calculation of column length 3. Some minor cleanup Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b65a7d372b |
perf hist: Support block formats with compare/sort/display
This patch provides helper routines to support new columns for block info output. The new columns are: Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object v5: --- 1. Move more block related functions from builtin-report.c to block-info.c 2. Set ms (map+sym) in block hist_entry. Because this info is needed for reporting the block range (i.e. source line) Committer notes: Remove unused set_fmt() function, some build were not completing with: util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function] static inline void set_fmt(struct block_fmt *block_fmt, ^ 1 error generated. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7841f40aed |
perf hist: Count the total cycles of all samples
We can get the per sample cycles by hist__account_cycles(). It's also useful to know the total cycles of all samples in order to get the cycles coverage for a single program block in further. For example: coverage = per block sampled cycles / total sampled cycles This patch creates a new argument 'total_cycles' in hist__account_cycles(), which will be added with the cycles of each sample. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
6041441870 |
perf block: Cleanup and refactor block info functions
We have already implemented some block-info related functions. Now it's time to do some cleanup, refactoring and move the functions and structures to new block-info.h/block-info.c. v4: --- Move code for skipping column length calculation to patch: 'perf diff: Don't use hack to skip column length calculation' v3: --- 1. Rename the patch title 2. Rename from block.h/block.c to block-info.h/block-info.c 3. Move more common part to block-info, such as block_info__process_sym. 4. Remove the nasty hack for skipping calculation of column length Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
0bdf181fe0 |
perf diff: Don't use hack to skip column length calculation
Previously we use a nasty hack to skip the hists__calc_col_len for block since this function is not very suitable for block column length calculation. This patch removes the hack code and add a check at the entry of hists__calc_col_len to skip for block case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
af8490eb2b |
perf tests: Fix out of bounds memory access
The test case 'Read backward ring buffer' failed on 32-bit architectures
which were found by LKFT perf testing. The test failed on arm32 x15
device, qemu_arm32, qemu_i386, and found intermittent failure on i386;
the failure log is as below:
50: Read backward ring buffer :
--- start ---
test child forked, pid 510
Using CPUID GenuineIntel-6-9E-9
mmap size 1052672B
mmap size 8192B
Finished reading overwrite ring buffer: rewind
free(): invalid next size (fast)
test child interrupted
---- end ----
Read backward ring buffer: FAILED!
The log hints there have issue for memory usage, thus free() reports
error 'invalid next size' and directly exit for the case. Finally, this
issue is root caused as out of bounds memory access for the data array
'evsel->id'.
The backward ring buffer test invokes do_test() twice. 'evsel->id' is
allocated at the first call with the flow:
test__backward_ring_buffer()
`-> do_test()
`-> evlist__mmap()
`-> evlist__mmap_ex()
`-> perf_evsel__alloc_id()
So 'evsel->id' is allocated with one item, and it will be used in
function perf_evlist__id_add():
evsel->id[0] = id
evsel->ids = 1
At the second call for do_test(), it skips to initialize 'evsel->id'
and reuses the array which is allocated in the first call. But
'evsel->ids' contains the stale value. Thus:
evsel->id[1] = id -> out of bound access
evsel->ids = 2
To fix this issue, we will use evlist__open() and evlist__close() pair
functions to prepare and cleanup context for evlist; so 'evsel->id' and
'evsel->ids' can be initialized properly when invoke do_test() and avoid
the out of bounds memory access.
Fixes:
|
|
![]() |
6d57581659 |
perf record: Add support for limit perf output file size
The patch adds a new option to limit the output file size, then based on it, we can create a wrapper of the perf command that uses the option to avoid exhausting the disk space by the unconscious user. In order to make the perf.data parsable, we just limit the sample data size, since the perf.data consists of many headers and sample data and other data, the actual size of the recorded file will bigger than the setting value. Testing it: # ./perf record -a -g --max-size=10M Couldn't synthesize bpf events. [ perf record: perf size limit reached (10249 KB), stopping session ] [ perf record: Woken up 32 times to write data ] [ perf record: Captured and wrote 10.133 MB perf.data (71964 samples) ] # ls -lh perf.data -rw------- 1 root root 11M Oct 22 14:32 perf.data # ./perf record -a -g --max-size=10K [ perf record: perf size limit reached (10 KB), stopping session ] Couldn't synthesize bpf events. [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 1.546 MB perf.data (69 samples) ] # ls -l perf.data -rw------- 1 root root 1626952 Oct 22 14:36 perf.data Committer notes: Fixed the build in multiple distros by using PRIu64 to print u64 struct members, fixing this: builtin-record.c: In function 'record__write': builtin-record.c:150:5: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] rec->bytes_written >> 10); ^ CC /tmp/build/pe Signed-off-by: Jiwei Sun <jiwei.sun@windriver.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Danter <richard.danter@windriver.com> Link: http://lore.kernel.org/lkml/20191022080901.3841-1-jiwei.sun@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
dee36a2abb |
perf probe: Skip overlapped location on searching variables
Since debuginfo__find_probes() callback function can be called with the location which already passed, the callback function must filter out such overlapped locations. add_probe_trace_event() has already done it by commit |
|
![]() |
86c0bf8539 |
perf probe: Fix to show calling lines of inlined functions
Fix to show calling lines of inlined functions (where an inline function
is called).
die_walk_lines() filtered out the lines inside inlined functions based
on the address. However this also filtered out the lines which call
those inlined functions from the target function.
To solve this issue, check the call_file and call_line attributes and do
not filter out if it matches to the line information.
Without this fix, perf probe -L doesn't show some lines correctly.
(don't see the lines after 17)
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
fsnotify_access(file);
add_rchar(current, ret);
}
With this fix:
# perf probe -L vfs_read
<vfs_read@/home/mhiramat/ksrc/linux/fs/read_write.c:0>
0 ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
1 {
2 ssize_t ret;
4 if (!(file->f_mode & FMODE_READ))
return -EBADF;
6 if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
8 if (unlikely(!access_ok(buf, count)))
return -EFAULT;
11 ret = rw_verify_area(READ, file, pos, count);
12 if (!ret) {
13 if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
15 ret = __vfs_read(file, buf, count, pos);
16 if (ret > 0) {
17 fsnotify_access(file);
18 add_rchar(current, ret);
}
20 inc_syscr(current);
}
Fixes:
|
|
![]() |
da6cb952a8 |
perf probe: Filter out instances except for inlined subroutine and subprogram
Filter out instances except for inlined_subroutine and subprogram DIE in
die_walk_instances() and die_is_func_instance().
This fixes an issue that perf probe sets some probes on calling address
instead of a target function itself.
When perf probe walks on instances of an abstruct origin (a kind of
function prototype of inlined function), die_walk_instances() can also
pass a GNU_call_site (a GNU extension for call site) to callback. Since
it is not an inlined instance of target function, we have to filter out
when searching a probe point.
Without this patch, perf probe sets probes on call site address too.This
can happen on some function which is marked "inlined", but has actual
symbol. (I'm not sure why GCC mark it "inlined"):
# perf probe -D vfs_read
p:probe/vfs_read _text+2500017
p:probe/vfs_read_1 _text+2499468
p:probe/vfs_read_2 _text+2499563
p:probe/vfs_read_3 _text+2498876
p:probe/vfs_read_4 _text+2498512
p:probe/vfs_read_5 _text+2498627
With this patch:
Slightly different results, similar tho:
# perf probe -D vfs_read
p:probe/vfs_read _text+2498512
Committer testing:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Before:
# perf probe -D vfs_read
p:probe/vfs_read _text+3131557
p:probe/vfs_read_1 _text+3130975
p:probe/vfs_read_2 _text+3131047
p:probe/vfs_read_3 _text+3130380
p:probe/vfs_read_4 _text+3130000
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
After:
# perf probe -D vfs_read
p:probe/vfs_read _text+3130000
#
Fixes:
|
|
![]() |
f4d99bdfd1 |
perf probe: Skip end-of-sequence and non statement lines
Skip end-of-sequence and non-statement lines while walking through lines
list.
The "end-of-sequence" line information means:
"the current address is that of the first byte after the
end of a sequence of target machine instructions."
(DWARF version 4 spec 6.2.2)
This actually means out of scope and we can not probe on it.
On the other hand, the statement lines (is_stmt) means:
"the current instruction is a recommended breakpoint location.
A recommended breakpoint location is intended to “represent”
a line, a statement and/or a semantically distinct subpart
of a statement."
(DWARF version 4 spec 6.2.2)
So, non-statement line info also should be skipped.
These can reduce unneeded probe points and also avoid an error.
E.g. without this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_3 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_4 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_4 -aR sleep 1
#
This puts 5 probes on one line, but acutally it's not inlined function.
This is because there are many non statement instructions at the
function prologue.
With this patch:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
#
Now perf-probe skips unneeded addresses.
Committer testing:
Slightly different results, but similar:
Before:
# uname -a
Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
#
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new events:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:1)
probe:clear_tasks_mm_cpumask_2 (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask_2 -aR sleep 1
#
After:
# perf probe -a "clear_tasks_mm_cpumask:1"
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask:1)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
#
Fixes:
|
|
![]() |
c701636aee |
perf probe: Return a better scope DIE if there is no best scope
Make find_best_scope() returns innermost DIE at given address if there is no best matched scope DIE. Since Gcc sometimes generates intuitively strange line info which is out of inlined function address range, we need this fixup. Without this, sometimes perf probe failed to probe on a line inside an inlined function: # perf probe -D ksys_open:3 Failed to find scope of probe point. Error: Failed to add events. With this fix, 'perf probe' can probe it: # perf probe -D ksys_open:3 p:probe/ksys_open _text+25707308 p:probe/ksys_open_1 _text+25710596 p:probe/ksys_open_2 _text+25711114 p:probe/ksys_open_3 _text+25711343 p:probe/ksys_open_4 _text+25714058 p:probe/ksys_open_5 _text+2819653 p:probe/ksys_open_6 _text+2819701 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157291300887.19771.14936015360963292236.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
5c65b1c084 |
perf annotate: Fix heap overflow
Fix expand_tabs that copies the source lines '\0' and then appends another '\0' at a potentially out of bounds address. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
93730f85eb |
perf machine: Add kernel_dso() method
To reduce boilerplate in some places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9s1bgoxxhlnu037e1nqx0tw3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b0c76fc4cf |
perf symbols: Remove needless checks for map->groups->machine
Its sufficient to check if map->groups is NULL before using it to get ->machine value. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-utiepyiv8b1tf8f79ok9d6j8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
1dc925568f |
perf parse: Add a deep delete for parse event terms
Add a parse_events_term deep delete function so that owned strings and arrays are freed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
38f2c4226e |
perf parse: If pmu configuration fails free terms
Avoid a memory leak when the configuration fails. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
cabbf26821 |
perf parse: Before yyabort-ing free components
Yyabort doesn't destruct inputs and so this must be done manually before using yyabort. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
f2a8ecd8b1 |
perf parse: Add destructors for parse event terms
If parsing fails then destructors are ran to clean the up the stack. Rename the head union member to make the term and evlist use cases more distinct, this simplifies matching the correct destructor. Committer notes: Jiri: "Nice did not know about this.. looks like it's been in bison for some time, right?" Ian: "Looks like it wasn't in Bison 1 but in Bison 2, we're at Bison 3 and Bison 2 is > 14 years old: https://web.archive.org/web/20050924004158/http://www.gnu.org/software/bison/manual/html_mono/bison.html#Destructor-Decl" Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b6645a7235 |
perf parse: Ensure config and str in terms are unique
Make it easier to release memory associated with parse event terms by duplicating the string for the config name and ensuring the val string is a duplicate. Currently the parser may memory leak terms and this is addressed in a later patch. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
448d732cef |
perf parse: Add parse events handle error
Parse event error handling may overwrite one error string with another creating memory leaks. Introduce a helper routine that warns about multiple error messages as well as avoiding the memory leak. A reproduction of this problem can be seen with: perf stat -e c/c/ After this change this produces: WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191030223448.12930-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ef5502a1d9 |
perf inject: Make --strip keep evsels
create_gcov (refer to the autofdo example in tools/perf/Documentation/intel-pt.txt) now needs the evsels to read the perf.data file. So don't strip them. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191105100057.21465-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
71f699078b |
perf tools: Fix cross compile for ARM64
Currently when cross compiling perf tool for ARM64 on my x86 machine I get this error: arch/arm64/util/sym-handling.c:9:10: fatal error: gelf.h: No such file or directory #include <gelf.h> For the build, libelf is reported off: Auto-detecting system features: ... ... libelf: [ OFF ] Indeed, test-libelf is not built successfully: more ./build/feature/test-libelf.make.output test-libelf.c:2:10: fatal error: libelf.h: No such file or directory #include <libelf.h> ^~~~~~~~~~ compilation terminated. I have no such problems natively compiling on ARM64, and I did not previously have this issue for cross compiling. Fix by relocating the gelf.h include. Signed-off-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/1573045254-39833-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
86895b480a |
perf stat: Add --per-node agregation support
Adding new --per-node option to aggregate counts per NUMA nodes for system-wide mode measurements. You can specify --per-node in live mode: # perf stat -a -I 1000 -e cycles --per-node # time node cpus counts unit events 1.000542550 N0 20 6,202,097 cycles 1.000542550 N1 20 639,559 cycles 2.002040063 N0 20 7,412,495 cycles 2.002040063 N1 20 2,185,577 cycles 3.003451699 N0 20 6,508,917 cycles 3.003451699 N1 20 765,607 cycles ... Or in the record/report stat session: # perf stat record -a -I 1000 -e cycles # time counts unit events 1.000536937 10,008,468 cycles 2.002090152 9,578,539 cycles 3.003625233 7,647,869 cycles 4.005135036 7,032,086 cycles ^C 4.340902364 3,923,893 cycles # perf stat report --per-node # time node cpus counts unit events 1.000536937 N0 20 9,355,086 cycles 1.000536937 N1 20 653,382 cycles 2.002090152 N0 20 7,712,838 cycles 2.002090152 N1 20 1,865,701 cycles 3.003625233 N0 20 6,604,441 cycles 3.003625233 N1 20 1,043,428 cycles 4.005135036 N0 20 6,350,522 cycles 4.005135036 N1 20 681,564 cycles 4.340902364 N0 20 3,403,188 cycles 4.340902364 N1 20 520,705 cycles Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
389799a7a1 |
perf env: Add perf_env__numa_node()
To speed up cpu to node lookup, add perf_env__numa_node(), that creates cpu array on the first lookup, that holds numa nodes for each stored cpu. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20190904073415.723-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
61ec07f591 |
perf vendor events intel: Update all the Intel JSON metrics from TMAM 3.6.
New Metrics: - DSB_Switches: fraction of cycles CPU was stalled due to switches from DSB to MITE pipeline [all] - L2_Evictions_{Silent|NonSilent}_PKI: L2 {silent|non silent} ecivtions rate per Kilo instruction [SKX+] - IpFarBranch - Instructions per Far Branch Other Enhancements & fixes: - KBLR/CFL & CLX move to separate columns (no column sharing via if #model) - Re-organized/renamed Metric Group Signed-off-by: Haiyan Song <haiyanx.song@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
7fcf1b89c8 |
perf vendor events intel: Update CascadelakeX events to v1.05
Update CascadelakeX events to v1.05. Other changes: remove duplicated and without description events. Signed-off-by: Haiyan Song <haiyanx.song@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Link: http://lore.kernel.org/lkml/20191030082308.10919-1-haiyanx.song@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8e8714c3d1 |
perf tools: Splice events onto evlist even on error
If event parsing fails the event list is leaked, instead splice the list onto the out result and let the caller cleanup. An example input for parse_events found by libFuzzer that reproduces this memory leak is 'm{'. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191025180827.191916-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
50481461cf |
perf map_groups: Introduce for_each_entry() and for_each_entry_safe() iterators
To reduce boilerplate, providing a more compact form to iterate over the maps in a map_group. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gc3go6fmdn30twusg91t2q56@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
8efc4f0568 |
perf maps: Add for_each_entry()/_safe() iterators
To reduce boilerplate, provide a more compact form using an idiom present in other trees of data structures. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-59gmq4kg1r68ou1wknyjl78x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
20419d3a5b |
perf map: Allow map__next() to receive a NULL arg
Just like free(), return NULL in that case, will simplify the for_each_entry_safe() iterators. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pbde2ucn49khnrebclys9pny@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
ee2555b612 |
perf map: Check if the map still has some refcounts on exit
We were checking just if it was still on some rb tree, but that is not the only way that this map can still have references, map->refcnt is there exactly for this, use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hany65tbeavsax7n3xvwl9pc@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b86a9d918a |
perf dso: Add dso__data_write_cache_addr()
Add functions to write into the dso file data cache, but not change the file itself. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
366df72657 |
perf dso: Refactor dso_cache__read()
Refactor dso_cache__read() to separate populating the cache from copying data from it. This is preparation for adding a cache "write" that will update the data in the cache. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
fd62c1097a |
perf auxtrace: Add auxtrace_cache__remove()
Add auxtrace_cache__remove(). Intel PT uses an auxtrace_cache to store the results of code-walking, so that the same block of instructions does not have to be decoded repeatedly. However, when there are text poke events, the associated cache entries need to be removed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191025130000.13032-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
af04dd2f8e |
perf probe: Fix to show ranges of variables in functions without entry_pc
Fix to show ranges of variables (--range and --vars option) in functions
which DIE has only ranges but no entry_pc attribute.
Without this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
With this fix:
# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-35,317-317,2052-2059]>
Committer testing:
Before:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
(No matched variables)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe --range -V clear_tasks_mm_cpumask
Available variables at clear_tasks_mm_cpumask
@<clear_tasks_mm_cpumask+0>
[VAL] int cpu @<clear_tasks_mm_cpumask+[0-23,23-105,105-106,106-106,1843-1850,1850-1862]>
[root@quaco ~]#
Using it:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask cpu
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask with cpu)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c with cpu)
[root@quaco ~]#
[root@quaco ~]# perf trace -e probe:*cpumask
^C[root@quaco ~]#
Fixes:
|
|
![]() |
18e21eb671 |
perf probe: Fix to show inlined function callsite without entry_pc
Fix 'perf probe --line' option to show inlined function callsite lines
even if the function DIE has only ranges.
Without this:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
With this patch:
# perf probe -L amd_put_event_constraints
...
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
Committer testing:
Before:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
__amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -L amd_put_event_constraints
<amd_put_event_constraints@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/arch/x86/events/amd/core.c:0>
0 static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
2 {
3 if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
4 __amd_put_nb_event_constraints(cpuc, event);
5 }
PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15" );
[root@quaco ~]# perf probe amd_put_event_constraints:4
Added new event:
probe:amd_put_event_constraints (on amd_put_event_constraints:4)
You can now use it in all perf tools, such as:
perf record -e probe:amd_put_event_constraints -aR sleep 1
[root@quaco ~]#
[root@quaco ~]# perf probe -l
probe:amd_put_event_constraints (on amd_put_event_constraints:4@arch/x86/events/amd/core.c)
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Using it:
[root@quaco ~]# perf trace -e probe:*
^C[root@quaco ~]#
Ok, Intel system here... :-)
Fixes:
|
|
![]() |
3895534dd7 |
perf probe: Fix to list probe event with correct line number
Since debuginfo__find_probe_point() uses dwarf_entrypc() for finding the
entry address of the function on which a probe is, it will fail when the
function DIE has only ranges attribute.
To fix this issue, use die_entrypc() instead of dwarf_entrypc().
Without this fix, perf probe -l shows incorrect offset:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579263632@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask+18446744071579263752@work/linux/linux/kernel/cpu.c)
With this:
# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@work/linux/linux/kernel/cpu.c)
probe:clear_tasks_mm_cpumask_1 (on clear_tasks_mm_cpumask:21@work/linux/linux/kernel/cpu.c)
Committer testing:
Before:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask+18446744071579765152@kernel/cpu.c)
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -l
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask@kernel/cpu.c)
[root@quaco ~]#
Fixes:
|
|
![]() |
eb6933b29d |
perf probe: Fix to probe an inline function which has no entry pc
Fix perf probe to probe an inlne function which has no entry pc
or low pc but only has ranges attribute.
This seems very rare case, but I could find a few examples, as
same as probe_point_search_cb(), use die_entrypc() to get the
entry address in probe_point_inline_cb() too.
Without this patch:
# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
With this patch:
# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints amd_put_event_constraints+43
Committer testing:
Before:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
Failed to get entry address of __amd_put_nb_event_constraints.
Probe point '__amd_put_nb_event_constraints' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe -D __amd_put_nb_event_constraints
p:probe/__amd_put_nb_event_constraints _text+33789
[root@quaco ~]#
Fixes:
|
|
![]() |
5d16dbcc31 |
perf probe: Fix to probe a function which has no entry pc
Fix 'perf probe' to probe a function which has no entry pc or low pc but
only has ranges attribute.
probe_point_search_cb() uses dwarf_entrypc() to get the probe address,
but that doesn't work for the function DIE which has only ranges
attribute. Use die_entrypc() instead.
Without this fix:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
With this:
# perf probe -k ../build-x86_64/vmlinux -D clear_tasks_mm_cpumask:0
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
Committer testing:
Before:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco ~]#
After:
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Added new event:
probe:clear_tasks_mm_cpumask (on clear_tasks_mm_cpumask)
You can now use it in all perf tools, such as:
perf record -e probe:clear_tasks_mm_cpumask -aR sleep 1
[root@quaco ~]#
Using it with 'perf trace':
[root@quaco ~]# perf trace -e probe:clear_tasks_mm_cpumask
Doesn't seem to be used in x86_64:
$ find . -name "*.c" | xargs grep clear_tasks_mm_cpumask
./kernel/cpu.c: * clear_tasks_mm_cpumask - Safely clear tasks' mm_cpumask for a CPU
./kernel/cpu.c:void clear_tasks_mm_cpumask(int cpu)
./arch/xtensa/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/csky/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/sh/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/arm/kernel/smp.c: clear_tasks_mm_cpumask(cpu);
./arch/powerpc/mm/nohash/mmu_context.c: clear_tasks_mm_cpumask(cpu);
$ find . -name "*.h" | xargs grep clear_tasks_mm_cpumask
./include/linux/cpu.h:void clear_tasks_mm_cpumask(int cpu);
$ find . -name "*.S" | xargs grep clear_tasks_mm_cpumask
$
Fixes:
|
|
![]() |
07d3698578 |
perf probe: Fix wrong address verification
Since there are some DIE which has only ranges instead of the
combination of entrypc/highpc, address verification must use
dwarf_haspc() instead of dwarf_entrypc/dwarf_highpc.
Also, the ranges only DIE will have a partial code in different section
(e.g. unlikely code will be in text.unlikely as "FUNC.cold" symbol). In
that case, we can not use dwarf_entrypc() or die_entrypc(), because the
offset from original DIE can be a minus value.
Instead, this simply gets the symbol and offset from symtab.
Without this patch;
# perf probe -D clear_tasks_mm_cpumask:1
Failed to get entry address of clear_tasks_mm_cpumask
Error: Failed to add events.
And with this patch:
# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask clear_tasks_mm_cpumask+0
p:probe/clear_tasks_mm_cpumask_1 clear_tasks_mm_cpumask+5
p:probe/clear_tasks_mm_cpumask_2 clear_tasks_mm_cpumask+8
p:probe/clear_tasks_mm_cpumask_3 clear_tasks_mm_cpumask+16
p:probe/clear_tasks_mm_cpumask_4 clear_tasks_mm_cpumask+82
Committer testing:
I managed to reproduce the above:
[root@quaco ~]# perf probe -D clear_tasks_mm_cpumask:1
p:probe/clear_tasks_mm_cpumask _text+919968
p:probe/clear_tasks_mm_cpumask_1 _text+919973
p:probe/clear_tasks_mm_cpumask_2 _text+919976
[root@quaco ~]#
But then when trying to actually put the probe in place, it fails if I
use :0 as the offset:
[root@quaco ~]# perf probe -L clear_tasks_mm_cpumask | head -5
<clear_tasks_mm_cpumask@/usr/src/debug/kernel-5.2.fc30/linux-5.2.18-200.fc30.x86_64/kernel/cpu.c:0>
0 void clear_tasks_mm_cpumask(int cpu)
1 {
2 struct task_struct *p;
[root@quaco ~]# perf probe clear_tasks_mm_cpumask:0
Probe point 'clear_tasks_mm_cpumask' not found.
Error: Failed to add events.
[root@quaco
The next patch is needed to fix this case.
Fixes:
|
|
![]() |
1785fbb738 |
perf jevents: Fix resource leak in process_mapfile() and main()
There are memory leaks and file descriptor resource leaks in process_mapfile() and main(). Fix this by adding free(), fclose() and free_arch_std_events() on the error paths. Fixes: |
|
![]() |
91e2f539ee |
perf probe: Fix to show function entry line as probe-able
Fix die_walk_lines() to list the function entry line correctly. Since
the dwarf_entrypc() does not return the entry pc if the DIE has only
range attribute, __die_walk_funclines() fails to list the declaration
line (entry line) in that case.
To solve this issue, this introduces die_entrypc() which correctly
returns the entry PC (the first address range) even if the DIE has only
range attribute. With this fix die_walk_lines() shows the function entry
line is able to probe correctly.
Fixes:
|
|
![]() |
acb6a7047a |
perf probe: Walk function lines in lexical blocks
Since some inlined functions are in lexical blocks of given function, we
have to recursively walk through the DIE tree. Without this fix,
perf-probe -L can miss the inlined functions which is in a lexical block
(like if (..) { func() } case.)
However, even though, to walk the lines in a given function, we don't
need to follow the children DIE of inlined functions because those do
not have any lines in the specified function.
We need to walk though whole trees only if we walk all lines in a given
file, because an inlined function can include another inlined function
in the same file.
Fixes:
|
|
![]() |
b77afa1f81 |
perf probe: Fix to find range-only function instance
Fix die_is_func_instance() to find range-only function instance.
In some case, a function instance can be made without any low PC or
entry PC, but only with address ranges by optimization. (e.g. cold text
partially in "text.unlikely" section) To find such function instance, we
have to check the range attribute too.
Fixes:
|
|
![]() |
4bfbcf3ee1 |
perf kvm: Use evlist layer api when possible
No need for layer violations when a proper evlist api is available. Signed-off-by: Igor Lubashev <ilubashe@akamai.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1571795693-23558-4-git-send-email-ilubashe@akamai.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
b7dc21f546 |
perf tests: Fix a typo
Correct typo in comment: s/suck/stuck. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reported-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191023083324.12093-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
826100a7ce |
perf tools: Avoid a malloc() for array events
Use realloc() rather than malloc()+memcpy() to possibly avoid a memory allocation when appending array elements. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191023005337.196160-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a26e47162d |
perf tools: Move ALLOC_LIST into a function
Having a YYABORT in a macro makes it hard to free memory for components of a rule. Separate the logic out. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: clang-built-linux@googlegroups.com Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191023005337.196160-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
2ccfb8bc21 |
perf evsel: Avoid close(-1)
In some weak fallback cases close can be called a lot with -1. Check for this case and avoid calling close then. This is mainly to shut up valgrind which complains about this case. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191020175202.32456-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
796c01a4bf |
perf evsel: Always preserve errno while cleaning up perf_event_open failures
In some cases when perf_event_open fails, it may do some closes to clean up. In special cases these closes can fail too, which overwrites the errno of the perf_event_open, which is then incorrectly reported. Save/restore errno around closes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20191020175202.32456-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
9d604aad4b |
perf cs-etm: Fix definition of macro TO_CS_QUEUE_NR
Macro TO_CS_QUEUE_NR definition has a typo, which uses 'trace_id_chan' as its parameter, this doesn't match with its definition body which uses 'trace_chan_id'. So renames the parameter to 'trace_chan_id'. It's luck to have a local variable 'trace_chan_id' in the function cs_etm__setup_queue(), even we wrongly define the macro TO_CS_QUEUE_NR, the local variable 'trace_chan_id' is used rather than the macro's parameter 'trace_id_chan'; so the compiler doesn't complain for this before. After renaming the parameter, it leads to a compiling error due cs_etm__setup_queue() has no variable 'trace_id_chan'. This patch uses the variable 'trace_chan_id' for the macro so that fixes the compiling error. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20191021074808.25795-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
a33d261198 |
perf llvm: Make .o saving a debug message, not an info one
Its a bit annoying to have that message, better make it a debug one. I.e. now this message will only appear when using '-v': [root@quaco tracebuffer]# trace -e bristot.c LLVM: dumping bristot.o ^C[root@quaco tracebuffer]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-o7jd4i7s66kosec5torubqps@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
|
![]() |
eeb399b531 |
perf record: Put a copy of kcore into the perf.data directory
Add a new 'perf record' option '--kcore' which will put a copy of /proc/kcore, kallsyms and modules into a perf.data directory. Note, that without the --kcore option, output goes to a file as previously. The tools' -o and -i options work with either a file name or directory name. Example: $ sudo perf record --kcore uname $ sudo tree perf.data perf.data ├── kcore_dir │ ├── kallsyms │ ├── kcore │ └── modules └── data $ sudo perf script -v build id event received for vmlinux: 1eaa285996affce2d74d8e66dcea09a80c9941de build id event received for [vdso]: 8bbaf5dc62a9b644b4d4e4539737e104e4a84541 Samples for 'cycles' event do not have CPU attribute set. Skipping 'cpu' field. Using CPUID GenuineIntel-6-8E-A Using perf.data/kcore_dir/kcore for kernel data Using perf.data/kcore_dir/kallsyms for symbols perf 19058 506778.423729: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423733: 1 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423734: 7 cycles: ffffffffa2caa548 native_write_msr+0x8 (vmlinux) perf 19058 506778.423736: 117 cycles: ffffffffa2caa54a native_write_msr+0xa (vmlinux) perf 19058 506778.423738: 2092 cycles: ffffffffa2c9b7b0 native_apic_msr_write+0x0 (vmlinux) perf 19058 506778.423740: 37380 cycles: ffffffffa2f121d0 perf_event_addr_filters_exec+0x0 (vmlinux) uname 19058 506778.423751: 582673 cycles: ffffffffa303a407 propagate_protected_usage+0x147 (vmlinux) uname 19058 506778.423892: 2241841 cycles: ffffffffa2cae0c9 unwind_next_frame.part.5+0x79 (vmlinux) uname 19058 506778.424430: 2457397 cycles: ffffffffa3019232 check_memory_region+0x52 (vmlinux) Committer testing: # rm -rf perf.data* # perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] # ls -l perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data # perf record --kcore uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.024 MB perf.data (7 samples) ] ls[root@quaco ~]# ls -lad perf.data* drwx------. 3 root root 4096 Oct 21 11:08 perf.data -rw-------. 1 root root 34772 Oct 21 11:08 perf.data.old # perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # perf evlist -v -i perf.data/data cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/20191004083121.12182-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |