The callchain rbtree is rebuilt periodically, so its root needs to be
reinitialized every time. Otherwise rbtree insertion can get stuck
following stale pointers.
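A minimal sketch of the failure mode, assuming a plain unbalanced binary
tree in place of the kernel rbtree. The names struct node, struct tree,
tree_insert() and tree_rebuild() are hypothetical and are not perf's own
code:

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct rb_node and struct rb_root. */
struct node {
	int key;
	struct node *left, *right;
};

struct tree {
	struct node *root;
};

/* Plain (unbalanced) binary-tree insert; enough to show the problem. */
static void tree_insert(struct tree *t, struct node *n)
{
	struct node **p = &t->root;

	n->left = n->right = NULL;
	while (*p)
		p = (n->key < (*p)->key) ? &(*p)->left : &(*p)->right;
	*p = n;
}

/*
 * Rebuild the tree from the same set of nodes. Resetting the root first
 * is the important part: without it, the walk in tree_insert() starts
 * from the root left over from the previous build, and re-inserting an
 * already linked node can link it to itself, so a later insert may
 * never terminate.
 */
static void tree_rebuild(struct tree *t, struct node *nodes, size_t nr)
{
	size_t i;

	t->root = NULL;	/* analogous to resetting an rb_root */
	for (i = 0; i < nr; i++)
		tree_insert(t, &nodes[i]);
}

/* Count the nodes reachable from n. */
static size_t tree_count(const struct node *n)
{
	return n ? 1 + tree_count(n->left) + tree_count(n->right) : 0;
}
```

Calling tree_rebuild() twice on the same nodes yields the same tree both
times only because of the reset at the top.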
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448521700-32062-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the user requested to hide unresolved entries, skip unresolved
callchains as well as hist entries.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448521700-32062-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The commit 05c8d802fa ("perf probe: Fix to free temporal Dwarf_Frame")
tried to fix the memory leak of Dwarf_Frame, but it released the frame
at the wrong point. Since dwarf_frame_cfa(frame, &pf->fb_ops, &nops) can
return an address inside the frame data structure in pf->fb_ops, we
cannot release the frame before using pf->fb_ops.
This reverts the commit and correctly releases the frame afterwards
(right before returning from call_probe_finder).
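The ordering constraint can be illustrated with a small sketch. It
assumes hypothetical stand-ins (struct frame, frame_new(), frame_cfa())
rather than the real libdw types and calls. The only point is that a
pointer handed out from inside an object must not outlive that object:

```c
#include <stdlib.h>

/* Hypothetical stand-in for Dwarf_Frame: it owns the ops array. */
struct frame {
	int ops[4];
	size_t nops;
};

/* Hypothetical stand-in for a call that returns a malloc'd frame. */
static struct frame *frame_new(void)
{
	struct frame *f = calloc(1, sizeof(*f));

	if (f) {
		f->ops[0] = 42;
		f->nops = 1;
	}
	return f;
}

/* Hands back a pointer INTO the frame, not a copy. */
static void frame_cfa(struct frame *f, int **ops, size_t *nops)
{
	*ops = f->ops;
	*nops = f->nops;
}

static int first_op(void)
{
	struct frame *f = frame_new();
	size_t nops;
	int *ops;
	int ret;

	if (!f)
		return -1;
	frame_cfa(f, &ops, &nops);
	/* Calling free(f) here would leave 'ops' dangling. */
	ret = nops ? ops[0] : -1;	/* last use of the borrowed pointer */
	free(f);			/* release only afterwards */
	return ret;
}
```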
Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 05c8d802fa ("perf probe: Fix to free temporal Dwarf_Frame")
LPU-Reference: 20151125103432.1473.31009.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by Milian, for DWARF unwind (both libdw and libunwind) we
currently display callchains in callee order only.
Add support for following the configured callchain order to the libdw
DWARF unwinder, so that we get the following output from report:
$ perf record --call-graph dwarf ls
...
$ perf report --no-children --stdio
21.12% ls libc-2.21.so [.] __strcoll_l
|
---__strcoll_l
mpsort_with_tmp
mpsort_with_tmp
mpsort_with_tmp
sort_files
main
__libc_start_main
_start
$ perf report --stdio --no-children -g caller
21.12% ls libc-2.21.so [.] __strcoll_l
|
---_start
__libc_start_main
main
sort_files
mpsort_with_tmp
mpsort_with_tmp
mpsort_with_tmp
__strcoll_l
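A rough sketch of the ordering logic, assuming the unwinder has already
collected its frames innermost first. The names enum chain_order,
walk_entries() and record_first() are hypothetical and do not come from
the perf sources:

```c
#include <stddef.h>

/* Hypothetical names for the two display orders. */
enum chain_order { ORDER_CALLEE, ORDER_CALLER };

typedef void (*entry_cb)(const char *sym, void *arg);

/*
 * 'entries' holds what an unwinder naturally produces: the innermost
 * frame first. Callee order walks the array forwards, caller order
 * walks it backwards.
 */
static void walk_entries(const char **entries, size_t nr,
			 enum chain_order order, entry_cb cb, void *arg)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t idx = (order == ORDER_CALLER) ? nr - 1 - i : i;

		cb(entries[idx], arg);
	}
}

/* Example callback: remember the first symbol that was delivered. */
struct first_seen {
	const char *sym;
};

static void record_first(const char *sym, void *arg)
{
	struct first_seen *seen = arg;

	if (!seen->sym)
		seen->sym = sym;
}
```

With the chain from the example above, callee order delivers
__strcoll_l first and caller order delivers _start first.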
Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jan Kratochvil <jkratoch@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20151119130119.GA26617@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by Milian, for DWARF unwind (both libdw and libunwind) we
currently display callchains in callee order only.
Add support for following the configured callchain order to the
libunwind DWARF unwinder, so that we get the following output from
report:
$ perf record --call-graph dwarf ls
...
$ perf report --no-children --stdio
39.26% ls libc-2.21.so [.] __strcoll_l
|
---__strcoll_l
mpsort_with_tmp
mpsort_with_tmp
sort_files
main
__libc_start_main
_start
0
$ perf report -g caller --no-children --stdio
...
39.26% ls libc-2.21.so [.] __strcoll_l
|
---0
_start
__libc_start_main
main
sort_files
mpsort_with_tmp
mpsort_with_tmp
__strcoll_l
Based-on-patch-by: Milian Wolff <milian.wolff@kdab.com>
Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20151118075247.GA5416@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the initial entry call into the get_entries function so that all
entry processing is in one place. This will be useful for the next
change, which adds ordering logic.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1447772739-18471-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The flat callchain mode prints all chains in a single, simple
hierarchy, which makes them easy to read.
Currently perf report --tui doesn't show flat callchains properly. With
flat callchains, only leaf nodes are added to the final rbtree, so each
leaf should also show the entries of its parent nodes. To do that, add
a parent_val list to struct callchain_node and show it along with the
(normal) val list.
For example, consider the following callchains with '-g graph'.
$ perf report -g graph
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
- cpu_startup_entry
28.63% start_secondary
- 11.30% rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
Before:
$ perf report -g flat
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
28.63% start_secondary
- 11.30% rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
After:
$ perf report -g flat
- 39.93% swapper [kernel.vmlinux] [k] intel_idle
- 28.63% intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
- 11.30% intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_kernel
x86_64_start_reservations
x86_64_start_kernel
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -g/--call-graph option now supports selecting how callchain values
are displayed. Possible values are 'percent', 'period' and 'count'.
'percent' is the same as before and is the default behavior. 'period'
displays the raw period value rather than the percentage. 'count'
displays the number of occurrences.
$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init
$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init
$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init
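The three display modes can be sketched as one formatting helper. This
is only an illustration: enum chain_value, struct chain_stat and
chain_value_snprintf() are hypothetical names, and the percentage is
assumed to be taken relative to a total period supplied by the caller:

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical names for the three display modes. */
enum chain_value { VALUE_PERCENT, VALUE_PERIOD, VALUE_COUNT };

struct chain_stat {
	uint64_t period;	/* accumulated raw period */
	unsigned int count;	/* number of occurrences */
};

/* Format one callchain value according to the selected mode. */
static int chain_value_snprintf(char *buf, size_t size,
				enum chain_value mode,
				const struct chain_stat *st,
				uint64_t total_period)
{
	switch (mode) {
	case VALUE_PERIOD:
		return snprintf(buf, size, "%" PRIu64, st->period);
	case VALUE_COUNT:
		return snprintf(buf, size, "%u", st->count);
	case VALUE_PERCENT:
	default:
		return snprintf(buf, size, "%.2f%%",
				total_period ?
				100.0 * st->period / total_period : 0.0);
	}
}
```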
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's to track the count of occurrences of the callchains.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a preparation for printing other types of callchain value, such
as count or period.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-4-git-send-email-namhyung@kernel.org
[ renamed new _sprintf_ operation to _scnprintf_ ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a new call chain option (-g) 'folded' to print each callchain on a
single line. The entries of a callchain are separated by semicolons,
and the chain is preceded by an (absolute) percent value and a space.
For example, the following 20 lines can be printed in 3 lines with the
folded output mode:
$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
5.88%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
This mode is supported only for --stdio now and intended to be used by
some scripts like in FlameGraphs[1]. Support for other UI might be
added later.
[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
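The folded line format described above can be sketched as follows.
fold_chain() is a hypothetical helper, not the function added by this
patch. It assumes the symbols are already resolved to strings:

```c
#include <stdio.h>

/*
 * Write "<percent>% sym1;sym2;...;symN" into buf.
 * Returns the length written so far; output is truncated once buf is
 * full.
 */
static int fold_chain(char *buf, size_t size, double percent,
		      const char **syms, size_t nr)
{
	int len = snprintf(buf, size, "%.2f%% ", percent);
	size_t i;

	for (i = 0; i < nr && len >= 0 && (size_t)len < size; i++)
		len += snprintf(buf + len, size - len, "%s%s",
				i ? ";" : "", syms[i]);
	return len;
}
```

One chain per line in this shape is easy for a script to split on the
first space and then on semicolons.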
Requested-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix machine__findnew_module_map to drop the reference to the dso because
it is already referenced by both machine__findnew_module_dso() and
map__new2().
Refcnt debugger shows:
==== [1] ====
Unreclaimed dso: 0x1ffd980
Refcount +1 => 1 at
./perf(dso__new+0x1ff) [0x4a62df]
./perf(__dsos__addnew+0x29) [0x4a6e19]
./perf() [0x4b8b91]
./perf(modules__parse+0xfc) [0x4a9d5c]
./perf() [0x4b8460]
./perf(machine__create_kernel_maps+0x150) [0x4bb550]
./perf(machine__new_host+0xfa) [0x4bb75a]
./perf(init_probe_symbol_maps+0x93) [0x506623]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1345a8eaf5]
./perf() [0x4220a9]
This map_groups__insert(0x4b8b91) already gets a reference to the new
dso:
----
eu-addr2line -e ./perf -f 0x4b8b91
map_groups__insert inlined at util/machine.c:586 in
machine__create_module
util/map.h:207
----
So this dso refcnt will be released when map_groups gets released.
[snip]
Refcount +1 => 2 at
./perf(dso__get+0x34) [0x4a65f4]
./perf() [0x4b8b35]
./perf(modules__parse+0xfc) [0x4a9d5c]
./perf() [0x4b8460]
./perf(machine__create_kernel_maps+0x150) [0x4bb550]
./perf(machine__new_host+0xfa) [0x4bb75a]
./perf(init_probe_symbol_maps+0x93) [0x506623]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1345a8eaf5]
./perf() [0x4220a9]
Here, machine__findnew_module_dso(0x4b8b35) gets the dso (and stores it
in a local variable):
----
# eu-addr2line -e ./perf -f 0x4b8b35
machine__findnew_module_dso inlined at util/machine.c:578 in
machine__create_module
util/machine.c:514
----
Refcount +1 => 3 at
./perf(dso__get+0x34) [0x4a65f4]
./perf(map__new2+0x76) [0x4be1c6]
./perf() [0x4b8b4f]
./perf(modules__parse+0xfc) [0x4a9d5c]
./perf() [0x4b8460]
./perf(machine__create_kernel_maps+0x150) [0x4bb550]
./perf(machine__new_host+0xfa) [0x4bb75a]
./perf(init_probe_symbol_maps+0x93) [0x506623]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1345a8eaf5]
./perf() [0x4220a9]
But also map__new2() gets the dso which will be put when the map is
released.
So, we have to drop the constructor reference obtained in
machine__findnew_module_dso().
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064035.30709.58824.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
__dsos__addnew should drop the constructor reference to dso after adding
it to the list, because __dsos__add() will get a reference that will be
kept while it is in the list.
This fixes DSO leaks when entries are removed from the list and the
refcount never gets to zero.
Refcnt debugger shows:
==== [0] ====
Unreclaimed dso: 0x2fccab0
Refcount +1 => 1 at
./perf(dso__new+0x1ff) [0x4a62df]
./perf(__dsos__addnew+0x29) [0x4a6e19]
./perf(dsos__findnew+0xd1) [0x4a7281]
./perf(machine__findnew_kernel+0x27) [0x4a5e17]
./perf() [0x4b8df2]
./perf(machine__create_kernel_maps+0x28) [0x4bb528]
./perf(machine__new_host+0xfa) [0x4bb84a]
./perf(init_probe_symbol_maps+0x93) [0x506713]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f46df132af5]
./perf() [0x4220a9]
Refcount +1 => 2 at
./perf(__dsos__addnew+0xfb) [0x4a6eeb]
./perf(dsos__findnew+0xd1) [0x4a7281]
./perf(machine__findnew_kernel+0x27) [0x4a5e17]
./perf() [0x4b8df2]
./perf(machine__create_kernel_maps+0x28) [0x4bb528]
./perf(machine__new_host+0xfa) [0x4bb84a]
./perf(init_probe_symbol_maps+0x93) [0x506713]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f46df132af5]
./perf() [0x4220a9]
Refcount +1 => 3 at
./perf(dsos__findnew+0x7e) [0x4a722e]
./perf(machine__findnew_kernel+0x27) [0x4a5e17]
./perf() [0x4b8df2]
./perf(machine__create_kernel_maps+0x28) [0x4bb528]
./perf(machine__new_host+0xfa) [0x4bb84a]
./perf(init_probe_symbol_maps+0x93) [0x506713]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f46df132af5]
./perf() [0x4220a9]
[snip]
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064031.30709.81460.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix dso__load_sym to put the map object which is already
inserted to kmaps.
Refcnt debugger shows
==== [0] ====
Unreclaimed map: 0x39113e0
Refcount +1 => 1 at
./perf(map__new2+0xb5) [0x4be155]
./perf(dso__load_sym+0xee1) [0x503461]
./perf(dso__load_vmlinux+0xbf) [0x4aa6df]
./perf(dso__load_vmlinux_path+0x8c) [0x4aa83c]
./perf() [0x50528a]
./perf(convert_perf_probe_events+0xd79) [0x50ac29]
./perf() [0x45600f]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f152368baf5]
./perf() [0x4220a9]
Refcount +1 => 2 at
./perf(maps__insert+0x9a) [0x4bfffa]
./perf(dso__load_sym+0xf89) [0x503509]
./perf(dso__load_vmlinux+0xbf) [0x4aa6df]
./perf(dso__load_vmlinux_path+0x8c) [0x4aa83c]
./perf() [0x50528a]
./perf(convert_perf_probe_events+0xd79) [0x50ac29]
./perf() [0x45600f]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f152368baf5]
./perf() [0x4220a9]
Refcount -1 => 1 at
./perf(map_groups__exit+0x94) [0x4bed04]
./perf(machine__delete+0xb0) [0x4b9300]
./perf(exit_probe_symbol_maps+0x28) [0x506608]
./perf() [0x45628a]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f152368baf5]
./perf() [0x4220a9]
This means that dso__load_sym calls map__new2 and maps__insert, both of
which bump the map refcount, but map_groups__exit drops just one
reference.
Fix it by dropping the refcount after inserting it into kmaps.
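The general pattern can be sketched as follows, with hypothetical names
(struct obj, struct container and their helpers) standing in for the map
and kmaps code. The constructor hands out one reference, the container
takes its own, and the creator drops the constructor reference once the
object has been inserted:

```c
#include <stdlib.h>

struct obj {
	int refcnt;
};

/* Constructor: the caller owns the initial reference. */
static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcnt = 1;
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	o->refcnt++;
	return o;
}

/* Returns 1 if this put released the last reference and freed o. */
static int obj_put(struct obj *o)
{
	if (--o->refcnt == 0) {
		free(o);
		return 1;
	}
	return 0;
}

/* A container with a single slot, for brevity. */
struct container {
	struct obj *slot;
};

/* The container takes its own reference on insert. */
static void container_insert(struct container *c, struct obj *o)
{
	c->slot = obj_get(o);
}

/* On exit the container drops only the reference it took itself. */
static int container_exit(struct container *c)
{
	int freed = obj_put(c->slot);

	c->slot = NULL;
	return freed;
}

static int container_add_new(struct container *c)
{
	struct obj *o = obj_new();

	if (!o)
		return -1;
	container_insert(c, o);
	obj_put(o);	/* drop the constructor reference */
	return 0;
}
```

Without the obj_put() in container_add_new(), container_exit() would
leave the count at 1 and the object would never be freed.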
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064026.30709.50038.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since system_path() returns a malloc'd string if the given path is not
an absolute path, perf_exec_path() sometimes returns a static string and
sometimes returns a malloc'd string, depending on the environment
variables or command options.
This may cause a memory leak because the caller cannot unconditionally
free the returned string.
This fixes perf_exec_path() and system_path() to always return a
malloc'd string, so the caller can always free it.
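A minimal sketch of the "always return a malloc'd string" convention.
make_path() and dup_string() are hypothetical helpers and the prefix
handling is simplified. This is not the actual system_path() code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable replacement for strdup(). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Return an absolute path for 'path'. The result is ALWAYS heap
 * allocated, even when 'path' needs no change, so every caller can
 * free() it unconditionally.
 */
static char *make_path(const char *prefix, const char *path)
{
	size_t len;
	char *buf;

	if (path[0] == '/')
		return dup_string(path);

	len = strlen(prefix) + 1 + strlen(path) + 1;
	buf = malloc(len);
	if (buf)
		snprintf(buf, len, "%s/%s", prefix, path);
	return buf;
}
```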
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151119060453.14210.65666.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Actually machine__exit forgot to call machine__destroy_kernel_maps.
This fixes some memory leaks on map as below.
Without this fix.
----
./perf probe vfs_read
Added new event:
probe:vfs_read (on vfs_read)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read -aR sleep 1
REFCNT: BUG: Unreclaimed objects found.
REFCNT: Total 4 objects are not reclaimed.
To see all backtraces, rerun with -v option
----
With this fix.
----
./perf probe vfs_read
Added new event:
probe:vfs_read (on vfs_read)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_read -aR sleep 1
REFCNT: BUG: Unreclaimed objects found.
REFCNT: Total 2 objects are not reclaimed.
To see all backtraces, rerun with -v option
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064024.30709.43577.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix machine object to drop the reference to the map object after it
inserted it into machine->kmaps.
refcnt debugger shows what happened:
----
==== [2] ====
Unreclaimed map: 0x346f750
Refcount +1 => 1 at
./perf(map__new2+0xb5) [0x4bdea5]
./perf() [0x4b8aaf]
./perf(modules__parse+0xfc) [0x4a9cbc]
./perf() [0x4b83c0]
./perf(machine__create_kernel_maps+0x148) [0x4bb208]
./perf(machine__new_host+0xfa) [0x4bb3fa]
./perf(init_probe_symbol_maps+0x93) [0x5062b3]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5373899af5]
./perf() [0x4220a9]
Refcount +1 => 2 at
./perf(maps__insert+0x9a) [0x4bfd4a]
./perf() [0x4b8acb]
./perf(modules__parse+0xfc) [0x4a9cbc]
./perf() [0x4b83c0]
./perf(machine__create_kernel_maps+0x148) [0x4bb208]
./perf(machine__new_host+0xfa) [0x4bb3fa]
./perf(init_probe_symbol_maps+0x93) [0x5062b3]
./perf() [0x455ffa]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5373899af5]
./perf() [0x4220a9]
Refcount -1 => 1 at
./perf(map_groups__exit+0x94) [0x4bea54]
./perf(machine__delete+0x3d) [0x4b91ed]
./perf(exit_probe_symbol_maps+0x28) [0x506358]
./perf() [0x45628a]
./perf(cmd_probe+0x6c) [0x4566bc]
./perf() [0x47abc5]
./perf(main+0x610) [0x421f90]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5373899af5]
./perf() [0x4220a9]
----
This pattern clearly shows that the refcnt of the map is acquired
twice, by map__new2 and maps__insert, but released only once at
map_groups__exit, when we purge its maps rbtree.
Since maps__insert already reference counted the map, we have to drop
the constructor (map__new2) reference count right after inserting it.
This happens in machine__findnew_module_map, as below.
----
# eu-addr2line -e ./perf -f 0x4b8aaf
machine__findnew_module_map inlined at util/machine.c:1046
in machine__create_module
util/machine.c:582
# eu-addr2line -e ./perf -f 0x4b8acb
map_groups__insert inlined at util/machine.c:585
in machine__create_module
util/map.h:208
----
(note that both are at util/machine.c:58X which is
machine__findnew_module_map)
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064020.30709.40499.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since dwarf_cfi_addrframe returns malloc'd Dwarf_Frame object, it has to
be freed after it is used.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151118064011.30709.65674.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch allows creating only one BPF program for different
'probe_trace_event'(tev) entries generated by one
'perf_probe_event'(pev) if their prologues are identical.
This is done by comparing the argument lists of different tev
instances and mapping each tev to its prologue type through a mapping
array. This patch uses qsort to sort the tevs. After sorting, tevs with
identical argument lists are grouped together.
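The grouping step can be sketched as follows. It assumes each tev's
argument list has been flattened into one comparable string. struct tev
here is a hypothetical simplification of 'struct probe_trace_event', and
map_prologue_types() is not the function added by the patch:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified tev: the argument list as a single string. */
struct tev {
	const char *args;
};

/* qsort() comparator over an array of pointers to tevs. */
static int cmp_tev(const void *a, const void *b)
{
	const struct tev *const *ta = a;
	const struct tev *const *tb = b;

	return strcmp((*ta)->args, (*tb)->args);
}

/*
 * Fill type_of[i] with the prologue type of tevs[i]. Tevs with equal
 * argument lists get the same type. Returns the number of distinct
 * types, or -1 on allocation failure.
 */
static int map_prologue_types(const struct tev *tevs, size_t nr,
			      int *type_of)
{
	const struct tev **sorted;
	int nr_types = 0;
	size_t i;

	if (!nr)
		return 0;
	sorted = malloc(nr * sizeof(*sorted));
	if (!sorted)
		return -1;
	for (i = 0; i < nr; i++)
		sorted[i] = &tevs[i];
	qsort(sorted, nr, sizeof(*sorted), cmp_tev);

	for (i = 0; i < nr; i++) {
		/* A change in the argument list starts a new group. */
		if (i == 0 || strcmp(sorted[i]->args, sorted[i - 1]->args))
			nr_types++;
		/* Map back to the original index of this tev. */
		type_of[sorted[i] - tevs] = nr_types - 1;
	}
	free(sorted);
	return nr_types;
}
```

One program per distinct type is then enough, which matches the single
bpf-prog file descriptor seen after the patch.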
Test result:
Sample BPF program:
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("inlines=no;"
"func=SyS_dup? oldfd")
int func(void *ctx)
{
return 1;
}
It would probe at SyS_dup2 and SyS_dup3, obtaining oldfd as its
argument.
The following cmdline shows a BPF program being loaded into the kernel
by perf:
# perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
Before this patch:
# perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
[1] 24858
lrwx------ 1 root root 64 Nov 14 04:09 3 -> anon_inode:bpf-prog
lrwx------ 1 root root 64 Nov 14 04:09 4 -> anon_inode:bpf-prog
...
After this patch:
# perf record -e ./test_bpf_arg.c sleep 4 & sleep 1 && ls /proc/$!/fd/ -l | grep bpf-prog
[1] 25699
lrwx------ 1 root root 64 Nov 14 04:10 3 -> anon_inode:bpf-prog
...
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447749170-175898-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch generates a prologue for each 'struct probe_trace_event' for
fetching arguments for BPF programs.
After bpf__probe(), iterate over each program to check whether prologues are
required. If none of the 'struct perf_probe_event' instances that programs
will attach to has at least one argument, simply skip preprocessor hooking.
For those that require a prologue, call bpf__gen_prologue() and paste the
original instructions after the prologue.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-12-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch generates a prologue for a BPF program which fetches arguments for
it. With this patch, the program can have arguments as follows:
SEC("lock_page=__lock_page page->flags")
int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}
This patch passes at most 3 arguments in r3, r4 and r5. r1 is still the ctx
pointer. r2 is used to indicate whether dereferencing was done successfully.
This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold the stack
pointer for the result. The result of each argument is first stored on the
stack:
low address
BPF_REG_FP - 24 ARG3
BPF_REG_FP - 16 ARG2
BPF_REG_FP - 8 ARG1
BPF_REG_FP
high address
Then loaded into r3, r4 and r5.
The output prologue for offn(...off2(off1(reg))) should be:
r6 <- r1 // save ctx into a callee saved register
r7 <- fp
r7 <- r7 - stack_offset // pointer to result slot
/* load r3 with the offset in pt_regs of 'reg' */
(r7) <- r3 // make slot valid
r3 <- r3 + off1 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7 // result put onto stack
call probe_read // read unsafe pointer
jnei r0, 0, err // error checking
r3 <- (r7) // read result
r3 <- r3 + off2 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7
call probe_read
jnei r0, 0, err
...
/* load r3, r4, r5 from stack */
goto success
err:
r2 <- 1
/* load r3, r4, r5 with 0 */
goto usercode
success:
r2 <- 0
usercode:
r1 <- r6 // restore ctx
// original user code
If all of the arguments reside in registers (dereferencing is not
required), gen_prologue_fastpath() will be used to create a
fast prologue:
r3 <- (r1 + offset of reg1)
r4 <- (r1 + offset of reg2)
r5 <- (r1 + offset of reg3)
r2 <- 0
P.S.
eBPF calling convention is defined as:
* r0 - return value from in-kernel function, and exit value
for eBPF program
* r1 - r5 - arguments from eBPF program to in-kernel function
* r6 - r9 - callee saved registers that in-kernel function will
preserve
* r10 - read-only frame pointer to access stack
Committer note:
At least testing if it builds and loads:
# cat test_probe_arg.c
struct pt_regs;
__attribute__((section("lock_page=__lock_page page->flags"), used))
int func(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40300;
# perf record -e ./test_probe_arg.c usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
# perf evlist
perf_bpf_probe:lock_page
#
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By extending the syntax of BPF object section names, this patch allows users to
configure probing options as they can in 'perf probe'.
The error message in 'perf probe' is also updated.
Test result:
For the following BPF file test_probe_glob.c:
# cat test_probe_glob.c
__attribute__((section("inlines=no;func=SyS_dup?"), used))
int func(void *ctx)
{
return 1;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40300;
#
# ./perf record -e ./test_probe_glob.c ls /
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data ]
# ./perf evlist
perf_bpf_probe:func_1
perf_bpf_probe:func
After changing "inlines=no" to "inlines=yes":
# ./perf record -e ./test_probe_glob.c ls /
...
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data ]
# ./perf evlist
perf_bpf_probe:func_3
perf_bpf_probe:func_2
perf_bpf_probe:func_1
perf_bpf_probe:func
Then test 'force':
Use the following program:
# cat test_probe_force.c
__attribute__((section("func=sys_write"), used))
int funca(void *ctx)
{
return 1;
}
__attribute__((section("force=yes;func=sys_write"), used))
int funcb(void *ctx)
{
return 1;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40300;
#
# perf record -e ./test_probe_force.c usleep 1
Error: event "func" already exists.
Hint: Remove existing event by 'perf probe -d'
or force duplicates by 'perf probe -f'
or set 'force=yes' in BPF source.
event syntax error: './test_probe_force.c'
\___ Probe point exist. Try 'perf probe -d "*"' and set 'force=yes'
(add -v to see detail)
...
Then replace 'force=no' with 'force=yes':
# vim test_probe_force.c
# perf record -e ./test_probe_force.c usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data ]
# perf evlist
perf_bpf_probe:func_1
perf_bpf_probe:func
#
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By extending the syntax of BPF object section names, this patch allows
users to attach BPF programs to symbols in modules. For example:
SEC("module=i915;"
"parse_cmds=i915_parse_cmds")
int parse_cmds(void *ctx)
{
return 1;
}
The implementation is very simple: like 'perf probe' does for modules,
fill the 'uprobe' field in 'struct perf_probe_event'. The other parts are
done automatically.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a new syntax to the BPF object section name to support
probing at uprobe events. Now we can use a BPF program like this:
SEC(
"exec=/lib64/libc.so.6;"
"libcwrite=__write"
)
int libcwrite(void *ctx)
{
return 1;
}
In the section name of a program, 'key=value' style options can be placed
before the main config string. Currently the only option key is "exec",
for uprobes.
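A minimal sketch of how such a section name could be split, assuming options
and the main config string are separated by ';' as in the example above. The
names 'struct sec_config' and parse_section_name() are hypothetical and this
is not perf's actual parser:

```c
#include <string.h>

/* Hypothetical result type; perf's own structures differ. */
struct sec_config {
	char exec[256];   /* value of the "exec" option, empty if absent */
	char probe[256];  /* main config string, e.g. "libcwrite=__write" */
};

/*
 * Split "key=value;...;main-config" on ';'. Every piece except the last
 * is treated as an option; the last piece is the main config string.
 * Returns 0 on success, -1 on an unknown option or an over-long piece.
 */
static int parse_section_name(const char *name, struct sec_config *cfg)
{
	const char *p = name;
	const char *sep;

	cfg->exec[0] = '\0';
	cfg->probe[0] = '\0';

	while ((sep = strchr(p, ';')) != NULL) {
		size_t len = (size_t)(sep - p);

		/* "exec" is the only option key handled in this sketch */
		if (len <= 5 || strncmp(p, "exec=", 5) != 0)
			return -1;
		if (len - 5 >= sizeof(cfg->exec))
			return -1;
		memcpy(cfg->exec, p + 5, len - 5);
		cfg->exec[len - 5] = '\0';
		p = sep + 1;
	}

	if (strlen(p) >= sizeof(cfg->probe))
		return -1;
	strcpy(cfg->probe, p);
	return 0;
}
```

Called with "exec=/lib64/libc.so.6;libcwrite=__write", the sketch stores the
library path in 'exec' and leaves "libcwrite=__write" as the main config.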
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-4-git-send-email-wangnan0@huawei.com
[ Changed the separator from \n to ; ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That will contain more string functions with counterparts, sometimes
verbatim copies, in the kernel.
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-rah6g97kn21vfgmlramorz6o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When probing with a glob, errors in add_probe_trace_event() won't be
passed to debuginfo__find_trace_events() because it would be modified by
probe_point_search_cb(). It causes a segfault if perf fails to find an
argument for a probe point matched by the glob. For example:
# ./perf probe -v -n 'SyS_dup? oldfd'
probe-definition(0): SyS_dup? oldfd
symbol:SyS_dup? file:(null) line:0 offset:0 return:0 lazy:(null)
parsing arg: oldfd into oldfd
1 arguments
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/4.3.0-rc4+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.3.0-rc4+/build/vmlinux
Try to find probe point from debuginfo.
Matched function: SyS_dup3
found inline addr: 0xffffffff812095c0
Probe point found: SyS_dup3+0
Searching 'oldfd' variable in context.
Converting variable oldfd into trace event.
oldfd type is long int.
found inline addr: 0xffffffff812096d4
Probe point found: SyS_dup2+36
Searching 'oldfd' variable in context.
Failed to find 'oldfd' in this function.
Matched function: SyS_dup3
Probe point found: SyS_dup3+0
Searching 'oldfd' variable in context.
Converting variable oldfd into trace event.
oldfd type is long int.
Matched function: SyS_dup2
Probe point found: SyS_dup2+0
Searching 'oldfd' variable in context.
Converting variable oldfd into trace event.
oldfd type is long int.
Found 4 probe_trace_events.
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: p:probe/SyS_dup3 _text+2135488 oldfd=%di:s64
Segmentation fault (core dumped)
#
This patch ensures that add_probe_trace_event() doesn't touch
tf->ntevs and tf->tevs if those functions fail.
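The idea can be sketched as "build locally, commit only on success". The
types and functions below ('struct finder', add_event(), convert_args()) are
hypothetical stand-ins, not the real perf code:

```c
/* Hypothetical stand-ins for the finder state and one trace event. */
struct tev {
	int nargs;
};

struct finder {
	struct tev *tevs;
	int ntevs;
	int max_tevs;
};

/* Fake conversion step: fails on request, as a missing argument would. */
static int convert_args(struct tev *tev, int should_fail)
{
	if (should_fail)
		return -1;
	tev->nargs = 1;
	return 0;
}

/*
 * Build the new entry in a local variable and publish it to the finder
 * only after every step has succeeded, so a failure leaves tf untouched.
 */
static int add_event(struct finder *tf, int should_fail)
{
	struct tev tmp = { 0 };
	int ret;

	if (tf->ntevs == tf->max_tevs)
		return -1;

	ret = convert_args(&tmp, should_fail);
	if (ret < 0)
		return ret;            /* ntevs and tevs are unchanged */

	tf->tevs[tf->ntevs++] = tmp;   /* commit */
	return 0;
}
```

With this shape, a glob match that fails for one function does not leave a
half-initialized slot behind for later code to dereference.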
After the patch:
# perf probe 'SyS_dup? oldfd'
Failed to find 'oldfd' in this function.
Added new events:
probe:SyS_dup3 (on SyS_dup? with oldfd)
probe:SyS_dup3_1 (on SyS_dup? with oldfd)
probe:SyS_dup2 (on SyS_dup? with oldfd)
You can now use it in all perf tools, such as:
perf record -e probe:SyS_dup2 -aR sleep 1
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447417761-156094-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix a memory leak on the debuginfo__find_trace_events() failure path,
which frees an array of probe_trace_events but doesn't clear all the
allocated sub-structures and strings.
So, before doing zfree(tevs), clear all the array elements which may
have allocated resources.
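A rough sketch of that cleanup order, using a hypothetical and much smaller
'struct tev' in place of the real probe_trace_event:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type that owns two heap strings. */
struct tev {
	char *event;
	char *group;
};

/* Release what one element owns and reset its pointers. */
static void clear_tev(struct tev *tev)
{
	free(tev->event);
	free(tev->group);
	memset(tev, 0, sizeof(*tev));
}

/* Failure path: clear every element first, then free the array itself. */
static void free_tevs(struct tev **tevs, int ntevs)
{
	int i;

	if (*tevs == NULL)
		return;
	for (i = 0; i < ntevs; i++)
		clear_tev(&(*tevs)[i]);
	free(*tevs);
	*tevs = NULL;   /* same effect as zfree(): no dangling pointer */
}
```

Freeing only the array would drop the last references to the strings each
element owns, which is the leak described above.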
Reported-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447417761-156094-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf buildid-list' processes events to determine hits (i.e. with the
with-hits option). That may not work if events are not processed in order,
i.e. MMAP events must be processed before the samples that depend on them so
that sample processing can 'hit' the DSO to which the MMAP refers.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1447408112-1920-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 4598a0a6d2 ("perf symbols: Improve DSO long names lookup speed
with rbtree") added a tree to look up dsos by long name. That tree gets
corrupted whenever a dso long name is changed because the tree is not
updated.
One effect of that is buildid-list does not work with the 'with-hits'
option because dso lookup fails and results in two structs for the same
dso. The first has the buildid but no hits, the second has hits but no
buildid. e.g.
Before:
$ tools/perf/perf record ls
arch certs CREDITS Documentation firmware include
ipc Kconfig lib Makefile net REPORTING-BUGS
scripts sound usr block COPYING crypto
drivers fs init Kbuild kernel MAINTAINERS
mm README samples security tools virt
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data (11 samples) ]
$ tools/perf/perf buildid-list
574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
30c94dc66a1fe95180c3d68d2b89e576d5ae213c /lib/x86_64-linux-gnu/libc-2.19.so
$ tools/perf/perf buildid-list -H
574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
0000000000000000000000000000000000000000 /lib/x86_64-linux-gnu/libc-2.19.so
After:
$ tools/perf/perf buildid-list -H
574da826c66538a8d9060d393a8866289bd06005 [kernel.kallsyms]
30c94dc66a1fe95180c3d68d2b89e576d5ae213c /lib/x86_64-linux-gnu/libc-2.19.so
The fix is to record the root of the tree on the dso so that
dso__set_long_name() can update the tree when the long name changes.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Fixes: 4598a0a6d2 ("perf symbols: Improve DSO long names lookup speed with rbtree")
Link: http://lkml.kernel.org/r/1447408112-1920-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the root user tries to read a file owned by some other user we get:
# ls -la perf.data
-rw-------. 1 acme acme 20032 Nov 12 15:50 perf.data
# perf report
File perf.data not owned by current user or root (use -f to override)
# perf report -f | grep -v ^# | head -2
30.96% ls [kernel.vmlinux] [k] do_set_pte
28.24% ls libc-2.20.so [.] intel_check_word
#
That override was not available when the symbol code tried to read a JIT
map: the same check was done but no forcing was possible. Fix it.
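The check can be sketched as a small predicate. The name may_read() and its
parameters are hypothetical; the rule it encodes is the one in the message
above (file owned by the current user or by root, unless forced):

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Accept a file owned by root or by the effective user. 'force'
 * corresponds to the -f override and skips the ownership test.
 */
static bool may_read(uid_t file_uid, uid_t euid, bool force)
{
	if (force)
		return true;
	return file_uid == 0 || file_uid == euid;
}
```

For root reading a file owned by uid 1000, may_read(1000, 0, false) is false
and may_read(1000, 0, true) is true, matching the session shown above.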
Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://permalink.gmane.org/gmane.linux.kernel.perf.user/2380
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Normally symbols are read from the DSO and adjusted, if need be, so that
the symbol start matches the file offset in the DSO file (we want the
file offset because that is what we know from MMAP events). That is done
by dso__load_sym() which inserts the symbols *after* adjusting them.
In the case of kcore, the symbols have been read from kallsyms and the
symbol start is the memory address. The symbols have to be adjusted to
match the kcore file offsets. dso__split_kallsyms_for_kcore() does that,
but now the adjustment is being done *after* the symbols have been
inserted. It appears dso__split_kallsyms_for_kcore() was assuming that
changing the symbol start would not change the order in the rbtree -
which is, of course, not guaranteed.
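A small illustration of why the key has to be adjusted before insertion. It
uses a sorted array instead of an rbtree and a made-up address layout in
which file order differs from address order; to_file_offset() and
insert_adjusted() are hypothetical names:

```c
#include <stddef.h>

struct sym {
	unsigned long start;
};

/*
 * Made-up mapping with two segments whose file order is the reverse of
 * their address order. Addresses below 0x1000 are outside this toy layout.
 */
static unsigned long to_file_offset(unsigned long addr)
{
	if (addr >= 0x2000)
		return addr - 0x2000 + 0x100;
	return addr - 0x1000 + 0x900;
}

/*
 * Insert into an array kept sorted by 'start'. The key is adjusted
 * *before* the insertion, so the ordering invariant always holds.
 */
static void insert_adjusted(struct sym *tab, size_t *n, unsigned long addr)
{
	struct sym s = { to_file_offset(addr) };
	size_t i = *n;

	while (i > 0 && tab[i - 1].start > s.start) {
		tab[i] = tab[i - 1];
		i--;
	}
	tab[i] = s;
	(*n)++;
}
```

Had the two symbols been inserted by memory address and rewritten in place
afterwards, the container would hold 0x900 before 0x100 and lookups that
rely on the ordering could miss entries.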
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/563CB241.2090701@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On a kernel with only one of CONFIG_KPROBE_EVENTS and
CONFIG_UPROBE_EVENTS enabled, 'perf probe -d' causes a segfault because
perf_del_probe_events() calls probe_file__get_events() with a negative
fd.
This patch fixes it by adding parameter validation at the entry of
probe_file__get_events() and probe_file__get_rawlist(). Since they are
both non-static public functions (declared in a .h file), parameter
validation is required.
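A minimal sketch of validating the descriptor at function entry.
read_rawlist() is a hypothetical reader and the choice of -EINVAL is an
assumption; the real functions may report the error differently:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical reader; the point is only the guard at the top. */
static int read_rawlist(int fd, char *buf, size_t len)
{
	ssize_t n;

	if (fd < 0)
		return -EINVAL;   /* reject before any fd-based call */

	n = read(fd, buf, len);
	return n < 0 ? -errno : (int)n;
}
```

Checking inside the callee protects every caller at once, which is the
reason given above for not checking at the call site.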
v1 -> v2: Verify fd at the head of probe_file__get_rawlist() instead of
checking at call site (suggested by Masami and Arnaldo at [1,2]).
[1] http://lkml.kernel.org/r/50399556C9727B4D88A595C8584AAB37526048E3@GSjpTKYDCembx32.service.hitachi.net
[2] http://lkml.kernel.org/r/20151105155830.GV13236@kernel.org
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446803415-83382-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before:
[acme@zoo linux]$ perf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
non matching sample_type[acme@zoo linux]$
After:
[acme@zoo linux]$ perf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
non matching sample_type
[acme@zoo linux]$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wscok3a2s7yrj8156oc2r6qe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --full-paths option did not show the full source file paths in the 'perf
annotate' tool, because the value of the option was not propagated into the
related functions.
With this patch the value of the --full-paths option is known to the function
that composes the srcline string, so it prints the full path when necessary.
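A sketch of the behaviour, assuming the srcline is composed as "file:line".
format_srcline() is a hypothetical helper, not the function changed by this
patch:

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "path:line" into buf. Without full_paths only the last path
 * component is shown. Returns what snprintf() returns.
 */
static int format_srcline(char *buf, size_t size, const char *path,
			  unsigned int line, int full_paths)
{
	const char *shown = path;

	if (!full_paths) {
		const char *slash = strrchr(path, '/');

		if (slash)
			shown = slash + 1;
	}
	return snprintf(buf, size, "%s:%u", shown, line);
}
```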
Committer Note:
This affects annotate when the --print-line option is used:
# perf annotate -h 2>&1 | grep print-line
-l, --print-line print matching source lines (may be slow)
Looking just at the lines that should be affected by this change:
Before:
# perf annotate --print-line --full-paths --stdio fput | grep '\.[ch]:[0-9]\+'
94.44 atomic64_64.h:114
5.56 file_table.c:265
file_table.c:265 5.56 : ffffffff81219a00: callq ffffffff81769360 <__fentry__>
atomic64_64.h:114 94.44 : ffffffff81219a05: lock decq 0x38(%rdi)
After:
# perf annotate --print-line --full-paths --stdio fput | grep '\.[ch]:[0-9]\+'
94.44 /home/git/linux/arch/x86/include/asm/atomic64_64.h:114
5.56 /home/git/linux/fs/file_table.c:265
/home/git/linux/fs/file_table.c:265 5.56 : ffffffff81219a00: callq ffffffff81769360 <__fentry__>
/home/git/linux/arch/x86/include/asm/atomic64_64.h:114 94.44 : ffffffff81219a05: lock decq 0x38(%rdi)
#
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://permalink.gmane.org/gmane.linux.kernel.perf.user/2365
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a BPF testcase for testing BPF event filtering.
By utilizing the result of 'perf test LLVM', this patch compiles the
eBPF sample program and then tests its filtering ability. The BPF script in
'perf test LLVM' lets only 50% of the samples generated by epoll_pwait() be
captured. This patch runs that system call 111 times, so the result should
contain 56 samples.
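The expected count can be checked with a few lines of C, assuming the filter
alternates between accept and reject and accepts the first call. That
behaviour, and the names flip_filter() and count_accepted(), are assumptions
made for this sketch:

```c
/* Assumed filter: flip a flag on every call, accept when it becomes 1. */
static int flip_filter(int *flag)
{
	*flag = !*flag;
	return *flag;
}

/* Count how many of 'calls' consecutive invocations are accepted. */
static int count_accepted(int calls)
{
	int flag = 0, accepted = 0, i;

	for (i = 0; i < calls; i++)
		accepted += flip_filter(&flag);
	return accepted;
}
```

With an odd number of calls the accepted half is rounded up, so
count_accepted(111) is (111 + 1) / 2 = 56.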
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446817783-86722-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A series of bpf loader related error codes were introduced to help error
reporting. Functions were improved to return these new error codes.
Functions which return pointers were adjusted to encode error codes into
return value using the ERR_PTR() interface.
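For readers unfamiliar with the convention, here is a user-space sketch of
the ERR_PTR() idea: small negative errno values are stored in the top of the
pointer range, where no valid object can live. The helpers mirror the kernel
names; alloc_counter() is a hypothetical example function:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if ptr lies in the last MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical pointer-returning function that reports why it failed. */
static int *alloc_counter(int initial)
{
	int *p;

	if (initial < 0)
		return ERR_PTR(-EINVAL);
	p = malloc(sizeof(*p));
	if (!p)
		return ERR_PTR(-ENOMEM);
	*p = initial;
	return p;
}
```

A caller tests the result with IS_ERR() and, on failure, reads the code with
PTR_ERR() instead of getting a bare NULL with no reason attached.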
bpf_loader_strerror() was improved to convert these error codes to
strings. It checks the error codes and calls libbpf_strerror() and
strerror_r() accordingly, so callers don't need to check the
range of the error code.
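A sketch of range-based dispatch with an array-indexed message table. The
code range, the enum names and the message strings are all hypothetical, and
plain strerror() is used for the fallback to keep the example short:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical private error range; the real constants live elsewhere. */
#define MY_ERRNO_START 3900

enum {
	MY_ERRNO_CONFIG = MY_ERRNO_START,
	MY_ERRNO_GROUP,
	MY_ERRNO_END,
};

/* Messages indexed by (code - MY_ERRNO_START), so no search loop. */
static const char *my_msgs[MY_ERRNO_END - MY_ERRNO_START] = {
	[MY_ERRNO_CONFIG - MY_ERRNO_START] = "Invalid config string",
	[MY_ERRNO_GROUP - MY_ERRNO_START]  = "Invalid group name",
};

/* Accepts positive or negative codes; returns 0 on success. */
static int my_strerror(int err, char *buf, size_t size)
{
	const char *msg;

	if (size == 0)
		return -1;
	if (err < 0)
		err = -err;

	if (err >= MY_ERRNO_START && err < MY_ERRNO_END)
		msg = my_msgs[err - MY_ERRNO_START];
	else
		msg = strerror(err);   /* ordinary errno values */

	snprintf(buf, size, "%s", msg);
	return 0;
}
```

The caller passes whatever code it received and never needs to know which
range the code belongs to.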
In bpf__strerror_load(), print the version of the running kernel and the
object's 'version' section to tell users how to fix their program.
v1 -> v2:
Use macro for error code.
Fetch error message based on array index, eliminate for-loop.
Print version strings.
Before:
# perf record -e ./test_kversion_nomatch_program.o sleep 1
event syntax error: './test_kversion_nomatch_program.o'
\___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
SKIP
After:
# perf record -e ./test_kversion_nomatch_program.o ls
event syntax error: './test_kversion_nomatch_program.o'
\___ 'version' (4.4.0) doesn't match running kernel (4.3.0)
SKIP
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446818289-87444-1-git-send-email-wangnan0@huawei.com
[ Add 'static inline' to bpf__strerror_prepare_load() when LIBBPF is disabled ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are 2 places in llvm-utils.c which find kernel version information
through uname. This patch extracts the uname related code into a
fetch_kernel_version() function and puts it into util.h so it can be
reused.
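A sketch of what such a helper might look like. running_kernel_code() and
parse_release() are hypothetical names and do not reproduce the signature of
fetch_kernel_version(); the packing follows the kernel's KERNEL_VERSION()
convention:

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Same packing as KERNEL_VERSION(a, b, c). */
static unsigned int version_code(unsigned int a, unsigned int b,
				 unsigned int c)
{
	return (a << 16) + (b << 8) + c;
}

/* Parse release strings such as "4.3.0-rc4+"; returns 0 on success. */
static int parse_release(const char *release, unsigned int *code)
{
	unsigned int a, b, c;

	if (sscanf(release, "%u.%u.%u", &a, &b, &c) != 3)
		return -1;
	*code = version_code(a, b, c);
	return 0;
}

/* Ask uname() for the running kernel and convert its release string. */
static int running_kernel_code(unsigned int *code)
{
	struct utsname uts;

	if (uname(&uts) < 0)
		return -1;
	return parse_release(uts.release, code);
}
```

For the release string "4.3.0-rc4+" this yields 0x40300, the same form as
the values used in the 'version' sections elsewhere in this log.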
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446818135-87310-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In this patch, a series of libbpf specific error numbers and
libbpf_strerror() are introduced to help reporting errors.
Functions are updated to pass the correct error number through the
CHECK_ERR() macro.
All users of bpf_object__open{_buffer}() and bpf_program__title() in
perf are modified accordingly. In addition, due to the error codes
changing, bpf__strerror_load() is also modified to use them.
bpf__strerror_head() is also changed accordingly so it can parse libbpf
errors. bpf_loader_strerror() is introduced for that purpose, and will
be improved by the following patch.
load_program() is improved not to dump the log buffer if it is empty. The log
buffer is also used to deduce whether the error was caused by an invalid
program or some other problem.
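One way such a deduction could work is sketched below. The enum values, the
MAX_INSNS limit and the order of the checks are assumptions made for
illustration and are not taken from the libbpf source:

```c
#include <stddef.h>

#define MAX_INSNS 4096   /* assumed instruction limit */

/* Hypothetical failure classes, one per message a user might see. */
enum load_err {
	ERR_VERIFY = 1,   /* verifier rejected the program */
	ERR_PROG2BIG,     /* too many instructions */
	ERR_KVER,         /* probably a kernel version mismatch */
};

/*
 * A non-empty log means the verifier had something to say. With an
 * empty log, fall back to the program size, then to the version guess.
 */
static enum load_err classify_load_failure(const char *log_buf,
					   size_t insn_cnt)
{
	if (log_buf && log_buf[0] != '\0')
		return ERR_VERIFY;
	if (insn_cnt >= MAX_INSNS)
		return ERR_PROG2BIG;
	return ERR_KVER;
}
```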
v1 -> v2:
- Using macro for error code.
- Fetch error message based on array index, eliminate for-loop.
- Use log buffer to detect the reason of failure. 3 new error codes
are introduced to replace LIBBPF_ERRNO__LOAD.
In v1:
# perf record -e ./test_ill_program.o ls
event syntax error: './test_ill_program.o'
\___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
SKIP
# perf record -e ./test_kversion_nomatch_program.o ls
event syntax error: './test_kversion_nomatch_program.o'
\___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
SKIP
# perf record -e ./test_big_program.o ls
event syntax error: './test_big_program.o'
\___ Failed to load program: Validate your program and check 'license'/'version' sections in your object
SKIP
In v2:
# perf record -e ./test_ill_program.o ls
event syntax error: './test_ill_program.o'
\___ Kernel verifier blocks program loading
SKIP
# perf record -e ./test_kversion_nomatch_program.o
event syntax error: './test_kversion_nomatch_program.o'
\___ Incorrect kernel version
SKIP
(Will be further improved by following patches)
# perf record -e ./test_big_program.o
event syntax error: './test_big_program.o'
\___ Program too big
SKIP
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446817783-86722-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In find_perf_probe_point_from_map(), the 'ret' variable is initialized
with -ENOENT but overwritten by the return code of
kernel_get_symbol_address_by_name(), and after that it is re-initialized
with -ENOENT again.
Setting ret=-ENOENT twice looks a bit redundant. This avoids the
overwriting and just returns -ENOENT if some error happens to simplify
the code.
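The simplified control flow can be sketched as follows. The lookup helpers
and the toy symbol table are hypothetical; "ghost" models a name whose
address resolves but which has no symbol behind it:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct sym {
	const char *name;
	unsigned long addr;
};

/* Toy symbol table with a single entry. */
static const struct sym symtab[] = { { "_do_fork", 0x1000 } };

/* Hypothetical address lookup; returns 0 on success. */
static int address_by_name(const char *name, unsigned long *addr)
{
	if (strcmp(name, "_do_fork") == 0) {
		*addr = 0x1000;
		return 0;
	}
	if (strcmp(name, "ghost") == 0) {
		*addr = 0x9999;   /* resolves, but no symbol lives here */
		return 0;
	}
	return -EINVAL;
}

/* Hypothetical symbol lookup; NULL when nothing matches. */
static const struct sym *sym_at(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (symtab[i].addr == addr)
			return &symtab[i];
	return NULL;
}

/* Every failure returns -ENOENT directly; no shared 'ret' to go stale. */
static int find_point(const char *name, const struct sym **out)
{
	unsigned long addr;
	const struct sym *sym;

	if (address_by_name(name, &addr) != 0)
		return -ENOENT;
	sym = sym_at(addr);
	if (!sym)
		return -ENOENT;
	*out = sym;
	return 0;
}
```

The "ghost" case is the one that used to slip through: the first lookup
succeeds, and without an explicit error return the function could report
success even though no symbol was found.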
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-ufp1zgbktzmttcputozneomd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the browser fails to annotate it is difficult for users to find out
what went wrong.
Add some errors for objdump failures that are displayed in the UI.
Note it would be even better to handle these errors smarter, like
falling back to the binary when the debug info is somehow corrupted. But
for now just giving a better error is an improvement.
Committer note:
This works for --stdio, where errors just scroll by the screen:
# perf annotate --stdio intel_idle
Failure running objdump --start-address=0xffffffff81418290 --stop-address=0xffffffff814183ae -l -d --no-show-raw -S -C /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1 2>/dev/null|grep -v /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1|expand
Percent | Source code & Disassembly of vmlinux for cycles:pp
------------------------------------------------------------------
And with that one can use that command line to try to find out more about what
happened instead of getting a blank screen, an improvement.
We still need to improve this further to get it to work with other UIs, like
--tui and --gtk, where it continues showing a blank screen with no messages,
as the pr_err() used is enough only for --stdio.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1446779167-18949-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is possible that find_perf_probe_point_from_map() fails to find a
symbol but still returns 0 because of a small coding error:
find_perf_probe_point_from_map() sets 'ret' to an error code at first, but
also uses it to hold the return value of kernel_get_symbol_address_by_name().
This patch resets 'ret' to an error code even when
kernel_get_symbol_address_by_name() succeeds, so if !sym, the whole function
correctly returns an error.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446729565-27592-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo suggested [1] making LINUX_VERSION_CODE work like __func__ and
__FILE__ so that users don't need to care much about setting the right
Linux version. In this patch, perf llvm passes the LINUX_VERSION_CODE macro
on the clang command line.
[1] http://lkml.kernel.org/r/20151029223744.GK2923@kernel.org
Committer notes:
Before, forgetting to update the version:
# uname -r
4.3.0-rc1+
# cat bpf.c
__attribute__((section("fork=_do_fork"), used))
int fork(void *ctx)
{
return 1;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40200;
#
# perf record -e bpf.c sleep 1
event syntax error: 'bpf.c'
\___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
#
After:
# grep version bpf.c
int _version __attribute__((section("version"), used)) = LINUX_VERSION_CODE;
# perf record -e bpf.c sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data ]
# perf evlist -v
perf_bpf_probe:fork: type: 2, size: 112, config: 0x5ee, { sample_period,
sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1,
inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all:
1, exclude_guest: 1, mmap2: 1, comm_exec: 1
#
Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446636007-239722-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces a new macro "__NR_CPUS__" to perf's embedded clang
compiler, which represents the number of configured CPUs in this system.
BPF programs can use this macro to create a map with as many entries as the
system has CPUs. For example:
struct bpf_map_def SEC("maps") pmu_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
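On the perf side, passing such a macro amounts to formatting a -D option for
the compiler command line. A sketch, assuming the CPU count comes from
sysconf(); both helper names are hypothetical:

```c
#include <stdio.h>
#include <unistd.h>

/* Format the define; returns 0 on success, -1 if buf is too small. */
static int format_nr_cpus_define(char *buf, size_t size, long nr_cpus)
{
	int n = snprintf(buf, size, "-D__NR_CPUS__=%ld", nr_cpus);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Query the number of configured CPUs and format the option for it. */
static int nr_cpus_define(char *buf, size_t size)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	if (n < 1)
		return -1;
	return format_nr_cpus_define(buf, size, n);
}
```

On a machine with 8 configured CPUs the resulting option would be
"-D__NR_CPUS__=8".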
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446636007-239722-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When new maps are cloned out of a split map they are added to the original
map's group, but their groups pointer is not updated.
This could lead to a segfault, because map->groups is expected to be
always set, as reported by Markus:
__map__is_kernel (map=map@entry=0x1abb7a0) at util/map.c:238
238 return __machine__kernel_map(map->groups->machine, map->type) =
(gdb) bt
#0 __map__is_kernel (map=map@entry=0x1abb7a0) at util/map.c:238
#1 0x00000000004393e4 in symbol_filter (map=map@entry=0x1abb7a0, sym=sym@entry
#2 0x00000000004fcd4d in dso__load_sym (dso=dso@entry=0x166dae0, map=map@entry
#3 0x00000000004a64e0 in dso__load (dso=0x166dae0, map=map@entry=0x1abb7a0, fi
#4 0x00000000004b941f in map__load (filter=0x4393c0 <symbol_filter>, map=<opti
#5 map__find_symbol (map=0x1abb7a0, addr=40188, filter=0x4393c0 <symbol_filter
...
Add a __map_groups__insert function that adds a map to groups and
updates the map->groups pointer. It takes no lock, as opposed to the existing
map_groups__insert, because maps__fixup_overlappings(), where it is
called, already holds the necessary lock.
Use __map_groups__insert to add new maps after a map split.
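The locked/unlocked split can be sketched like this. The types, the use of a
plain mutex and the function names are hypothetical simplifications of the
real map_groups code:

```c
#include <pthread.h>
#include <stddef.h>

struct groups;

struct map {
	struct groups *groups;   /* back-pointer to the owning groups */
	struct map *next;
};

struct groups {
	pthread_mutex_t lock;
	struct map *head;
};

/* Caller must already hold g->lock. */
static void groups_insert_unlocked(struct groups *g, struct map *m)
{
	m->next = g->head;
	g->head = m;
	m->groups = g;   /* the update that was missing for split maps */
}

/* Public variant: takes the lock, then defers to the unlocked helper. */
static void groups_insert(struct groups *g, struct map *m)
{
	pthread_mutex_lock(&g->lock);
	groups_insert_unlocked(g, m);
	pthread_mutex_unlock(&g->lock);
}
```

Keeping the back-pointer update inside the shared helper means neither
entry point can forget it.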
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151104140811.GA32664@krava.brq.redhat.com
Fixes: cfc5acd4c8 ("perf top: Filter symbols based on __map__is_kernel(map)")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sw clock metrics printing was missed in the earlier move to
stat-shadow of all the other metric printouts. Move it too.
v2: Fix metrics printing in this version to make bisect safe.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1446515428-7450-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
According to [1], libbpf should be muted. This patch resets the info and
warning message levels to ensure libbpf doesn't output anything even
if an error happens.
[1] http://lkml.kernel.org/r/20151020151255.GF5119@kernel.org
Committer note:
Before:
Testing it with an incompatible kernel version in the .c file that
generated foo.o:
[root@zoo ~]# perf record -e /tmp/foo.o sleep 1
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
libbpf: -- END LOG --
libbpf: failed to load program 'fork=_do_fork'
libbpf: failed to load object '/tmp/foo.o'
event syntax error: '/tmp/foo.o'
\___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
[root@zoo ~]#
After:
[root@zoo ~]# perf record -e /tmp/foo.o sleep 1
event syntax error: '/tmp/foo.o'
\___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
[root@zoo ~]#
This, BTW, needs fixing to emit a proper message by validating the
version in the foo.o "version" ELF section against the running kernel,
warning the user instead of asking the kernel to load a binary that it
will refuse due to a mismatched kernel version.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446547486-229499-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Even if --symfs is used to point to the debug binaries, we send in the
non-debug filenames to libunwind, which leads to libunwind not finding
the debug frame. Fix this by preferring the file in --symfs, if it is
available.
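A sketch of the "prefer the --symfs copy if it exists" rule.
pick_unwind_file() is a hypothetical helper and the plain concatenation of
the symfs prefix and the file name is an assumption:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write the path to hand to the unwinder into 'out'. The copy under
 * 'symfs' wins when it is readable; otherwise 'name' is used as-is.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int pick_unwind_file(const char *symfs, const char *name,
			    char *out, size_t size)
{
	int n;

	if (symfs && symfs[0] != '\0') {
		n = snprintf(out, size, "%s%s", symfs, name);
		if (n > 0 && (size_t)n < size && access(out, R_OK) == 0)
			return 0;   /* debug copy found under symfs */
	}

	n = snprintf(out, size, "%s", name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```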
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1446104978-26429-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch provides infrastructure for passing source files to --event
directly using:
# perf record --event bpf-file.c command
This patch does the following:
1) Allow passing a '.c' file to '--event'. parse_events_load_bpf() is
expanded to let the caller tell it whether the passed file is a source
file or an object.
2) llvm__compile_bpf() is called to compile the '.c' file, and the result
is saved in memory. bpf_object__open_buffer() is used to load the
in-memory object.
Introduces a bpf-script-example.c so we can manually test it:
# perf record --clang-opt "-DLINUX_VERSION_CODE=0x40200" --event ./bpf-script-example.c sleep 1
Note that '--clang-opt' must be put before '--event'.
Further patches will merge it into a testcase so that it can be tested
automatically.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-10-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is the final patch which makes basic BPF filter work. After
applying this patch, users are allowed to use BPF filter like:
# perf record --event ./hello_world.o ls
A bpf_fd field is appended to 'struct evsel', and set up during the
callback function add_bpf_event() for each 'probe_trace_event'.
PERF_EVENT_IOC_SET_BPF ioctl is used to attach eBPF program to a newly
created perf event. The file descriptor of the eBPF program is passed to
perf record using previous patches, and stored into evsel->bpf_fd.
It is possible that different perf events are created for one kprobe
event for different CPUs. In this case, when trying to call the ioctl,
EEXIST will be returned. This patch doesn't treat it as an error.
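A sketch of the attach step with EEXIST tolerated. attach_bpf() is a
hypothetical wrapper, and it assumes kernel headers that define
PERF_EVENT_IOC_SET_BPF:

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/*
 * Attach the eBPF program behind bpf_fd to the perf event behind
 * perf_fd. EEXIST is treated as success, any other failure is
 * returned as a negative errno value.
 */
static int attach_bpf(int perf_fd, int bpf_fd)
{
	if (ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) == 0)
		return 0;
	if (errno == EEXIST)
		return 0;   /* the program is already attached */
	return -errno;
}
```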
Committer note:
The bpf proggie used so far:
__attribute__((section("fork=_do_fork"), used))
int fork(void *ctx)
{
return 0;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40300;
failed to produce any samples, even with forks happening and with it
running in system wide mode.
That is because now the filter is being associated, and the code above
always returns zero, meaning that all forks will be probed but filtered
away ;-/
Change it to 'return 1;' instead and after that:
# trace --no-syscalls --event /tmp/foo.o
0.000 perf_bpf_probe:fork:(ffffffff8109be30))
2.333 perf_bpf_probe:fork:(ffffffff8109be30))
3.725 perf_bpf_probe:fork:(ffffffff8109be30))
4.550 perf_bpf_probe:fork:(ffffffff8109be30))
^C#
And it works with all tools, including 'perf trace'.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch creates a 'struct perf_evsel' for every probe in a BPF object
file and fills 'struct evlist' with them. The previously introduced
dummy event is now removed. After this patch, the following command:
# perf record --event filter.o ls
can trace each of the probes defined in filter.o.
The core of this patch is bpf__foreach_tev(), which calls a callback
function for each 'struct probe_trace_event' of a bpf program, together
with its associated file descriptor. The add_bpf_event() callback
creates evsels by calling parse_events_add_tracepoint().
Since bpf-loader.c will not be built if libbpf is turned off, an empty
bpf__foreach_tev() is defined in bpf-loader.h to avoid build errors.
Committer notes:
Before:
# /tmp/oldperf record --event /tmp/foo.o -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.198 MB perf.data ]
# perf evlist
/tmp/foo.o
# perf evlist -v
/tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
exclude_guest: 1, mmap2: 1, comm_exec: 1
I.e. we create just the PERF_TYPE_SOFTWARE (type: 1),
PERF_COUNT_SW_DUMMY(config 0x9) event, now, with this patch:
# perf record --event /tmp/foo.o -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.210 MB perf.data ]
# perf evlist -v
perf_bpf_probe:fork: type: 2, size: 112, config: 0x6bd, { sample_period,
sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1,
inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest:
1, mmap2: 1, comm_exec: 1
#
We now have a PERF_TYPE_TRACEPOINT (type: 2) event, and the config states 0x6bd,
which is how, after setting up the event via the kprobes interface, the
'perf_bpf_probe:fork' event is accessible via the perf_event_open
syscall. This is all transient: as soon as the 'perf record' session
ends, these probes will go away.
To see what it looks like, let's try a never-ending session, one that
expects a control+C to end:
# perf record --event /tmp/foo.o -a
So, with that in place, we can use 'perf probe' to see what is in place:
# perf probe -l
perf_bpf_probe:fork (on _do_fork@acme/git/linux/kernel/fork.c)
We also can use debugfs:
[root@felicio ~]# cat /sys/kernel/debug/tracing/kprobe_events
p:perf_bpf_probe/fork _text+638512
Ok, now let's stop and see if we got some forks:
[root@felicio linux]# perf record --event /tmp/foo.o -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.325 MB perf.data (111 samples) ]
[root@felicio linux]# perf script
sshd 1271 [003] 81797.507678: perf_bpf_probe:fork: (ffffffff8109be30)
sshd 18309 [000] 81797.524917: perf_bpf_probe:fork: (ffffffff8109be30)
sshd 18309 [001] 81799.381603: perf_bpf_probe:fork: (ffffffff8109be30)
sshd 18309 [001] 81799.408635: perf_bpf_probe:fork: (ffffffff8109be30)
<SNIP>
Sure enough, we have 111 forks :-)
Callchains seem to work as well:
# perf report --stdio --no-child
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 562 of event 'perf_bpf_probe:fork'
# Event count (approx.): 562
#
# Overhead Command Shared Object Symbol
# ........ ........ ................ ............
#
44.66% sh [kernel.vmlinux] [k] _do_fork
|
---_do_fork
entry_SYSCALL_64_fastpath
__libc_fork
make_child
26.16% make [kernel.vmlinux] [k] _do_fork
<SNIP>
#
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-7-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch utilizes bpf_object__load() provided by libbpf to load all
objects into the kernel.
Committer notes:
Testing it:
When using an incorrect kernel version number, i.e., having this in your
eBPF proggie:
int _version __attribute__((section("version"), used)) = 0x40100;
For a 4.3.0-rc6+ kernel, say, this happens and needs checking at event
parsing time, to provide a better error report to the user:
# perf record --event /tmp/foo.o sleep 1
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
libbpf: -- END LOG --
libbpf: failed to load program 'fork=_do_fork'
libbpf: failed to load object '/tmp/foo.o'
event syntax error: '/tmp/foo.o'
\___ Invalid argument: Are you root and runing a CONFIG_BPF_SYSCALL kernel?
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
If we instead make it match, i.e. use 0x40300 on this v4.3.0-rc6+
kernel, the whole process goes through:
# perf record --event /tmp/foo.o -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.202 MB perf.data ]
# perf evlist -v
/tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1,
exclude_guest: 1, mmap2: 1, comm_exec: 1
#
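The two constants used above decode as 4.1.0 and 4.3.0 when read with the
packing of the kernel's KERNEL_VERSION() macro. A minimal sketch of that
arithmetic, with hypothetical helper names:

```c
/* Same packing as the kernel's KERNEL_VERSION() macro. */
unsigned int kernel_version_code(unsigned int major, unsigned int minor,
				 unsigned int patch)
{
	return (major << 16) + (minor << 8) + patch;
}

/* Non-zero when the object's "version" section value matches the kernel. */
int version_matches(unsigned int obj_version, unsigned int running)
{
	return obj_version == running;
}
```

So 0x40100 in the object does not match a 4.3.0 based kernel, while 0x40300
does.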
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-6-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces bpf__{un,}probe() functions to enable callers to
create kprobe points based on the section names of a BPF program. It parses
the section names in the program and creates corresponding 'struct
perf_probe_event' structures. The parse_perf_probe_command() function is
used to do the main parsing work. The resulting 'struct perf_probe_event'
is stored in the program's private data for later use.
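For the simplest section name form used in the examples of this series,
"fork=_do_fork", the split into event name and probe point could look like the
sketch below. The real work is done by parse_perf_probe_command(), which
understands the full 'perf probe' syntax; split_section_name() is a
hypothetical helper that handles only this one form.

```c
#include <string.h>

/*
 * Split "event=probe_point" in place. Returns 0 on success, -1 when there
 * is no '=' or either side is empty.
 */
int split_section_name(char *sec, const char **event, const char **point)
{
	char *eq = strchr(sec, '=');

	if (!eq || eq == sec || eq[1] == '\0')
		return -1;
	*eq = '\0';
	*event = sec;
	*point = eq + 1;
	return 0;
}
```

Sections such as "license" or "version" carry no '=' and are rejected by this
sketch.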
By utilizing the new probing API, this patch creates probe points during
event parsing.
To ensure probe points are removed correctly, register an atexit hook so
that bpf__clear() is still called even when perf quits through exit(), and
the probe points are cleared. Note that bpf__clear() should be registered
before bpf__probe() is called, so that a failure of bpf__probe() can still
trigger bpf__clear() to remove the probe points which were already created.
A strerror-style error reporting scaffold is created by this patch.
bpf__strerror_probe() is the first error reporting function in
bpf-loader.c.
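The ordering of the atexit hook relative to probing can be sketched as below.
All names here are hypothetical and the "probes" are just a counter; the point
is only that the hook is registered before the first probe is created, so a
failure partway through still leaves the hook in place.

```c
#include <stdlib.h>

int nr_probed;			/* probe points currently in place */
static int hook_registered;

/* Cleanup hook: drops whatever was probed so far. */
void clear_probes_sketch(void)
{
	nr_probed = 0;
}

/*
 * Register the cleanup hook first, then create 'nr' probes. 'fail_at'
 * (or -1) simulates a failure partway through; the probes created before
 * it stay recorded so the hook can remove them at exit.
 */
int probe_all_sketch(int nr, int fail_at)
{
	int i;

	if (!hook_registered) {
		if (atexit(clear_probes_sketch))
			return -1;
		hook_registered = 1;
	}

	for (i = 0; i < nr; i++) {
		if (i == fail_at)
			return -1;
		nr_probed++;
	}
	return 0;
}
```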
Committer note:
Trying it:
To build a test eBPF object file:
I am testing using a script I built from the 'perf test -v LLVM' output:
$ cat ~/bin/hello-ebpf
export KERNEL_INC_OPTIONS="-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.3/include -I/home/acme/git/linux/arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -I/home/acme/git/linux/include -Iinclude -I/home/acme/git/linux/arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -Iinclude/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h"
export WORKING_DIR=/lib/modules/4.2.0/build
export CLANG_SOURCE=-
export CLANG_OPTIONS=-xc
OBJ=/tmp/foo.o
rm -f $OBJ
echo '__attribute__((section("fork=do_fork"), used)) int fork(void *ctx) {return 0;} char _license[] __attribute__((section("license"), used)) = "GPL";int _version __attribute__((section("version"), used)) = 0x40100;' | \
clang -D__KERNEL__ $CLANG_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf -O2 -o /tmp/foo.o && file $OBJ
---
First asking to put a probe in a function not present in the kernel
(misses the initial _):
$ perf record --event /tmp/foo.o sleep 1
Probe point 'do_fork' not found.
event syntax error: '/tmp/foo.o'
\___ You need to check probing points in BPF file
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
$
---
Now, with __attribute__((section("fork=_do_fork"), used)):
$ grep _do_fork /proc/kallsyms
ffffffff81099ab0 T _do_fork
$ perf record --event /tmp/foo.o sleep 1
Failed to open kprobe_events: Permission denied
event syntax error: '/tmp/foo.o'
\___ Permission denied
---
Cool, we need to provide some better hints, "kprobe_events" is too low
level, one doesn't strictly need to know the precise details of how
these things are put in place, so something that shows the command
needed to fix the permissions would be more helpful.
Let's try as root instead:
# perf record --event /tmp/foo.o sleep 1
Lowering default frequency rate to 1000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data ]
# perf evlist
/tmp/foo.o
[root@felicio ~]# perf evlist -v
/tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period,
sample_freq }: 1000, sample_type: IP|TID|TIME|PERIOD, disabled: 1,
inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1,
sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
---
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-5-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By introducing new rules in tools/perf/util/parse-events.[ly], this
patch enables 'perf record --event bpf_file.o' to select events by an
eBPF object file. It calls parse_events_load_bpf() to load that file,
which uses bpf__prepare_load() and finally calls bpf_object__open() for
the object files.
After applying this patch, commands like:
# perf record --event foo.o sleep
become possible.
However, at this point it is unable to link anything useful onto the
evsel list because the creation of probe points and the attaching of BPF
programs have not been implemented yet. Until real events can be
extracted, this patch links a dummy evsel to avoid a perf report error
caused by an empty evsel list. The dummy event related code will be
removed when the probing and extracting code is ready.
Committer notes:
Using it:
$ ls -la foo.o
ls: cannot access foo.o: No such file or directory
$ perf record --event foo.o sleep
libbpf: failed to open foo.o: No such file or directory
event syntax error: 'foo.o'
\___ BPF object file 'foo.o' is invalid
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
$
$ file /tmp/build/perf/perf.o
/tmp/build/perf/perf.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
$ perf record --event /tmp/build/perf/perf.o sleep
libbpf: /tmp/build/perf/perf.o is not an eBPF object file
event syntax error: '/tmp/build/perf/perf.o'
\___ BPF object file '/tmp/build/perf/perf.o' is invalid
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
$
$ file /tmp/foo.o
/tmp/foo.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
$ perf record --event /tmp/foo.o sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data ]
$ perf evlist
/tmp/foo.o
$ perf evlist -v
/tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
$
So, type 1 is PERF_TYPE_SOFTWARE, config 0x9 is PERF_COUNT_SW_DUMMY, ok.
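The numeric 'type' and 'config' fields printed by 'perf evlist -v' can be
decoded with a small table. The values below mirror enum perf_type_id and
PERF_COUNT_SW_DUMMY from the perf_event uapi header; the helper names are
hypothetical.

```c
#include <stddef.h>

/* Values mirror enum perf_type_id in the perf_event uapi header. */
const char *perf_type_name(unsigned int type)
{
	static const char *const names[] = {
		"PERF_TYPE_HARDWARE",	/* 0 */
		"PERF_TYPE_SOFTWARE",	/* 1 */
		"PERF_TYPE_TRACEPOINT",	/* 2 */
		"PERF_TYPE_HW_CACHE",	/* 3 */
		"PERF_TYPE_RAW",	/* 4 */
		"PERF_TYPE_BREAKPOINT",	/* 5 */
	};

	return type < sizeof(names) / sizeof(names[0]) ? names[type] : NULL;
}

/* Non-zero for the dummy software event (PERF_COUNT_SW_DUMMY is 9). */
int is_sw_dummy(unsigned int type, unsigned long long config)
{
	return type == 1 && config == 0x9;
}
```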
$ perf report --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'bpf-loader.[ch]' files are introduced in this patch. They will be
the interface between perf and libbpf. bpf__prepare_load() resides in
bpf-loader.c. Following patches will enrich these two files.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we split symbols based on the map comparison, but symbols are stored
within dso objects and maps could point into the same dso objects (kernel maps).
Hence we could end up changing the rbtree we are currently iterating and mess it
up. It's easily reproduced on s390x by running:
$ perf record -a -- sleep 3
$ perf buildid-list -i perf.data --with-hits
The fix is to compare dso objects instead.
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151026135130.GA26003@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch allows 'perf record' to set an event's attr.inherit bit via
config terms like:
# perf record -e cycles/no-inherit/ ...
# perf record -e cycles/inherit/ ...
So the user can control the inherit bit for each event separately.
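How a per-event term interacts with the global -i/--no-inherit option can be
sketched as a tri-state lookup. The enum and function names are hypothetical;
the patch itself stores the term in a perf_evsel_config_term.

```c
#include <stdbool.h>

/* Hypothetical tri-state for the per-event config term. */
enum inherit_term {
	INHERIT_UNSET,	/* e.g. "cycles" */
	INHERIT_ON,	/* e.g. "cycles/inherit/" */
	INHERIT_OFF,	/* e.g. "cycles/no-inherit/" */
};

/* The per-event term, when present, wins over the global option. */
bool effective_inherit(bool global_no_inherit, enum inherit_term term)
{
	switch (term) {
	case INHERIT_ON:
		return true;
	case INHERIT_OFF:
		return false;
	default:
		return !global_no_inherit;
	}
}
```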
In the following example, a.out fork()s in main and then does some complex
CPU intensive computations in both of its children.
Basic result with and without inherit:
# perf record -e cycles -e instructions ./a.out
[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 2.205 MB perf.data (47920 samples) ]
# perf report --stdio
# ...
# Samples: 23K of event 'cycles'
# Event count (approx.): 23641752891
...
# Samples: 24K of event 'instructions'
# Event count (approx.): 30428312415
# perf record -i -e cycles -e instructions ./a.out
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.111 MB perf.data (24019 samples) ]
...
# Samples: 12K of event 'cycles'
# Event count (approx.): 11699501775
...
# Samples: 12K of event 'instructions'
# Event count (approx.): 15058023559
Cancel inherit for one event when it is globally enabled:
# perf record -e cycles/no-inherit/ -e instructions ./a.out
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.660 MB perf.data (36004 samples) ]
...
# Samples: 12K of event 'cycles/no-inherit/'
# Event count (approx.): 11895759282
...
# Samples: 24K of event 'instructions'
# Event count (approx.): 30668000441
Enable inherit for one event when it is globally disabled:
# perf record -i -e cycles/inherit/ -e instructions ./a.out
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.654 MB perf.data (35868 samples) ]
...
# Samples: 23K of event 'cycles/inherit/'
# Event count (approx.): 23285400229
...
# Samples: 11K of event 'instructions'
# Event count (approx.): 14969050259
Committer note:
One can check if the bit was set, in addition to seeing the result in
the perf.data file size as above by doing one of:
# perf record -e cycles -e instructions -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.911 MB perf.data (63 samples) ]
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
#
So, the inherit bit was set in both, now, if we disable it globally using
--no-inherit:
# perf record --no-inherit -e cycles -e instructions -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.910 MB perf.data (56 samples) ]
# perf evlist -v
cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
No inherit bit set. Now disabling it globally and setting it just on the cycles event:
# perf record --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.909 MB perf.data (48 samples) ]
# perf evlist -v
cycles/inherit/: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
instructions: size: 112, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
#
We can see it as well by using a more verbose level of debug messages in
the tool that sets up the perf_event_attr, 'perf record' in this case:
[root@zoo ~]# perf record -vv --no-inherit -e cycles/inherit/ -e instructions -a usleep 1
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
task 1
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8
------------------------------------------------------------
perf_event_attr:
size 112
config 0x1
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|ID|CPU|PERIOD
read_format ID
disabled 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
<SNIP>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1446029705-199659-2-git-send-email-wangnan0@huawei.com
[ s/u64/bool/ for the perf_evsel_config_term inherit field - jolsa]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recent GDB (at least on a vanilla Debian box) looks for debug information in
/usr/lib/debug/.build-id/nn/nnnnnnn
where nn/nnnnnnn is the build-id of the stripped ELF binary. This is
documented here:
https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html
This was not working in perf because we didn't read the build id until
AFTER we searched for the separate debug information file. This patch
reads the build ID and THEN does the search.
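The reason the order matters is that the path is derived from the build-id, so
it cannot be formed before the build-id is read. A minimal sketch of forming
that path, assuming the ".debug" suffix described in the GDB documentation
linked above; build_id_debug_path() is a hypothetical helper, not the function
touched by this patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<debug_dir>/.build-id/nn/nnnn....debug" from a hex build-id
 * string. Returns 0 on success, -1 if the id is too short or the buffer
 * is too small.
 */
int build_id_debug_path(const char *debug_dir, const char *build_id_hex,
			char *buf, size_t size)
{
	int n;

	if (strlen(build_id_hex) < 3)
		return -1;

	n = snprintf(buf, size, "%s/.build-id/%.2s/%s.debug",
		     debug_dir, build_id_hex, build_id_hex + 2);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```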
Signed-off-by: Dima Kogan <dima@secretsauce.net>
Link: http://lkml.kernel.org/r/87si6pfwz4.fsf@secretsauce.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This was benign, but wrong. The build-id should live in a char[], not a char*[].
Signed-off-by: Dima Kogan <dima@secretsauce.net>
Link: http://lkml.kernel.org/r/87si6pfwz4.fsf@secretsauce.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recently 'perf <tool> -h' was made aware of arguments and would show
just the help for the arguments specified, but that required a strict
form, i.e.:
$ perf -h --tui
worked, but:
$ perf -h tui
didn't.
Make it support both cases and also look at the option help when neither
matches, so that the following examples work:
$ perf report -h interface
Usage: perf report [<options>]
--gtk Use the GTK2 interface
--stdio Use the stdio interface
--tui Use the TUI interface
$ perf report -h stack
Usage: perf report [<options>]
-g, --call-graph <print_type,threshold[,print_limit],order,
sort_key[,branch]>
Display call graph (stack chain/backtrace):
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: graph,0.5,caller,function
--max-stack <n> Set the maximum stack depth when parsing the
callchain, anything beyond the specified depth
will be ignored. Default: 127
$
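The matching order described above (option name with or without dashes first,
help text only when no name matches) can be sketched as a two-pass search.
This is an illustration under simplifying assumptions: only long names are
considered, the help search is a case-sensitive substring match, and all names
are hypothetical.

```c
#include <string.h>

struct help_opt_sketch {
	const char *long_name;
	const char *help;
};

/* Fill 'out' with the indexes of the selected options; return the count. */
size_t select_options(const char *arg, const struct help_opt_sketch *opts,
		      size_t nr, size_t *out)
{
	size_t i, n = 0;

	while (*arg == '-')	/* accept both "tui" and "--tui" */
		arg++;
	if (*arg == '\0')
		return 0;

	/* First pass: exact match on the option name. */
	for (i = 0; i < nr; i++)
		if (strcmp(arg, opts[i].long_name) == 0)
			out[n++] = i;
	if (n)
		return n;

	/* Second pass: fall back to searching the help text. */
	for (i = 0; i < nr; i++)
		if (strstr(opts[i].help, arg) != NULL)
			out[n++] = i;
	return n;
}
```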
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-xzqvamzqv3cv0p6w3inhols3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add the cpu_map__empty_new() interface to create an empty cpumap with a given
size. The cpumap entries are initialized with -1.
It'll be used for caching cpu_map in the following patches.
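A minimal sketch of such a constructor, using a simplified stand-in structure.
'struct cpu_map_sketch' and cpu_map_sketch_empty_new() are hypothetical names;
the real 'struct cpu_map' has more members than shown here.

```c
#include <stdlib.h>

/* Simplified stand-in for perf's cpu map: a count plus a flexible array. */
struct cpu_map_sketch {
	int nr;
	int map[];
};

/* Allocate a map with 'nr' entries, each initialized to -1. */
struct cpu_map_sketch *cpu_map_sketch_empty_new(int nr)
{
	struct cpu_map_sketch *cpus;
	int i;

	if (nr < 0)
		return NULL;

	cpus = malloc(sizeof(*cpus) + sizeof(int) * nr);
	if (cpus != NULL) {
		cpus->nr = nr;
		for (i = 0; i < nr; i++)
			cpus->map[i] = -1;
	}
	return cpus;
}
```

The caller owns the returned map and releases it with free() in this sketch.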
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because the 'perf stat record' patches will use the id_offset member
together with the priv pointer.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-29-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now usage_with_options() sets up a pager before printing its message, so
output from a normal printf() or pr_err() will not be shown. The
usage_with_options_msg() function can be used to print a help message before
the usage strings.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's annoying to see an error or help message when a command has many options,
as in perf record, report or top. So set up the pager when printing a parser
error or help message - it should be OK since no UI is enabled at
parsing time. The usage_with_options() function already disables it by calling
exit_browser() anyway.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can be more consistent with other --show-* options. The old
name (--showcpuutilization) is provided only for compatibility.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently if an option name is ambiguous, only the first two matched
option names are printed, with no help. It would be better if it could show
all possible names and their help messages too.
Before:
$ perf report --show
Error: Ambiguous option: show (could be --show-total-period or
--show-ref-call-graph)
Usage: perf report [<options>]
After:
$ perf report --show
Error: Ambiguous option: show (could be --show-total-period or
--show-ref-call-graph)
Usage: perf report [<options>]
-n, --show-nr-samples
Show a column with the number of samples
--showcpuutilization
Show sample percentage for different cpu modes
-I, --show-info Display extended information about perf.data file
--show-total-period
Show a column with the sum of periods
--show-ref-call-graph
Show callgraph from reference event
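Collecting every candidate for an ambiguous prefix, rather than just the first
two, can be sketched as below. prefix_matches() is a hypothetical helper that
works on a plain array of long option names.

```c
#include <string.h>

/*
 * Store the indexes of all long option names starting with 'prefix' in
 * 'out' (capacity 'max') and return the total number of matches.
 */
size_t prefix_matches(const char *prefix, const char *const *names,
		      size_t nr, size_t *out, size_t max)
{
	size_t i, n = 0, len = strlen(prefix);

	for (i = 0; i < nr; i++) {
		if (strncmp(names[i], prefix, len) != 0)
			continue;
		if (n < max)
			out[n] = i;
		n++;
	}
	return n;
}
```

A result greater than one means the prefix is ambiguous, and 'out' then names
the options whose help should be printed.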
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some tools have a lot of options, so providing a way to show help just
for some of them may come in handy:
$ perf report -h --tui
Usage: perf report [<options>]
--tui Use the TUI interface
$ perf report -h --tui --showcpuutilization -b -c
Usage: perf report [<options>]
-b, --branch-stack use branch records for per branch histogram filling
-c, --comms <comm[,comm...]>
only consider symbols in these comms
--showcpuutilization
Show sample percentage for different cpu modes
--tui Use the TUI interface
$
Using it with perf bash completion is also handy, just make sure you
source the needed file:
$ . ~/git/linux/tools/perf/perf-completion.sh
Then press tab/tab after -- to see a list of options, put them after -h
and only the options chosen will have their help presented:
$ perf report -h --
--asm-raw --demangle-kernel --group
--kallsyms --pretty --stdio
--branch-history --disassembler-style --gtk
--max-stack --showcpuutilization --symbol-filter
--branch-stack --dsos --header
--mem-mode --show-info --symbols
--call-graph --dump-raw-trace --header-only
--modules --show-nr-samples --symfs
--children --exclude-other --hide-unresolved
--objdump --show-ref-call-graph --threads
--column-widths --fields --ignore-callees
--parent --show-total-period --tid
--comms --field-separator --input
--percentage --socket-filter --tui
--cpu --force --inverted
--percent-limit --sort --verbose
--demangle --full-source-path --itrace
--pid --source --vmlinux
$ perf report -h --socket-filter
Usage: perf report [<options>]
--socket-filter <n>
only show processor socket that match with this filter
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-83mcdd3wj0379jcgea8w0fxa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When asking for a listing of the options, be it using -h or when an
unknown option is passed, order it by one-letter options, then the ones
having just long names.
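One way to express this ordering is a qsort() comparator. The sketch below is
an assumption about the details: it sorts one-letter options by their letter
and long-only options by name, which the text above does not spell out, and
'struct sort_opt_sketch' is a hypothetical type.

```c
#include <stdlib.h>
#include <string.h>

struct sort_opt_sketch {
	int short_name;		/* 0 when there is only a long name */
	const char *long_name;
};

/* qsort() comparator: one-letter options first, then long-only ones. */
int opt_cmp(const void *va, const void *vb)
{
	const struct sort_opt_sketch *a = va, *b = vb;

	if (a->short_name && b->short_name)
		return a->short_name - b->short_name;
	if (a->short_name)
		return -1;
	if (b->short_name)
		return 1;
	return strcmp(a->long_name, b->long_name);
}
```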
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-41qh68t35n4ehrpsuazp1dx8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --call-graph option is complex, so we should provide a better guide for
users. Also change the help message to be consistent with the config option
names. Now perf top will show help like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
setup and enables call-graph (stack chain/backtrace):
record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>)
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: fp,graph,0.5,caller,function
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The caller callchain order is useful with the --children option since it can
show 'overview' style output, but other commands which don't use the
--children feature, like 'perf script' or even 'perf report/top' without
--children, are better off keeping the callee order.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445499946-29817-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the 'perf top --call-graph' option is the same as in 'perf record'.
But 'perf top' also needs to receive the display options of 'perf report'. To
do that, change parse_callchain_report_opt() to allow record options too.
Now perf top can receive display options like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph
<mode[,dump_size],output_type,min_percent[,print_limit],call_order[,branch]>
setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf lbr, output_type (graph, flat,
fractal, or none), min percent threshold, optional
print limit, callchain order, key (function or
address), add branches
$ perf top --call-graph callee,graph,fp
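As a rough illustration of accepting record and display keywords in one
comma-separated value, in any order, here is a minimal sketch. It is not
the code of parse_callchain_report_opt(): the ex_* names and the struct are
hypothetical, and only the keyword fields plus a single numeric threshold
are handled (record_size and print_limit are left out).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed fields; not perf's own struct. */
struct ex_callchain_opts {
	char record_mode[8];	/* fp | dwarf | lbr */
	char print_type[8];	/* graph | flat | fractal | none */
	char order[8];		/* caller | callee */
	char sort_key[12];	/* function | address */
	double threshold;	/* minimum percent */
	int branch;
};

/* Return 1 if tok is one of the NULL-terminated keywords in set. */
static int ex_one_of(const char *tok, const char *const *set)
{
	for (; *set; set++)
		if (!strcmp(tok, *set))
			return 1;
	return 0;
}

/* Classify each comma-separated token by keyword; 0 on success, -1 on error. */
static int ex_parse_call_graph(const char *arg, struct ex_callchain_opts *o)
{
	static const char *const modes[] = { "fp", "dwarf", "lbr", NULL };
	static const char *const types[] = { "graph", "flat", "fractal", "none", NULL };
	static const char *const orders[] = { "caller", "callee", NULL };
	static const char *const keys[] = { "function", "address", NULL };
	const char *p = arg;

	while (*p) {
		size_t len = strcspn(p, ",");
		char tok[16], *end;

		if (len == 0 || len >= sizeof(tok))
			return -1;
		memcpy(tok, p, len);
		tok[len] = '\0';

		if (ex_one_of(tok, modes))
			strcpy(o->record_mode, tok);
		else if (ex_one_of(tok, types))
			strcpy(o->print_type, tok);
		else if (ex_one_of(tok, orders))
			strcpy(o->order, tok);
		else if (ex_one_of(tok, keys))
			strcpy(o->sort_key, tok);
		else if (!strcmp(tok, "branch"))
			o->branch = 1;
		else {
			/* Anything else must be a plain number (the threshold). */
			o->threshold = strtod(tok, &end);
			if (end == tok || *end != '\0')
				return -1;
		}

		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

Because each token is classified by its keyword rather than by position,
a value such as "callee,graph,fp" is accepted in this sketch.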
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445495330-25416-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These messages will be used by 'perf top' in the next patch.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445495330-25416-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a missing field to the perf_event_attr debug output.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1445366797-30894-4-git-send-email-andi@firstfloor.org
[ Print it between config2 and sample_regs_user (peterz)]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf will core dump if --per-socket/core and -a are used with perf stat.
The root cause is that cpu_map__build_map sets the refcnt of the evlist's
cpu_map to 1. It should set the refcnt of the newly created cpu_map, not of
the evlist's cpu_map.
Here is the example:
# perf stat -e cycles --per-socket -a sleep 1
Performance counter stats for 'system wide':
S0 36 30,196,257 cycles
S1 28 15,823,536 cycles
1.001126828 seconds time elapsed
*** Error in `./perf': corrupted double-linked list: 0x00000000021f9090 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3002e7bbe7]
/lib64/libc.so.6[0x3002e7d2b5]
./perf(perf_evsel__delete+0x28)[0x485bdd]
./perf[0x4800e8]
./perf(perf_evlist__delete+0x5e)[0x482cd5]
./perf(cmd_stat+0xf25)[0x432328]
./perf[0x4768e0]
./perf[0x476ad6]
./perf[0x476b41]
./perf(main+0x1d0)[0x476db2]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3002e21b45]
./perf[0x4202c5]
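The class of bug described above can be sketched as follows. This is a
simplified model, not the perf code: struct ex_map and the ex_* helpers are
hypothetical stand-ins for a reference-counted cpu_map. The point is only
that the builder initializes the count of the object it just allocated and
leaves the source map's count alone.

```c
#include <stdlib.h>

/* Hypothetical reference-counted map. */
struct ex_map {
	int refcnt;
	int nr;
};

static struct ex_map *ex_map__new(int nr)
{
	struct ex_map *m = calloc(1, sizeof(*m));

	if (m) {
		m->nr = nr;
		m->refcnt = 1;
	}
	return m;
}

static struct ex_map *ex_map__get(struct ex_map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

/* Drop one reference; returns 1 if the map was freed. */
static int ex_map__put(struct ex_map *m)
{
	if (m && --m->refcnt == 0) {
		free(m);
		return 1;
	}
	return 0;
}

/* Build a derived map from src without touching src's reference count. */
static struct ex_map *ex_build_map(const struct ex_map *src)
{
	struct ex_map *res = calloc(1, sizeof(*res));

	if (!res)
		return NULL;
	res->nr = src->nr;
	/*
	 * The buggy variant would reset src->refcnt here, so a later put on
	 * src could free it while other users still hold references.
	 */
	res->refcnt = 1;
	return res;
}
```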
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1444388363-35936-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a new branch type sampling filter to perf record.
It is named 'call' and maps to PERF_SAMPLE_BRANCH_CALL. It samples
direct call branches only, unlike 'any_call' which includes indirect
calls as well.
$ perf record -j call -e cycles .....
The man page is updated accordingly.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a data arg to the cpu_map__build_map callback, so we can pass data
along to the callback. It'll be needed in the following patches to retrieve
topology info from perf.data.
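The usual C pattern for this is an opaque void pointer handed through to
the callback. A minimal sketch follows; the ex_* types and functions are
hypothetical and do not mirror the real cpu_map__build_map signature.

```c
#include <stddef.h>

/* Hypothetical stand-in for a CPU map: just a list of CPU numbers. */
struct ex_cpu_map {
	int nr;
	int cpus[8];
};

/* Callback: map the CPU at index idx to an aggregation id, using data. */
typedef int (*ex_map_fn)(const struct ex_cpu_map *map, int idx, void *data);

/* Hypothetical topology table reached through the opaque data pointer. */
struct ex_topology {
	const int *socket_of_cpu;
};

static int ex_get_socket(const struct ex_cpu_map *map, int idx, void *data)
{
	const struct ex_topology *topo = data;

	if (idx < 0 || idx >= map->nr)
		return -1;
	return topo->socket_of_cpu[map->cpus[idx]];
}

/* Collect the distinct ids returned by f; returns their count or -1. */
static int ex_build_map(const struct ex_cpu_map *map, int *ids,
			ex_map_fn f, void *data)
{
	int nr = 0;

	for (int i = 0; i < map->nr; i++) {
		int id = f(map, i, data);
		int j;

		if (id < 0)
			return -1;
		for (j = 0; j < nr; j++)
			if (ids[j] == id)
				break;
		if (j == nr)
			ids[nr++] = id;
	}
	return nr;
}
```

A callback that reads the live system can ignore data, while one that
reads recorded topology receives it through the same argument.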
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-41-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll need to call it from perf stat in the stat_script patchkit
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-40-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding AGGR_UNSET mode, so we could distinguish unset aggr_mode in
following patches.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-30-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's used as the perf_evsel::priv data, so the name suits better. Also
we'll need the perf_stat name free for more generic struct.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-29-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Capitalize 'usage' to make it consistent with all the other 'Usage' in
the code, e.g., usage_builtin.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Sriram Raghunathan <sriram.r@nokia.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1444894792-2338-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently libtraceevent emits warnings on unsupported event formats.
However it'd be better to see them only when the -v option is given. To do
that, we need to override the warning() function which is used in
libtraceevent. Thus add set_warning_routine(), same as set_die_routine(),
and check the verbose flag in our warning routine.
Before:
# perf test 5
5: parse events tests :
Warning: [kvmmmu:kvm_mmu_get_page] bad op token {
Warning: [kvmmmu:kvm_mmu_sync_page] bad op token {
Warning: [kvmmmu:kvm_mmu_unsync_page] bad op token {
Warning: [kvmmmu:kvm_mmu_prepare_zap_page] bad op token {
Warning: [kvmmmu:fast_page_fault] function is_writable_pte not defined
...
Ok
After:
# perf test 5
5: parse events tests : Ok
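The mechanism can be sketched as a replaceable function pointer behind the
warning entry point. This is an illustration only: every ex_* name is
hypothetical, ex_verbose stands in for the -v flag, and the counter exists
just to make the effect observable.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*ex_warn_fn)(const char *fmt, va_list ap);

static int ex_verbose;		/* hypothetical stand-in for the -v flag */
static int ex_warnings_printed;	/* counts messages actually emitted */

/* Default routine: always print. */
static void ex_default_warn(const char *fmt, va_list ap)
{
	fputs("  Warning: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	ex_warnings_printed++;
}

static ex_warn_fn ex_warn_routine = ex_default_warn;

/* Let the tool replace the routine the library calls. */
static void ex_set_warning_routine(ex_warn_fn fn)
{
	ex_warn_routine = fn;
}

/* Entry point used by library code. */
static void ex_warning(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	ex_warn_routine(fmt, ap);
	va_end(ap);
}

/* Tool-side routine: stay quiet unless verbose output was requested. */
static void ex_quiet_warn(const char *fmt, va_list ap)
{
	if (ex_verbose)
		ex_default_warn(fmt, ap);
}
```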
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445268229-1601-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
unw_word_t is uint64_t even on 32-bit MIPS. Cast it to uintptr_t before
the cast to void *p to get rid of the following errors:
util/unwind-libunwind.c: In function 'access_mem':
util/unwind-libunwind.c:464:4: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
util/unwind-libunwind.c:475:2: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
cc1: all warnings being treated as errors
make[3]: *** [util/unwind-libunwind.o] Error 1
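A minimal sketch of the double cast, assuming a 64-bit word type on a
target whose pointers may be narrower. ex_word_t is a hypothetical stand-in
for unw_word_t; the helper names are illustrative.

```c
#include <stdint.h>

typedef uint64_t ex_word_t;	/* stand-in for a 64-bit unw_word_t */

/* Convert to pointer width first, then to a pointer type. */
static void *ex_word_to_ptr(ex_word_t w)
{
	return (void *)(uintptr_t)w;
}

/* The reverse direction goes through uintptr_t as well. */
static ex_word_t ex_ptr_to_word(const void *p)
{
	return (ex_word_t)(uintptr_t)p;
}
```

Casting a uint64_t straight to void * is what triggers
-Wint-to-pointer-cast when pointers are 32 bits wide; the intermediate
uintptr_t makes the narrowing explicit.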
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1443379079-29133-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When NO_LIBUNWIND_DEBUG_FRAME=0, use the .debug_frame if the .eh_frame
doesn't contain the appropriate unwind tables.
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1443379079-29133-3-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not as the first attempt at finding a vmlinux for the running kernel,
this way we get a more informative filename to present in tools, it will
check that the build-id is the same as the one previously loaded in the
DSO in dso->build_id, reading from /sys/kernel/notes, for instance.
E.g. in the annotation TUI, going from 'perf top', for the scsi_sg_alloc
kernel function, in the first line:
Before:
scsi_sg_alloc /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1
After:
scsi_sg_alloc /lib/modules/4.3.0-rc1+/build/vmlinux
And:
# ls -la /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1
lrwxrwxrwx. 1 root root 81 Sep 22 16:11 /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1 -> ../../home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
# file ~/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
/root/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=282777c262e6b3c0451375163c9a81c893218ab1, not stripped
#
The same as:
# file /lib/modules/4.3.0-rc1+/build/vmlinux
/lib/modules/4.3.0-rc1+/build/vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=282777c262e6b3c0451375163c9a81c893218ab1, not stripped
Furthermore:
# sha256sum /lib/modules/4.3.0-rc1+/build/vmlinux
e7a789bbdc61029ec09140c228e1dd651271f38ef0b8416c0b7d5ff727b98be2 /lib/modules/4.3.0-rc1+/build/vmlinux
# sha256sum ~/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
e7a789bbdc61029ec09140c228e1dd651271f38ef0b8416c0b7d5ff727b98be2 /root/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
[root@zoo new]#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-9y42ikzq3jisiddoi6f07n8z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The function can return a negative value; assigning it to an unsigned
variable can cause memory corruption.
The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].
[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576
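A small sketch of this bug class and its usual fix, with hypothetical
names (ex_read_len is not the function touched by the patch): keep the
return value in a signed variable until the error check is done, and only
then convert it to an unsigned size.

```c
#include <stddef.h>

/* Hypothetical helper: returns a length, or a negative error code. */
static int ex_read_len(int fail)
{
	return fail ? -1 : 16;
}

/*
 * Storing the result directly in a size_t would turn -1 into a huge
 * length and make a "len < 0" check always false. Check the signed value
 * first, then convert.
 */
static int ex_copy_checked(int fail, size_t *out_len)
{
	int ret = ex_read_len(fail);

	if (ret < 0)
		return ret;
	*out_len = (size_t)ret;
	return 0;
}
```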
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/1444122017-16856-1-git-send-email-a.hajda@samsung.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This function allows registering an output column from the UI code while
respecting the already taken sort/output dimensions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444134312-29136-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no need to call reset_dimensions within __setup_output_field
function. It's already called in its caller setup_sorting right before
perf_hpp__init, which will be changed in following patch to respect
taken dimension.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444134312-29136-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we don't fail properly when pattern matching fails to find any
tracepoint.
Current behaviour:
$ perf record -e 'sched:krava*' sleep 1
WARNING: event parser found nothinginvalid or unsupported event: 'sched:krava*'
Run 'perf list' for a list of valid events
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
With this patch:
$ perf record -e 'sched:krava*' sleep 1
event syntax error: 'sched:krava*'
\___ unknown tracepoint
Error: File /sys/kernel/debug/tracing/events/sched/krava* not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
Run 'perf list' for a list of valid events
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
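The intended behaviour, treating "the glob matched nothing" as an error
instead of silently adding no events, can be sketched as below. This is an
assumption-laden model: it matches against an in-memory list rather than
the tracing events directory, uses POSIX fnmatch() in place of whatever
matcher perf uses internally, and the ex_* names are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Count how many names match the glob pattern. */
static int ex_count_matches(const char *glob, const char *const names[],
			    size_t n)
{
	int found = 0;

	for (size_t i = 0; i < n; i++)
		if (fnmatch(glob, names[i], 0) == 0)
			found++;
	return found;
}

/* Return 0 if at least one tracepoint matched, -1 otherwise. */
static int ex_add_tracepoints(const char *glob, const char *const names[],
			      size_t n)
{
	return ex_count_matches(glob, names, n) > 0 ? 0 : -1;
}
```

With the error propagated, the caller can report "unknown tracepoint"
for the exact pattern that failed.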
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444073477-3181-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Which is the most common default found in other similar tools.
Requested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-v8lq36aispvdwgxdmt9p9jd9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to properly initialize column width for symbol_iaddr field, so
all symbols could fit in the column.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sorting on 'symbol' gives too broad a resolution as it can cover a range
of IP addresses. Use the iaddr instead to get proper sorting on IP
addresses. This requires the 'mem_sort' feature of perf record.
New sort option is: symbol_iaddr, header label is 'Code Symbol'.
$ perf mem report --stdio -F +symbol_iaddr
# Overhead Samples Code Symbol Local Weight
# ........ ............ ........................ ............
#
54.08% 1 [k] nmi_handle 192
4.51% 1 [k] finish_task_switch 16
3.66% 1 [.] malloc 13
3.10% 1 [.] __strcoll_l 11
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-8-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'P' modifier will cause the event to get the maximum possible detected
precise level.
The following record:
$ perf record -e cycles:P ...
will detect the maximum precise level for the 'cycles' event and use it.
Committer note:
Testing it:
$ perf record -e cycles:P usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (9 samples) ]
$ perf evlist
cycles:P
$ perf evlist -v
cycles:P: size: 112, { sample_period, sample_freq }: 4000, sample_type:
IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, mmap2: 1,
comm_exec: 1
$
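One plausible way to detect the maximum level is to try the highest value
first and step down until the event can be opened. The sketch below
assumes that approach and is not taken from the patch: the probe callback
and all ex_* names are hypothetical, and the upper bound of 3 reflects
precise_ip being a 2-bit field in perf_event_attr.

```c
/* Hypothetical probe: returns 0 if the event opens at this precise level. */
typedef int (*ex_probe_fn)(int precise_ip, void *ctx);

#define EX_MAX_PRECISE 3	/* precise_ip is a 2-bit field */

/* Try levels from highest to lowest; 0 means no precise level works. */
static int ex_max_precise(ex_probe_fn probe, void *ctx)
{
	for (int level = EX_MAX_PRECISE; level > 0; level--)
		if (probe(level, ctx) == 0)
			return level;
	return 0;
}

/* Fake hardware for illustration: accepts levels up to *ctx. */
static int ex_fake_probe(int precise_ip, void *ctx)
{
	return precise_ip <= *(int *)ctx ? 0 : -1;
}
```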
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'll be used in following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The annotated_source::sizeof_sym_hist could easily overflow int size,
resulting in a crash in __symbol__inc_addr_samples.
Change its type to size_t, as was probably intended from the beginning
based on the initialization code in symbol__alloc_hist.
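To see why int is too small here, consider a histogram whose size grows
with the number of addresses in a symbol. The layout below is an
assumption made for illustration (a header plus one 64-bit counter per
address); struct ex_sym_hist and the helpers are hypothetical, not the
perf definitions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram: a header plus one counter per address. */
struct ex_sym_hist {
	uint64_t sum;
	uint64_t addr[];
};

/* Size of one histogram covering nr_addrs addresses, computed in size_t. */
static size_t ex_sizeof_sym_hist(size_t nr_addrs)
{
	return sizeof(struct ex_sym_hist) + nr_addrs * sizeof(uint64_t);
}

/* Would this size survive being stored in an int? */
static int ex_fits_in_int(size_t size)
{
	return size <= (size_t)INT_MAX;
}
```

In this model a symbol spanning 2^28 addresses already needs more than
INT_MAX bytes per histogram, so an int field would hold a wrong (typically
negative) size and any offset derived from it would be bogus.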
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow probing on kernel modules when 'perf' is built without debuginfo
support.
Currently perf-probe --module requires linking with libdw, but this
doesn't make sense.
E.g.
----
# make NO_DWARF=1
# ./perf probe -m pcspkr pcspkr_event%return
Error: unknown switch `m'
----
With this patch
----
# ./perf probe -m pcspkr pcspkr_event%return
Added new event:
probe:pcspkr_event (on pcspkr_event%return in pcspkr)
You can now use it in all perf tools, such as:
perf record -e probe:pcspkr_event -aR sleep 1
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20151002125832.18617.78721.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>