Enable perf trace to use perf.data when it is not owned by current user
or root.
Example:
# perf trace record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 4153101 Apr 2 15:28 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf trace -i perf.data
File perf.data not owned by current user or root (use -f to override)
# perf trace -i perf.data -f
Error: unknown switch `f'
usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
--event <event> event selector. use 'perf list' to list
available events
--comm show the thread COMM next to its id
--tool_stats show tool stats
-e, --expr <expr> list of events to trace
-o, --output <file> output file name
-i, --input <file> Analyze events in file
-p, --pid <pid> trace events on existing process id
-t, --tid <tid> trace events on existing thread id
--filter-pids <float>
...
As shown above, the -f option does not work at all.
After this patch:
# perf trace -i perf.data
File perf.data not owned by current user or root (use -f to override)
# perf trace -i perf.data -f
0.056 ( 0.002 ms): ls/47325 brk( ...
0.108 ( 0.018 ms): ls/47325 mmap(len: 4096, prot: READ|WRITE, ...
0.145 ( 0.013 ms): ls/47325 access(filename: 0x7f31259a0eb0, ...
0.172 ( 0.008 ms): ls/47325 open(filename: 0x7fffeb9a0d00, ...
0.180 ( 0.004 ms): ls/47325 stat(filename: 0x7fffeb9a0d00, ...
0.185 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00, ...
0.189 ( 0.003 ms): ls/47325 stat(filename: 0x7fffeb9a0d00, ...
0.195 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00, ...
0.199 ( 0.002 ms): ls/47325 stat(filename: 0x7fffeb9a0d00, ...
0.205 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00, ...
0.211 ( 0.004 ms): ls/47325 stat(filename: 0x7fffeb9a0d00, ...
0.220 ( 0.007 ms): ls/47325 open(filename: 0x7f312599e8ff, ...
...
...
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf timechart to use perf.data when it is not owned by current
user or root.
Example:
# perf timechart record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5471744 Apr 2 15:15 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf timechart
File perf.data not owned by current user or root (use -f to override)
# perf timechart -f
Error: unknown switch `f'
usage: perf timechart [<options>] {record}
-i, --input <file> input file name
-o, --output <file> output file name
-w, --width <n> page width
--highlight <duration or task name>
highlight tasks. Pass duration in ns or process name.
-P, --power-only output power data only
-T, --tasks-only output processes data only
-p, --process <process>
process selector. Pass a pid or process name.
--symfs <directory>
Look for files with symbols relative to this directory
-n, --proc-num <n> min. number of tasks to print
-t, --topology sort CPUs according to topology
--io-skip-eagain skip EAGAIN errors
--io-min-time <time>
all IO faster than min-time will visually appear longer
--io-merge-dist <time>
merge events that are merge-dist us apart
As shown above, the -f option does not work at all.
After this patch:
# perf timechart
File perf.data not owned by current user or root (use -f to override)
# perf timechart -f
Written 0.0 seconds of trace to output.svg.
# cat output.svg
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg SYSTEM "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="1000" height="10110" version="1.1" xmlns="http://www.w3.org/2000/svg">
<defs>
<style type="text/css">
<![CDATA[
rect { stroke-width: 1; }
...
...
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf script to use perf.data when it is not owned by current user
or root. Change the short option name of --fields to -F to avoid confusion
with --force.
Example:
# perf record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 28360 Apr 2 14:53 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf script
File perf.data not owned by current user or root (use -f to override)
# perf script -f
Error: switch `f' requires a value
usage: perf script [<options>]
or: perf script [<options>] record <script> [<record-options>] <command>
or: perf script [<options>] report <script> [script-args]
or: perf script [<options>] <script> [<record-options>] <command>
or: perf script [<options>] <top-script> [script-args]
-f, --fields <str> comma separated output fields prepend with
'type:'. Valid types: hw,sw,trace,raw. Fields:
comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,period
As shown above, the -f option does not work at all. And -f is already
taken up by --fields, which makes --force confused, so change the short
option name of --fields to -F like what other perf commands do (e.g.
perf report -F) and use -f as the short option name of --force.
After this patch:
# perf script
File perf.data not owned by current user or root (use -f to override)
# perf script -f
:41298 41298 2590086.564226: 1 cycles: ffffffff8103efc6
native_write_msr_safe ([kernel.kallsyms])
:41298 41298 2590086.564244: 1 cycles: ffffffff8103efc6
native_write_msr_safe ([kernel.kallsyms])
:41298 41298 2590086.564249: 7 cycles: ffffffff8103efc6
native_write_msr_safe ([kernel.kallsyms])
:41298 41298 2590086.564255: 176 cycles: ffffffff8103efc6
native_write_msr_safe ([kernel.kallsyms])
ls 41298 2590086.567346: 4059 cycles: ffffffff8105a592
raise_softirq ([kernel.kallsyms])
ls 41298 2590086.567353: 3717 cycles: ffffffff8105a592
raise_softirq ([kernel.kallsyms])
ls 41298 2590086.567358: 63058 cycles: ffffffff8105a592
raise_softirq ([kernel.kallsyms])
ls 41298 2590086.567448: 1706255 cycles: 406ae0
[unknown] (/usr/bin/ls)
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf mem to use perf.data when it is not owned by current user or
root.
Example:
# perf mem -t load record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 16392 Apr 2 14:34 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf mem -D report
File perf.data not owned by current user or root (use -f to override)
# perf mem -D -f report
Error: unknown switch `f'
usage: perf mem [<options>] {record|report}
-t, --type <type> memory operations(load,store) Default load,store
-D, --dump-raw-samples
dump raw samples in ASCII
-U, --hide-unresolved
Only display entries resolved to a symbol
-i, --input <file> input file name
-C, --cpu <cpu> list of cpus to profile
-x, --field-separator <separator>
separator for columns, no spaces will be added
between columns '.' is reserved.
As shown above, the -f option does not work at all.
After this patch:
# perf mem -D report
File perf.data not owned by current user or root (use -f to override)
# perf mem -D -f report
# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
39095 39095 0xffffffff81127e40 0x016ffff887f45148338 8 0x68100142
/proc/kcore:perf_event_aux
39095 39095 0xffffffff8100a3fe 0xffff89007f8cb7d0 6 0x68100142
/proc/kcore:native_sched_clock
39095 39095 0xffffffff81309139 0xffff88bf44c9ded8 6 0x68100142
/proc/kcore:acpi_map_lookup
39095 39095 0xffffffff810f8c4c 0xffff89007f8ccd88 6 0x68100142
/proc/kcore:rcu_nmi_exit
39095 39095 0xffffffff81136346 0xffff88fea995dd50 6 0x68100142
/proc/kcore:unlock_page
39095 39095 0xffffffff812a64a2 0xffff88fea995dcc8 6 0x68100142
/proc/kcore:half_md4_transform
39095 39095 0x7f0cf877c7e9 0x25dfb94 6 0x68100142
/lib64/libc-2.19.so:__readdir64
39095 39095 0x7f0cf87575a3 0x7f0cf9163731 6 0x68100142
/lib64/libc-2.19.so:__strcoll_l
39095 39095 0xffffffff8116910e 0xffffea01c1bfbd50 23 0x68100242
/proc/kcore:page_remove_rmap
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf lock to use perf.data when it is not owned by current user
or root.
Example:
# perf lock record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 4880686 Apr 2 14:14 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf lock report
File perf.data not owned by current user or root (use -f to override)
Initializing perf session failed
# perf lock report -f
Error: unknown switch `f'
usage: perf lock report [<options>]
-k, --key <acquired> key for sorting (acquired / contended /
avg_wait / wait_total / wait_max / wait_min)
As shown above, the -f option does not work at all.
After this patch:
# perf lock report
File perf.data not owned by current user or root (use -f to override)
Initializing perf session failed
# perf lock report -f
Name acquired contended avg wait (ns) total wait (ns) ...
&ldata->output_l... 128 0 0 0 ...
&ctx->lock 114 0 0 0 ...
&p->pi_lock 112 0 0 0 ...
&(&pool->lock)->... 112 0 0 0 ...
&(&dentry->d_loc... 70 0 0 0 ...
&(&newf->file_lo... 62 0 0 0 ...
&(&fs->lock)->rl... 43 0 0 0 ...
...
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf kvm to use perf.data.guest when it is not owned by current
user or root.
Example:
# perf kvm stat record ls
# chown Yunlong.Song:Yunlong.Song perf.data.guest
# ls -al perf.data.guest
-rw------- 1 Yunlong.Song Yunlong.Song 4128937 Apr 2 11:05 perf.data.guest
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf kvm stat report
File perf.data.guest not owned by current user or root (use -f to override)
Initializing perf session failed
# perf kvm stat report -f
Error: unknown switch `f'
usage: perf kvm stat report [<options>]
--event <report event>
event for reporting: vmexit, mmio (x86 only),
ioport (x86 only)
--vcpu <n> vcpu id to report
-k, --key <sort-key> key for sorting: sample(sort by samples
number) time (sort by avg time)
-p, --pid <pid> analyze events only for given process id(s)
As shown above, the -f option does not work at all.
After this patch:
# perf kvm stat report
File perf.data.guest not owned by current user or root (use -f to override)
Initializing perf session failed
# perf kvm stat report -f
Analyze events for all VMs, all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
Total Samples:0, Total events handled time:0.00us.
As shown above, the -f option really works now. Since we have not
launched any KVM related process, the result shows 0 sample here.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf kmem to use perf.data when it is not owned by current user
or root.
Example:
# perf kmem record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 5315665 Apr 2 10:54 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf kmem stat
File perf.data not owned by current user or root (use -f to override)
# perf kmem stat -f
Error: unknown switch `f'
usage: perf kmem [<options>] {record|stat}
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--caller show per-callsite statistics
--alloc show per-allocation statistics
-s, --sort <key[,key2...]>
sort by keys: ptr, call_site, bytes, hit,
pingpong, frag
-l, --line <num> show n lines
--raw-ip show raw ip instead of symbol
As shown above, the -f option does not work at all.
After this patch:
# perf kmem stat
File perf.data not owned by current user or root (use -f to override)
# perf kmem stat -f
SUMMARY
=======
Total bytes requested: 437599
Total bytes allocated: 615472
Total bytes wasted on internal fragmentation: 177873
Internal fragmentation: 28.900259%
Cross CPU allocations: 6/1192
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf inject to use perf.data when it is not owned by current user
or root.
Example:
# perf record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 28260 Apr 2 10:37 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf inject -v -b -i perf.data -o perf.data.new
File perf.data not owned by current user or root (use -f to override)
# perf inject -v -b -i perf.data -o perf.data.new -f
Error: unknown switch `f'
usage: perf inject [<options>]
-b, --build-ids Inject build-ids into the output stream
-i, --input <file> input file name
-o, --output <file> output file name
-s, --sched-stat Merge sched-stat and sched-switch for getting
events where and how long tasks slept
-v, --verbose be more verbose (show build ids, etc)
--kallsyms <file>
kallsyms pathname
As shown above, the -f option does not work at all.
After this patch:
# perf inject -v -b -i perf.data -o perf.data.new
File perf.data not owned by current user or root (use -f to override)
# perf inject -v -b -i perf.data -o perf.data.new -f
build id event received for [kernel.kallsyms]:
f6dcb66d8b98f1c0d9eb87bf043444b69f91d30c
symsrc__init: cannot get elf header.
Looking at the vmlinux_path (7 entries long)
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable perf evlist to use perf.data when it is not owned by current user
or root.
Example:
# perf record ls
# chown Yunlong.Song:Yunlong.Song perf.data
# ls -al perf.data
-rw------- 1 Yunlong.Song Yunlong.Song 28260 Apr 2 10:18 perf.data
# id
uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
Before this patch:
# perf evlist
File perf.data not owned by current user or root (use -f to override)
# perf evlist -f
Error: unknown switch `f'
usage: perf evlist [<options>]
-i, --input <file> Input file name
-F, --freq Show the sample frequency
-v, --verbose Show all event attr details
-g, --group Show event group information
As shown above, the -f option does not work at all.
After this patch:
# perf evlist
File perf.data not owned by current user or root (use -f to override)
# perf evlist -f
cycles
As shown above, the -f option really works now.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427982439-27388-2-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix 'perf probe' to track down unnamed union/structure members.
perf probe did not track down the tree of unnamed union/structure
members, since it just failed to find given "name" in a parent
structure/union. To solve this issue, I've introduced 2 changes.
- Fix die_find_member() to track down the type-DIE if it is
unnamed, and if it contains the specified member, returns the
unnamed member.
(note that we don't return found member, since unnamed member
has the offset in the parent structure)
- Fix convert_variable_fields() to track down the unnamed union/
structure (one-by-one).
With this patch, perf probe can access unnamed fields:
-----
#./perf probe -nfx ./perf lock__delete ops 'locked_ops=ops->locked.ops'
Added new event:
probe_perf:lock__delete (on lock__delete in /home/mhiramat/ksrc/linux-3/tools/perf/perf with ops locked_ops=ops->locked.ops)
You can now use it in all perf tools, such as:
perf record -e probe_perf:lock__delete -aR sleep 1
-----
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Report-Link: https://lkml.org/lkml/2015/3/5/431
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150402073312.14482.37942.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As it comes from address_location->thread, that is already stored as
export_sample->al, where the thread can be obtained.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20150402141542.GA9630@kernel.org
Link: http://lkml.kernel.org/n/tip-bzotbl4epoztw0jd6sm2stpf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As it is available via another parameter, address_location->thread.
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: lkml.kernel.org/r/551D08F8.3040706@intel.com
Link: http://lkml.kernel.org/n/tip-6dbn0tcm9hyv92g7h3zj2dbt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is already in the addr_location, so remove the redundant 'thread'
parameter from the callback signatures.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1427906210-10519-3-git-send-email-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We get the thread when we call perf_event__preprocess_sample(), no need
to do it before that.
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1427906210-10519-2-git-send-email-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
dependency, it also depends on the tracing infrastructure and on kprobes,
without which it will fail to build with:
In file included from kernel/trace/bpf_trace.c:14:0:
kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
unsigned int val = current->trace_recursion;
[...]
It took quite some time to trigger this build failure, because right now
BPF_SYSCALL is very obscure, depends on CONFIG_EXPERT. So also make BPF_SYSCALL
more configurable, not just under CONFIG_EXPERT.
If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
gateway as well.
We might want to make this an interactive option later on, although
I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
an indicator that the user wants BPF support.
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One BPF program attaches to kmem_cache_alloc_node() and
remembers all allocated objects in the map.
Another program attaches to kmem_cache_free() and deletes
corresponding object from the map.
User space walks the map every second and prints any objects
which are older than 1 second.
Usage:
$ sudo tracex4
Then start few long living processes. The 'tracex4' will print
something like this:
obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32
obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32
obj 0xffff880465848000 is 8sec old was allocated at ip ffffffff8105dc32
obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32
$ addr2line -fispe vmlinux ffffffff8105dc32
do_fork at fork.c:1665
As soon as processes exit the memory is reclaimed and 'tracex4'
prints nothing.
Similar experiment can be done with the __kmalloc()/kfree() pair.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
BPF C program attaches to
blk_mq_start_request()/blk_update_request() kprobe events to
calculate IO latency.
For every completed block IO event it computes the time delta
in nsec and records in a histogram map:
map[log10(delta)*10]++
User space reads this histogram map every 2 seconds and prints
it as a 'heatmap' using gray shades of text terminal. Black
spaces have many events and white spaces have very few events.
Left most space is the smallest latency, right most space is
the largest latency in the range.
Usage:
$ sudo ./tracex3
and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.
Observe IO latencies and how different activity (like 'make
kernel') affects it.
Similar experiments can be done for network transmit latencies,
syscalls, etc.
'-t' flag prints the heatmap using normal ascii characters:
$ sudo ./tracex3 -t
heatmap of IO latency
# - many events with this latency
- few events
|1us |10us |100us |1ms |10ms |100ms |1s |10s
*ooo. *O.#. # 221
. *# . # 125
.. .o#*.. # 55
. . . . .#O # 37
.# # 175
.#*. # 37
# # 199
. . *#*. # 55
*#..* # 42
# # 266
...***Oo#*OO**o#* . # 629
# # 271
. .#o* o.*o* # 221
. . o* *#O.. # 50
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
this example has two probes in one C file that attach to
different kprove events and use two different maps.
1st probe is x64 specific equivalent of dropmon. It attaches to
kfree_skb, retrevies 'ip' address of kfree_skb() caller and
counts number of packet drops at that 'ip' address. User space
prints 'location - count' map every second.
2nd probe attaches to kprobe:sys_write and computes a histogram
of different write sizes
Usage:
$ sudo tracex2
location 0xffffffff81695995 count 1
location 0xffffffff816d0da9 count 2
location 0xffffffff81695995 count 2
location 0xffffffff816d0da9 count 2
location 0xffffffff81695995 count 3
location 0xffffffff816d0da9 count 2
557145+0 records in
557145+0 records out
285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
syscall write() stats
byte_size : count distribution
1 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 2 | |
32 -> 63 : 3 | |
64 -> 127 : 1 | |
128 -> 255 : 1 | |
256 -> 511 : 0 | |
512 -> 1023 : 1118968 |************************************* |
Ctrl-C at any time. Kernel will auto cleanup maps and programs
$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
0xffffffff816d0da9 0xffffffff81695995:
./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
./bld_x64/../net/unix/af_unix.c:1231
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
tracex1_kern.c - C program compiled into BPF.
It attaches to kprobe:netif_receive_skb()
When skb->dev->name == "lo", it prints sample debug message into
trace_pipe via bpf_trace_printk() helper function.
tracex1_user.c - corresponding user space component that:
- loads BPF program via bpf() syscall
- opens kprobes:netif_receive_skb event via perf_event_open()
syscall
- attaches the program to event via ioctl(event_fd,
PERF_EVENT_IOC_SET_BPF, prog_fd);
- prints from trace_pipe
Note, this BPF program is non-portable. It must be recompiled
with current kernel headers. kprobe is not a stable ABI and
BPF+kprobe scripts may no longer be meaningful when kernel
internals change.
No matter in what way the kernel changes, neither the kprobe,
nor the BPF program can ever crash or corrupt the kernel,
assuming the kprobes, perf and BPF subsystem has no bugs.
The verifier will detect that the program is using
bpf_trace_printk() and the kernel will print 'this is a DEBUG
kernel' warning banner, which means that bpf_trace_printk()
should be used for debugging of the BPF program only.
Usage:
$ sudo tracex1
ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84
ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Debugging of BPF programs needs some form of printk from the
program, so let programs call limited trace_printk() with %d %u
%x %p modifiers only.
Similar to kernel modules, during program load verifier checks
whether program is calling bpf_trace_printk() and if so, kernel
allocates trace_printk buffers and emits big 'this is debug
only' banner.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
bpf_ktime_get_ns() is used by programs to compute time delta
between events or as a timestamp
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = {
.type = PERF_TYPE_TRACEPOINT,
.config = event_id,
...
};
event_fd = perf_event_open(&attr,...);
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with BPF program
previously loaded.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wraper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are a fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and user must supply correct LINUX_VERSION_CODE
in attr.kern_version during bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
tracepoints, since bpf programs can only be attached to kprobe
type of PERF_TYPE_TRACEPOINT perf events.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Socket filter code and other subsystems with upcoming eBPF
support should not need to deal with the fact that we have
CONFIG_BPF_SYSCALL defined or not.
Having the bpf syscall as a config option is a nice thing and
I'd expect it to stay that way for expert users (I presume one
day the default setting of it might change, though), but code
making use of it should not care if it's actually enabled or
not.
Instead, hide this via header files and let the rest deal with it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-2-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
- Fix 'perf script' pipe mode segfault, by always initializing ordered_events in
perf_session__new. (Arnaldo Carvalho de Melo)
- Fix ppid for synthesized fork events (David Ahern)
- Fix kernel symbol resolution of callchains in S/390 by remembering the
cpumode. (David Hildenbrand)
Infrastructure:
- Disable libbabeltrace check by default in the build system (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVGwphAAoJEBpxZoYYoA71Go8IAI75DrD3ekKtvcD9PnGOsCyX
DL004O2i9l7pE9ko6RNX1nD0W2jWTFpPfNcw2F/Xmd3LSBVSAs4l08Y7S7aEyW+3
IHjfBnuWTpPDXhByvioHqorzfDumvbHSS/fqrCILT3dTpwEMeitu7rn+Dz7gUllV
wOtEEvlt3u5vGxQa4C3tno1z197HSu0IqN0rLTqkAXM1l0MS9eFQehMGiMuIk2s1
DUrR/zkHSMPMNP6aAdwMfcg4jf3wXlqLSYZJlEojMLXulnlJexyOSomB6RNnsuNj
Qsvn/LXsy+8yiEvsOCHBFpVt1feztNb+OxlY/HIEI3EfLhgYQYG5q8/ymDoufEg=
=D57w
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Fix 'perf script' pipe mode segfault, by always initializing ordered_events in
perf_session__new(). (Arnaldo Carvalho de Melo)
- Fix ppid for synthesized fork events. (David Ahern)
- Fix kernel symbol resolution of callchains in S/390 by remembering the
cpumode. (David Hildenbrand)
Infrastructure changes:
- Disable libbabeltrace check by default in the build system. (Jiri Olsa)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As these can be obtained from the ordered_events pointer, via
container_of, reducing the cross section of ordered_samples.
These were added to ordered_samples in:
commit b7b61cbebd
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue Mar 3 11:58:45 2015 -0300
perf ordered_events: Shorten function signatures
By keeping pointers to machines, evlist and tool in ordered_events.
But that was more a transitional patch while moving stuff out from
perf_session.c to ordered_events.c and possibly not even needed by then,
as we could use the container_of() method and instead of having the
nr_unordered_samples stats in events_stats, we can have it in
ordered_samples.
Based-on-a-patch-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4lk0t9js82g0tfc0x1onpkjt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Even when it is not used to actually reorder events, some of its fields
are used, like session->ordered_events->tool, to shorten function
signatures where tool, for instance, was being passed, as the tool is
needed for the ordered_events code, we need it there and might as well
use it for other perf_session needs.
This fixes a problem where 'perf script' had some condition that made
session->ordered_events not to be initialized even with its
script->tool ordered_events related flags asking for it to be, which
looks like another bug and needs to be investigated further.
Always initializing session->ordered_events at least leaves the current
assumptions in place, so do it now.
Reported-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-b1xxk0rwkz2a0gip1uufmjqg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
363b785f38 added synthesized fork events and set a thread's parent id to
itself. Since we are already processing /proc/<pid>/status the ppid can
be determined properly. Make it so.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Mario <jmario@redhat.com>
Link: http://lkml.kernel.org/r/1427747758-18510-2-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than parsing /proc/pid/status file one line at a time, read it
into a buffer in one shot and search for all strings in one pass.
tgid conversion also simplified -- removing the isspace walk. As noted
by Arnaldo those are not needed for atoi == strtol calls.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Joe Mario <jmario@redhat.com>
Link: http://lkml.kernel.org/r/1427747758-18510-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 2e77784bb7 ("perf callchain: Move cpumode resolve code to
add_callchain_ip") promised "No change in behavior.".
As this commit breaks callchains on s390x (symbols not getting resolved,
observed when profiling the kernel), this statement is wrong. The cpumode
must be kept when iterating over all ips, otherwise the default
(PERF_RECORD_MISC_USER) will be used by error.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1427703060-59883-1-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Disabling libbabeltrace check by default and replacing the
NO_LIBBABELTRACE make variable with LIBBABELTRACE.
Users wanting the libbabeltrace feature need to build via:
$ make LIBBABELTRACE=1
The reason for this is that the libababeltrace interface we use (version
1.3) hasn't been packaged/released yet, thus the failing feature check
only slows down build and confuses other (non CTF) developers.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150328103030.GA8431@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While thinking on the whole clock discussion it occurred to me we have
two distinct uses of time:
1) the tracking of event/ctx/cgroup enabled/running/stopped times
which includes the self-monitoring support in struct
perf_event_mmap_page.
2) the actual timestamps visible in the data records.
And we've been conflating them.
The first is all about tracking time deltas, nobody should really care
in what time base that happens, its all relative information, as long
as its internally consistent it works.
The second however is what people are worried about when having to
merge their data with external sources. And here we have the
discussion on MONOTONIC vs MONOTONIC_RAW etc..
Where MONOTONIC is good for correlating between machines (static
offset), MONOTNIC_RAW is required for correlating against a fixed rate
hardware clock.
This means configurability; now 1) makes that hard because it needs to
be internally consistent across groups of unrelated events; which is
why we had to have a global perf_clock().
However, for 2) it doesn't really matter, perf itself doesn't care
what it writes into the buffer.
The below patch makes the distinction between these two cases by
adding perf_event_clock() which is used for the second case. It
further makes this configurable on a per-event basis, but adds a few
sanity checks such that we cannot combine events with different clocks
in confusing ways.
And since we then have per-event configurability we might as well
retain the 'legacy' behaviour as a default.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
so merge timers/core here in a separate topic branch until it's all cooked
and timers/core is merged upstream.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While looking at some fuzzer output I noticed that we do not hold any
locks on leader->ctx and therefore the sibling_list iteration is
unsafe.
Acquire the relevant ctx->mutex before calling into the pmu specific
code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/20150225151639.GL5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
afterwards. No need to call these inside of x86_pmu_add() as well.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because it was the only clock for which we didn't have a _ns()
accessor yet.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for more tk_fast instances, remove all hard-coded
tk_fast_mono references.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce tkr_raw and make use of it.
base_raw -> tkr_raw.base
clock->{mult,shift} -> tkr_raw.{mult.shift}
Kill timekeeping_get_ns_raw() in favour of
timekeeping_get_ns(&tkr_raw), this removes all mono_raw special
casing.
Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last,
both need the same value.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.
Lots of trivial churn.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.
This is erratum BDM11 and BDM55:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.
Add a new callback to enforce this, and set it for Broadwell.
How does this handle the case when an app requests a specific
period with some of the bottom bits set?
Short answer:
Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.
Long answer (by Peterz):
IFF we guarantee perf_event_attr::sample_period >= 128.
Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.
So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.
So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.
So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add Broadwell support for Broadwell to perf.
The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.
The constraint list has one new event that needs to be handled over Haswell.
Includes code and testing from Kan Liang.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.
Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.
The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.
- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
(different from Sandy Bridge, which includes prefetches in the remote node)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
User visible:
- Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
- Fix garbage output when intermixing syscalls from different threads in 'perf trace' (Arnaldo Carvalho de Melo)
- Fix 'perf timechart' SIBGUS error on sparc64 (David Ahern)
Infrastructure:
- Set JOBS based on CPU or processor, making it work on SPARC, where
/proc/cpuinfo has "CPU", not "processor" (David Ahern)
- Zero should not be considered "not found" in libtraceevent's eval_flag() (Steven Rostedt)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVFCe6AAoJEBpxZoYYoA71IPQH/0GyVG3tGC34Zl0GDqQF8dbL
skrDQpNaNMQ0lXRy4GGEp+wiOz2j5N5c22ArmZeUKMV0y5+LBv7H5BxzjCOgZXGn
yGy6JxB0S8RcM+iTbMqS1OIbtFh/FISmsSFnWoTOzYFxT0uCCIPaG6R/cJMNMhza
Udge/p8bbejeWaaKWUdyZkN1wH/c06hSOiaIQnn2vz3yiFCcfoLW746kYb2RI1tB
qMZvoG9fPvT7cEW2dn/z12JcnjaQxLHlSCIy26+Y+cMo5DwjBQdwDaCUFofFFPi3
QG9apo2CWQyUp96mB17IMHJG6QVh5Px0ZEvw2PLpYM3Y2KQM69eBDCU8qdPLQNI=
=YOUs
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
- Fix garbage output when intermixing syscalls from different threads in 'perf trace' (Arnaldo Carvalho de Melo)
- Fix 'perf timechart' SIBGUS error on sparc64 (David Ahern)
Infrastructure changes:
- Set JOBS based on CPU or processor, making it work on SPARC, where
/proc/cpuinfo has "CPU", not "processor" (David Ahern)
- Zero should not be considered "not found" in libtraceevent's eval_flag() (Steven Rostedt)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Trivial cleanups, to improve the readability of the generic sched_clock() code:
- Improve and standardize comments
- Standardize the coding style
- Use vertical spacing where appropriate
- etc.
No code changed:
md5:
19a053b31e0c54feaeff1492012b019a sched_clock.o.before.asm
19a053b31e0c54feaeff1492012b019a sched_clock.o.after.asm
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently it is possible for an NMI (or FIQ on ARM) to come in
and read sched_clock() whilst update_sched_clock() has locked
the seqcount for writing. This results in the NMI handler
locking up when it calls raw_read_seqcount_begin().
This patch fixes the NMI safety issues by providing banked clock
data. This is a similar approach to the one used in Thomas
Gleixner's 4396e058c52e("timekeeping: Provide fast and NMI safe
access to CLOCK_MONOTONIC").
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1427397806-20889-6-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>