The 'perf record --group' option lacks documentation and confuses users.
As -e/--event option already supports group spec, it should not be used
anymore.
Also add a short description of event group itself.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1425266013-5034-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf record does not support -l option anymore, so nuke it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1425272038-10406-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add --purge FILE to remove all caches of FILE.
Since the current --remove FILE removes a cache which has
same build-id of given FILE. Since the command takes a
FILE path, it can confuse user who tries to remove cache
about FILE path.
-----
# ./perf buildid-cache -v --add ./perf
Adding 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
# (update the ./perf binary)
# ./perf buildid-cache -v --remove ./perf
Removing 305bbd1be68f66eca7e2d78db294653031edfa79 ./perf: FAIL
./perf wasn't in the cache
-----
Actually, the --remove's FAIL is not shown, it just silently fails.
So, this patch adds --purge FILE action for such usecase.
perf buildid-cache --purge FILE removes all caches which has same FILE
path.
In other words, it removes all caches including old binaries.
-----
# ./perf buildid-cache -v --add ./perf
Adding 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
# (update the ./perf binary)
# ./perf buildid-cache -v --purge ./perf
Removing 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
-----
BTW, if you want to purge all the caches, remove ~/.debug/* .
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150227045026.1999.64084.stgit@localhost.localdomain
[ s/dirname/dir_name/g to fix build on fedora14, where dirname is a global ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extend 'perf list --raw-dump' to 'perf list --raw-dump [hw|sw|cache
|tracepoint|pmu|event_glob]' in order to show the raw-dump of a certain
kind of events rather than all of the events.
Example:
Before this patch:
$ perf list --raw-dump hw
branch-instructions branch-misses bus-cycles cache-misses
cache-references cpu-cycles instructions stalled-cycles-backend
stalled-cycles-frontend
alignment-faults context-switches cpu-clock cpu-migrations
emulation-faults major-faults minor-faults page-faults task-clock
...
...
writeback:writeback_thread_start writeback:writeback_thread_stop
writeback:writeback_wait_iff_congested
writeback:writeback_wake_background writeback:writeback_wake_thread
As shown above, all of the events are printed.
After this patch:
$ perf list --raw-dump hw
branch-instructions branch-misses bus-cycles cache-misses
cache-references cpu-cycles instructions stalled-cycles-backend
stalled-cycles-frontend
As shown above, only the hw events are printed.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1425032491-20224-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.
Actually, perf diff once intended to compare the symbol names. The
commit as below can look for a pair by name.
604c5c9297 (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.
For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.
Then the commit as below used IP to replace symbol name.
9c443dfdd3 ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.
The global dsos is discarded from a patch in 2010.
a1645ce12a ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.
This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.
Here are some examples:
The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.
example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
Old perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
Event 'cycles'
Baseline Delta Shared Object Symbol
........ ....... ................ ...............................
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% example [.] f2
53.71% -7.55% example [.] f3
+53.81% example [.] f3
0.02% example [.] main
New perf diff result:
[lk@localhost perf_diff]$ perf diff example_v1.data example_v2.data
[kernel.vmlinux] [k] __perf_event_task_sched_out
0.00% [kernel.vmlinux] [k] apic_timer_interrupt
[kernel.vmlinux] [k] idle_cpu
[kernel.vmlinux] [k] intel_pstate_timer_func
[kernel.vmlinux] [k] native_read_msr_safe
0.00% [kernel.vmlinux] [k] native_read_tsc
0.00% [kernel.vmlinux] [k] native_write_msr_safe
[kernel.vmlinux] [k] ntp_tick_length
0.00% [kernel.vmlinux] [k] rb_erase
0.00% [kernel.vmlinux] [k] tick_sched_timer
0.00% [kernel.vmlinux] [k] unmap_single_vma
0.00% [kernel.vmlinux] [k] update_wall_time
0.00% example [.] f1
46.24% -0.08% example [.] f2
53.71% +0.11% example [.] f3
0.02% example [.] main
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1423460384-11645-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add new buildid cache if the update target file is not cached.
This can happen when an old binary is replaced by new one after caching
the old one. In this case, user sees his operation just failed.
But it does not look straight, since user just pass the binary "path",
not "build-id".
----
# ./perf buildid-cache --add ./perf
(update ./perf to new binary)
# ./perf buildid-cache --update ./perf
./perf wasn't in the cache
#
----
This patch adds given new binary to cache if the new binary is
not cached. So we'll not see the above error.
----
# ./perf buildid-cache --add ./perf
(update ./perf to new binary)
# ./perf buildid-cache --update ./perf
#
----
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150226065440.23912.1494.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding 'perf data convert' to convert perf data file into different
format. This patch adds support for CTF format conversion.
To convert perf.data into CTF run:
$ perf data convert --to-ctf=./ctf-data/
[ perf data convert: Converted 'perf.data' into CTF data './ctf-data/' ]
[ perf data convert: Converted and wrote 11.268 MB (100230 samples) ]
The command will create CTF metadata out of perf.data file (or one
specified via -i option) and then convert all sample events into single
CTF stream.
Each sample_type bit is translated into separated CTF event field apart
from following exceptions:
PERF_SAMPLE_RAW - added in next patch
PERF_SAMPLE_READ - TODO
PERF_SAMPLE_CALLCHAIN - TODO
PERF_SAMPLE_BRANCH_STACK - TODO
PERF_SAMPLE_REGS_USER - TODO
PERF_SAMPLE_STACK_USER - TODO
$ perf --debug=data-convert=2 data convert ...
The converted CTF data could be analyzed by CTF tools, like babletrace
or tracecompass [1].
$ babeltrace ./ctf-data/
[03:19:13.962125533] (+?.?????????) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962130001] (+0.000004468) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
[03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
[03:19:13.962135557] (+0.000001825) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 2087 }
[03:19:13.962137627] (+0.000002070) cycles: { }, { ip = 0xFFFFFFFF81361938, tid = 20714, pid = 20714, period = 37582 }
[03:19:13.962161091] (+0.000023464) cycles: { }, { ip = 0xFFFFFFFF8124218F, tid = 20714, pid = 20714, period = 600246 }
[03:19:13.962517569] (+0.000356478) cycles: { }, { ip = 0xFFFFFFFF811A75DB, tid = 20714, pid = 20714, period = 1325731 }
[03:19:13.969518008] (+0.007000439) cycles: { }, { ip = 0x34080917B2, tid = 20714, pid = 20714, period = 1144298 }
The following members to the ctf-environment were decided to be added to
distinguish and specify perf CTF data:
- domain
It says "kernel" because it contains a kernel trace (not to be
confused with a user space like lttng-ust does)
- tracer_name
It says perf. This can be used to distinguish between lttng and perf
CTF based trace.
- version
The kernel version from stream. In addition to release, this is what
it looks like on a Debian kernel:
release = "3.14-1-amd64";
version = "3.14.0";
[1] http://projects.eclipse.org/projects/tools.tracecompass
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-4-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding new 'perf data' command to provide operations over data files.
The 'perf data convert' sub command is coming in following patch, but
there's possibility for other useful commands like 'perf data ls' (to
display perf data file in directory in ls style).
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-3-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an option to perf record to record running/enabled time for read
events, similar to what stat does.
This is useful to understand multiplexing problems.
Right now the report support is not great, but at least report -D
already supports it.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1424819620-16043-1-git-send-email-andi@firstfloor.org
[ Fixed the Documentation entry to match the OPT_BOOLEAN one ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Forgot to do it when adding the feature.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-mx152b6x9cgknhw91vsyjlnd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
User visible:
- 'perf trace': Allow mixing with tracepoints and suppressing plain syscalls
(Arnaldo Carvalho de Melo)
Infrastructure:
- Kconfig beachhead (Jiri Olsa)
- Simplify nr_pages validity (Kaixu Xia)
- Fixup header positioning in 'perf list' (Yunlong Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU3mKaAAoJEBpxZoYYoA71qnYH/1h8zqbQosuy/7Mu2tgLROts
2LSK8M+XD4RKdDVRLK95BIKmZfZkBjeOUE+PJIQ6/Mb1BQGBOmmGQ5oydLf2QUFw
5zVAFS8gec7xGvQpITuZEplJQcqm24CHt7qxUwFlh1DnRzN8eRkW2tHZmr5mfOil
hVpTQYpawRg/HIufDvlMU0Umv28JPQyRpfIF2TilkBxUT6KjYJK1QNuoNsgGS4ZL
r8rEpijRNkbmQZXmIDfZzvlzMx2Bwf0wdGf/1Rod1f1HLD4252ZKc07JCujBpvji
rK/oFj2hHx64r5HUQrOudlQ2B5VvlFKnWKnnb5EgL6gtM4moGhKjNHcUjFy1XLk=
=8zWn
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- No need to explicitely enable evsels for workload started from perf, let it
be enabled via perf_event_attr.enable_on_exec, removing some events that take
place in the 'perf trace' before a workload is really started by it.
(Arnaldo Carvalho de Melo)
- Fix to handle optimized not-inlined functions in 'perf probe' (Masami Hiramatsu)
- Update 'perf probe' man page (Masami Hiramatsu)
- 'perf trace': Allow mixing with tracepoints and suppressing plain syscalls
(Arnaldo Carvalho de Melo)
Infrastructure changes:
- Introduce {trace_seq_do,event_format_}_fprintf functions to allow
a default tracepoint field list printer to be used in tools that allows
redirecting output to a file. (Arnaldo Carvalho de Melo)
- The man page for pthread_attr_set_affinity_np states that _GNU_SOURCE
must be defined before pthread.h, do it to fix the build in some
systems (Josh Boyer)
- Cleanups in 'perf buildid-cache' (Masami Hiramatsu)
- Fix dso cache test case (Namhyung Kim)
- Do Not rely on dso__data_read_offset() to open DSO (Namhyung Kim)
- Make perf aware of tracefs (Steven Rostedt).
- Fix build by defining STT_GNU_IFUNC for glibc 2.9 and older (Vinson Lee)
- AArch64 symbol resolution fixes (Victor Kamensky)
- Kconfig beachhead (Jiri Olsa)
- Simplify nr_pages validity (Kaixu Xia)
- Fixup header positioning in 'perf list' (Yunlong Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, there are two call chain recording options, fp and dwarf.
Haswell has a new feature that utilizes the existing LBR facility to
record call chains. Kernel side LBR support code provides this as a
third option to record call chains. This patch enables the lbr call
stack support on the tooling side.
LBR call stack has some limitations:
- It reuses current LBR facility, so LBR call stack and branch record
can not be enabled at the same time.
- It is only available for user-space callchains.
However, it also offers some advantages:
- LBR call stack can work on user apps which don't have frame-pointers
or dwarf debug info compiled. It is a good alternative when nothing
else works.
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1420482185-29830-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update Documentation/perf-probe.txt to add descriptions of some newer
options.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150130093746.30575.8571.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The new hw_breakpoint bits are now ready for v3.20, merge them
into the main branch, to avoid conflicts.
Conflicts:
tools/perf/Documentation/perf-record.txt
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch modifies perf mem to default to sampling loads and stores
simultaneously. It could only do one or the other before yet there was
no hardware restriction preventing simultaneous collection. With this
patch, one run is sufficient to collect both.
It is still possible to sample only loads or stores by using the
-t option:
$ perf mem -t load rec
$ perf mem -t load rep
Or
$ perf mem -t store rec
$ perf mem -t store rep
The perf report TUI will show one event at a time. The store output will
contain a Weight column which will be empty.
In V2, we updated the man pages to reflect the change and also simplify
the initialization of the argv vector passed to the cmd_*() functions as
per LKML feedback.
In V3, we fixed typos in the changelog.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Fowles <rfowles@redhat.com>
Link: http://lkml.kernel.org/r/20141217152355.GA10053@thinkpad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding --buildid-dir to be able to set specific cache directory. It's
going to be handy for buildid tests coming in shortly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1417460789-13874-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently bp_len is given a default value of 4. Allow user to override it:
$ perf stat -e mem:0x1000/8
^
bp_len
If no value is given, it will default to 4 as it did before.
Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Add a --branch-history option to perf report that changes all the
settings necessary for using the branches in callstacks.
This is just a short cut to make this nicer to use, it does not enable
any functionality by itself.
v2: Change sort order. Rename option to --branch-history to
be less confusing.
v3: Updates
v4: Fix conflict with newer perf base
v5: Port to latest tip
v6: Add more comments. Remove CCKEY_ADDRESS setting. Remove
unnecessary branch_mode setting. Use a boolean.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-5-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way in complex functions we can understand the control flow that
lead to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified, of course for such simple code this is usually not
needed), please run this after the whole patchkit is in, as at this
point in the patch order there is no --branch-history, that will be
added in a patch after this one:
tcall.c:
volatile a = 10000, b = 100000, c;
__attribute__((noinline)) f2()
{
c = a / b;
}
__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report, no change to record or anywhere
else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fix spelling typos found in tool/perf/Documentation.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: http://lkml.kernel.org/r/1410275930-17207-1-git-send-email-standby24x7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some Linux symbols (for example __vt_event_wait) are interpreted by the
demangler as C++ mangled names, which of course they aren't.
Disable kernel symbol demangling by default to avoid this, and allow
enabling it with a new option --demangle-kernel for those who wish it.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1410581705-26968-1-git-send-email-avi@cloudius-systems.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add -w/--column-widths option like perf report does so that users are
able to see symbols even with some very long C++ library/functions.
It can be a list separated by comma for each column.
$ perf top -w 0,20,30
The value of 0 means there's no limit.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1406785662-5534-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Let perf inject take --kallsyms parameter the same as perf script and
perf report do.
That is needed for decoding Instruction Trace data using a copy of
/proc/kcore for the kernel object because the kallsyms path is used to
locate that copy.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1406035081-14301-30-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding --debug option as a way to setup debug variables. Starting with
support for verbose, more will come.
It's possible to use it now with report command:
$ perf --debug verbose ...
$ perf --debug verbose=2 ...
I'll need this support to add separated debug variable for ordered
events change in order to separate debug output out of standard verbose
stream.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140717105500.GG516@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390, the vmexit event has a tree-like structure: between
exit_event_begin and exit_event_end several other events may happen and
with each of them refining the previous ones.
This patch adds a decoder for such events to the generic code and also
the files <asm/kvm_perf.h> and kvm-stat.c for s390.
Commands 'perf kvm stat record', 'report' and 'live' are supported.
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1404397747-20939-5-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, timechart records only scheduler and CPU events (task switches,
running times, CPU power states, etc); this commit adds IO mode which
makes it possible to record IO (disk, network) activity. In this mode
perf timechart will generate SVG with IO charts (writes, reads, tx, rx, polls).
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/1404835423-23098-3-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
This patch adds optional pagefault tracing support to 'perf trace'.
Using -F/--pf option user can specify whether he wants minor, major or
all pagefault events to be traced. This patch adds only live mode,
record and replace will come in a separate patch.
Example output:
1756272.905 ( 0.000 ms): curl/5937 majfault [0x7fa7261978b6] => /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0@0x85288 (d.)
1862866.036 ( 0.000 ms): wget/8460 majfault [__clear_user+0x3f] => 0x659cb4 (?k)
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1403799268-1367-3-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are a number of benchmarks that do single runs and as a result
does not really help users gain a general idea of how the workload
performs. So the user must either manually do multiple runs or just use
single bogus results.
This option will enable users to specify the amount of runs (arbitrarily
defaulted to 10, to use the existing benchmarks default) through the
'--repeat' option. Add it to perf-bench instead of implementing it
always in each specific benchmark.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1402942467-10671-2-git-send-email-davidlohr@hp.com
[ Kept the existing default of 10, changing it to something else should
be done on separate patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
User visible:
. Improve 'perf probe' error messages, moving some diagnostic messages to
only appear in --verbose mode and fixing up some error reporting related
to variables and struct members. (Masami Hiramatsu)
. Reflow 'perf timechart' man page. (Stanislav Fomichev)
Developer stuff:
. Be more precise when reporting missing libraries in a static tool build.
(Arnaldo Carvalho de Melo)
. Show error messages from the multiple make invoked from 'make build-test'.
(Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTlxjYAAoJENZQFvNTUqpA5b4P/1Qs0S/HqAsVCqQe9143IxNS
HY0NrhBGm05rbYga+Bvp6lp9xXf3F9hp7i3rFANgB68sHLEmi8DU9T5vmvrq9TIU
+KT102re7eA/93rVQ+cvBqaosQVh8ia7O2tnr+FEhyBCNOIwTqtUI4g+9/IJB3h9
0xxsYLR2SZtV9aSZKXdSjOZ0wh8l0D1VjuCQd5wqYvqQ8r+1nOImKX3Y02Byftns
ZH/MkYtkmUbdFMdenRN2lJenDnIPji9AESPnTcZbXS23IIgnpOicgtRcrt9LVK4Y
Ty+ooLXmf57uXkoFpM4DMybuyUGH3xw44TB0PqZuBJ1Psgdm5SzdJfLshUKptLFc
XvxN8yaWSvOz2Bu/tS17o+PzXYdgk3Ar8UCWSYtkFDmfbaZC6RYzMfgHZnYsVlrf
ZjcIviBqkbHpTFkV3PJZi6PnvKCiNUj2rA5rv9ltc2XPMgHEGhqT7lxGgh0iGd/O
c8Wt/TjB6CRuMqk6N4Epb/yIIYbL01Ax3GdR1yw4exG7W75hLz+BBrT7P51Ivdg2
Ke2ysjpbARamBY3XOxCqA3zfWlhHdH1PrBexEkEa1+4ALk0W8TtEhkNgw+ZEiT9H
HbWXi9KwrNff0RAgzx2o9XiwO8iG/wLgO5AU0CNY9L2s7gosxE8BnSoPnvdVqhvl
lt/m+f8SKYavUlHNxvC3
=37tZ
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements from Arnaldo Carvalho de Melo:
User visible:
* Improve 'perf probe' error messages, moving some diagnostic messages to
only appear in --verbose mode and fixing up some error reporting related
to variables and struct members. (Masami Hiramatsu)
* Reflow 'perf timechart' man page. (Stanislav Fomichev)
Developer stuff:
* Be more precise when reporting missing libraries in a static tool build.
(Arnaldo Carvalho de Melo)
* Show error messages from the multiple make invoked from 'make build-test'.
(Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In perf's 'mem-mode', one can get access to a whole bunch of details specific to a
particular sample instruction. A bunch of those details relate to the data
address.
One interesting thing you can do with data addresses is to convert them into a unique
cacheline they belong too. Organizing these data cachelines into similar groups and sorting
them can reveal cache contention.
This patch creates an alogorithm based on various sample details that can help group
entries together into data cachelines and allows 'perf report' to sort on it.
The algorithm relies on having proper mmap2 support in the kernel to help determine
if the memory map the data address belongs to is private to a pid or globally shared.
The alogortithm is as follows:
o group cpumodes together
o group entries with discovered maps together
o sort on major, minor, inode and inode generation numbers
o if userspace anon, then sort on pid
o sort on cachelines based on data addresses
The 'dcacheline' sort option in 'perf report' only works in 'mem-mode'.
Sample output:
#
# Samples: 206 of event 'cpu/mem-loads/pp'
# Total weight : 2534
# Sort order : dcacheline,pid
#
# Overhead Samples Data Cacheline Command: Pid
# ........ ............ ...................................................................... ..................
#
13.22% 1 [k] 0xffff88042f08ebc0 swapper: 0
9.27% 1 [k] 0xffff88082e8cea80 swapper: 0
3.59% 2 [k] 0xffffffff819ba180 swapper: 0
0.32% 1 [k] arch_trigger_all_cpu_backtrace_handler_na.23901+0xffffffffffffffe0 swapper: 0
0.32% 1 [k] timekeeper_seq+0xfffffffffffffff8 swapper: 0
Note: Added a '+1' to symlen size in hists__calc_col_len to prevent the next column
from prematurely tabbing over and mis-aligning. Not sure what the problem is.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1401208087-181977-8-git-send-email-dzickus@redhat.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --children option is for showing accumulated overhead (period)
value as well as self overhead. It should be used with one of -g or
--call-graph option.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-21-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --children option is for showing accumulated overhead (period)
value as well as self overhead.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arun Sharma <asharma@fb.com>
Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1401335910-16832-16-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --fields option is to allow user setup output field in any order.
It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
More more information, please see previous patch "perf report:
Add -F option to specify output fields"
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-15-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The -F/--fields option is to allow user setup output field in any
order. It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
The output fields also affect sort order unless you give -s/--sort
option. And any keys specified on -s option, will also be added to
the output field list automatically.
$ perf report -F sym,sample,overhead
...
# Symbol Samples Overhead
# .......................... ............ ........
#
[.] __cxa_atexit 2 2.50%
[.] __libc_csu_init 4 5.00%
[.] __new_exitfn 3 3.75%
[.] _dl_check_map_versions 1 1.25%
[.] _dl_name_match_p 4 5.00%
[.] _dl_setup_hash 1 1.25%
[.] _dl_sysdep_start 1 1.25%
[.] _init 5 6.25%
[.] _setjmp 6 7.50%
[.] a 8 10.00%
[.] b 8 10.00%
[.] brk 1 1.25%
[.] c 8 10.00%
Note that, the example output above is captured after applying next
patch which fixes sort/comparing behavior.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1400480762-22852-12-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute" and
affects -c delta output only.
For more information, please see previous commit same thing done to
"perf report".
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-5-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute".
Move the parser callback function into a common location since it's
used by multiple commands now.
For more information, please see previous commit same thing done to
"perf report".
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-4-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
The --percentage option is for controlling overhead percentage
displayed. It can only receive either of "relative" or "absolute".
"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%. "absolute" means it retains
the original value before and after the filter is applied.
$ perf report -s comm
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
6.11% as
4.35% sh
4.14% make
1.13% fixdep
...
$ perf report -s comm -c cc1,gcc --percentage absolute
# Overhead Command
# ........ ............
#
74.19% cc1
7.61% gcc
$ perf report -s comm -c cc1,gcc --percentage relative
# Overhead Command
# ........ ............
#
90.69% cc1
9.31% gcc
Note that it has zero effect if no filter was applied.
Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1397145720-8063-3-git-send-email-namhyung@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
On perf top, the -s option is used for --sort, but the man page
contains invalid documentation of -s option for --sym-annotate.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1395193578-27098-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Clarify how to specify x86 registers in perf probe. I recently ran into
this problem and had to figure it out from the source.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/1393596135-4227-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Clarify in the documentation that 'perf mem report' reports use-latency,
not load/store-latency on Intel systems.
This often causes confusion with users.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1393596135-4227-2-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To be consistent with the equivalent option in 'stat', also, for the
same reason, use -D as the one letter alias.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-p5yjnopajb3a8x0xha7yl5w8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That is how the option summary describes it and so that we can free
--delay to replace --initial-delay and then be consistent with stat's
--delay equivalent option.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-f8hd2010uhjl2zzb34hepbmi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf stat has a --delay option to delay measuring the workload.
This is useful to skip measuring the startup phase of the program, which
is often very different from the main workload.
The same is useful for perf record when sampling.
--no-delay was already taken, so add a --initial-delay
to perf record too.
-D was already taken for record, so there is only a long option.
v2: Don't disable group members (Namhyung Kim)
v3: port to latest perf/core
rename to --initial-delay to avoid conflict with --no-delay
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1389476307-2124-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --delay option was documented as --initial-delay in the manpage. Fix this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1389132847-31982-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This option highlights tasks (using different color) that run more than
given duration or tasks with given name.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Link: http://lkml.kernel.org/r/20131217155349.GA13021@stfomichev-desktop
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the perf.data header is always displayed for stdio output,
which is no always useful.
Disabling header information by default and adding following options to
control header output:
--header - display header information
--header-only - display header information only w/o further
processing
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-0ehaawv5xc83w6ag03c5hi10@git.kernel.org
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386583370-1699-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the perf.data header is always displayed for stdio output,
which is no always useful.
Disabling header information by default and adding following options to
control header output:
--header - display header information (old default)
--header-only - display header information only w/o further
processing, forces stdio output
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386583370-1699-2-git-send-email-jolsa@redhat.com
[ Added single line explaining talking about the new --header* options,
to address David Ahern comment; better man page entry for the new options,
from Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As there is no -v option for perf kvm, the all debug message for perf
kvm will nerver be printed out to user.
Example:
# perf kvm --guestmount /tmp/guestmount/ record -a
Not enough memory for reading perf file header
It is confusing message for newbies such as me. With this patch applied,
we can use -v option to get the detail.
Example:
# perf kvm --guestmount /tmp/guestmount/ record -a -v
Can't access file /tmp/guestmount//15069/proc/kallsyms
Not enough memory for reading perf file header
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1386609311-23889-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add field 'srcline' that displays the source file name and line number
associated with the sample ip. The information displayed is the same as
from addr2line.
$ perf script -f comm,tid,pid,time,ip,sym,dso,symoff,srcline
grep 10701/10701 2497321.421013: ffffffff81043ffa native_write_msr_safe+0xa ([kernel.kallsyms])
/usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/arch/x86/include/asm/msr.h:95
grep 10701/10701 2497321.421984: ffffffff8165b6b3 _raw_spin_lock+0x13 ([kernel.kallsyms])
/usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/arch/x86/include/asm/spinlock.h:54
grep 10701/10701 2497321.421990: ffffffff810b64b3 tick_sched_timer+0x53 ([kernel.kallsyms])
/usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/kernel/time/tick-sched.c:840
grep 10701/10701 2497321.421992: ffffffff8106f63f run_timer_softirq+0x2f ([kernel.kallsyms])
/usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/kernel/timer.c:1372
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386315778-11633-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As we have changed the default behavior of 'perf kvm' to --guest
enabled, the parts of the man page that covers the 'record' subcommand
are outdated.
This patch updates it to show the correct output with
--host/--guest/neither/both of them.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/3a3a9c1e05acb5a274d1d8369db5a4c6467d6276.1386197481.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As option --host and --guest request no input for it, there should not
be a '=' after them in the man page sources.
And --output expects a filename as the input, so there should be a '='
after it.
This patch removes the needless '=' after --guest and --host, and adds a
'=' after --output in perf-kvm.txt.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/6124d9eb10a3f1f6b399d1db660110bc7a60fd6b.1386197481.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As the buildid is read from /sys/kernel/notes, then if we use perf kvm
buildid-list with a perf data file captured by perf kvm record with
--guestkallsyms and --guestmodules, there is no result in output.
This patch add a explanation about it and add a limit of using perf kvm
buildid-list.
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/d605a805486340b53bc261aa64d7632ad0a8cf53.1386197481.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add -g flag to `perf timechart record` which saves callchain info in the
perf.data.
When generating SVG, add backtrace information to the figure details, so
now it's possible to see which code path woke up the task and why some
task went to sleep.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1383323151-19810-8-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we don't want either power or task events we may use -T or -P with
the `perf timechart record` command to filter out events while recording
to keep perf.data small.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1383323151-19810-7-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to make SVG smaller and faster to browse add possibility to
switch off power related information with -T switch.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1383323151-19810-5-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add -n option to specify min. number of tasks to print.
Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1383323151-19810-3-git-send-email-stfomichev@yandex-team.ru
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The change to per-cpu mmaps causes the -p, -t and -u options now to have
inheritance enabled by default. Change that back to no inheritance but
for the -t option only.
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1384768557-23331-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This affects the -p, -t and -u options that previously defaulted to
per-thread mmaps.
Consequently add an option to select per-thread mmaps to support the old
behaviour.
Note that per-thread can be used with a workload-only (i.e. none of -p,
-t, -u, -a or -C is selected) to get a per-thread mmap with no
inheritance.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/5286271D.3020808@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In most commands -g is used for callchains. Make perf-top follow suit.
Move group to just --group with no short cut making it similar to
perf-record.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1384487490-6865-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By default, when tasks are specified (i.e. -p, -t or -u options)
per-thread mmaps are created.
Add an option to override that and force per-cpu mmaps.
Further comments by peterz:
So this option allows -t/-p/-u to create one buffer per cpu and attach
all the various thread/process/user tasks' their counters to that one
buffer?
As opposed to the current state where each such counter would have its
own buffer.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1383313899-15987-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Per request from Pekka make --summary a summary only option meaning do
not show the individual system calls. Add another option to see all
syscalls along with the summary. In addition use 's' and 'S' as
shortcuts for the options.
Requested-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Pekka Enberg <penberg@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Link: http://lkml.kernel.org/r/1384273875-3751-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Splitting -G and --call-graph for record command, so we could use '-G'
with no option.
The '-G' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.
It will be possible to configure unwind method via config file in
upcoming patches.
All current '-G' arguments is overtaken by --call-graph option.
NOTE: The documentation for top --call-graph option
was wrongly copied from report command.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Splitting -g and --call-graph for record command, so we could use '-g'
with no option.
The '-g' option now takes NO argument and enables the configured unwind
method, which is currently the frame pointers method.
It will be possible to configure unwind method via config file in
upcoming patches.
All current '-g' arguments is overtaken by --call-graph option.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: David Ahern <dsahern@gmail.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1382797536-32303-2-git-send-email-jolsa@redhat.com
[ reordered -g/--call-graph on --help and expanded the man page
according to comments by David Ahern and Namhyung Kim ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the callgraph function is enabled (-G), it may take a long time to
scan all the stack data and merge them accordingly.
This patch adds a new --max-stack option to perf-top to limit the depth
of callchain stack data to look at to reduce the time it takes for
perf-top to finish its processing. It reduces the amount of information
provided to the user in exchange for faster speed.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1382107129-2010-5-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When callgraph data was included in the perf data file, it may take a
long time to scan all those data and merge them together especially if
the stored callchains are long and the perf data file itself is large,
like a Gbyte or so.
The callchain stack is currently limited to PERF_MAX_STACK_DEPTH (127).
This is a large value. Usually the callgraph data that developers are
most interested in are the first few levels, the rests are usually not
looked at.
This patch adds a new --max-stack option to perf-report to limit the
depth of callchain stack data to look at to reduce the time it takes for
perf-report to finish its processing. It trades the presence of trailing
stack information with faster speed.
The following table shows the elapsed time of doing perf-report on a
perf.data file of size 985,531,828 bytes.
--max_stack Elapsed Time Output data size
----------- ------------ ----------------
not set 88.0s 124,422,651
64 87.5s 116,303,213
32 87.2s 112,023,804
16 86.6s 94,326,380
8 59.9s 33,697,248
4 40.7s 10,116,637
-g none 27.1s 2,555,810
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1382107129-2010-4-git-send-email-Waiman.Long@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Initially it tries to find a probe:vfs_getname that should be setup
with:
perf probe 'vfs_getname=getname_flags:65 pathname=result->name:string'
or with slight changes to cope with code flux in the getname_flags code.
In the future, if a "vfs:getname" tracepoint becomes available, then it
will be preferred.
This is not strictly required and more expensive method of reading the
/proc/pid/fd/ symlink will be used when the fd->path array entry is not
populated by a previous vfs_getname + open syscall ret sequence.
As with any other 'perf probe' probe the setup must be done just once
and the probe will be left inactive, waiting for users, be it 'perf
trace' of any other tool.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ujg8se8glq5izmu8cdkq15po@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kcore can be used to view the running kernel object code. However,
kcore changes as modules are loaded and unloaded, and when the kernel
decides to modify its own code. Consequently it is useful to create a
copy of kcore at a particular time. Unlike vmlinux, kcore is not unique
for a given build-id. And in addition, the kallsyms and modules files
are also needed. The tool therefore creates a directory:
~/.debug/[kernel.kcore]/<build-id>/<YYYYmmddHHMMSShh>
which contains: kcore, kallsyms and modules.
Note that the copied kcore contains only code sections. See the
kcore_copy() function for how that is determined.
The tool will not make additional copies of kcore if there is already
one with the same modules at the same addresses.
Currently, perf tools will not look for kcore in the cache. That is
addressed in another patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/525BF849.5030405@intel.com
[ renamed 'index' to 'idx' to avoid shadowing string.h symbol in f12,
use at least one member initializer when initializing a struct to
zeros, also to fix the build on f12 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While at it, update the synopsis to show both forms.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Link: http://lkml.kernel.org/r/1380791716-10325-1-git-send-email-artagnon@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'make install' used to show all the install lines, which is way too
verbose to be really informative to the user.
Implement summary output instead:
comet:~/tip/tools/perf> make install
BUILD: Doing 'make -j12' parallel build
SUBDIR Documentation
INSTALL Documentation-man
INSTALL binaries
INSTALL libexec
INSTALL perf-archive
INSTALL perl-scripts
INSTALL python-scripts
INSTALL bash_completion-script
INSTALL tests
'make install V=1' will still show the old, detailed output.
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1381312169-17354-5-git-send-email-mingo@kernel.org
[ Fixed conflict with libperf-gtk patches in acme/perf/core, cope with 'trace' alias ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The various build lines from libtraceevent and perf mix up during a
parallel build and produce unaligned output like:
CC builtin-buildid-list.o
CC builtin-buildid-cache.o
CC builtin-list.o
CC FPIC trace-seq.o
CC builtin-record.o
CC FPIC parse-filter.o
CC builtin-report.o
CC builtin-stat.o
CC FPIC parse-utils.o
CC FPIC kbuffer-parse.o
CC builtin-timechart.o
CC builtin-top.o
CC builtin-script.o
BUILD STATIC LIB libtraceevent.a
CC builtin-probe.o
CC builtin-kmem.o
CC builtin-lock.o
To solve this, harmonize all the build message alignments to be similar
to the kernel's kbuild output: prefixed by two spaces and 11-char wide.
After the patch the output looks pretty tidy, even if output lines get
mixed up:
CC builtin-annotate.o
FLAGS: * new build flags or cross compiler
CC builtin-bench.o
AR liblk.a
CC bench/sched-messaging.o
CC FPIC event-parse.o
CC bench/sched-pipe.o
CC FPIC trace-seq.o
CC bench/mem-memcpy.o
CC bench/mem-memset.o
CC FPIC parse-filter.o
CC builtin-diff.o
CC builtin-evlist.o
CC builtin-help.o
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1381312169-17354-3-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'make clean' used to show all the rm lines, which isn't really
informative in any way and spams the console.
Implement summary output:
comet:~/tip/tools/perf> make clean
CLEAN libtraceevent
CLEAN liblk
CLEAN config
CLEAN core-objs
CLEAN core-progs
CLEAN core-gen
CLEAN Documentation
CLEAN python
'make clean V=1' will still show the old, detailed output.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1381312169-17354-2-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The record option is a convience alias to include the -e raw_syscalls:*
argument to perf-record. All other options are passed to perf-record's
handler. Resulting data file can be analyzed by perf-trace -i.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1380395584-9025-5-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding possibility to specify mmap size via -m/--mmap-pages
by appending unit size character (B/K/M/G) to the
number, like:
$ perf record -m 8K ls
$ perf record -m 2M ls
The size is rounded up appropriately to follow perf
mmap restrictions.
If no unit is specified the number provides pages as
of now, like:
$ perf record -m 8 ls
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1378031796-17892-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While perf-lock currently reports both the total wait time and the
number of contentions, it doesn't explicitly show the average wait time.
Having this value immediately in the report can be quite useful when
looking into performance issues.
Furthermore, allowing report to sort by averages is another handy
feature to have - and thus do not only print the value, but add it to
the lock_stat structure.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378693159-8747-8-git-send-email-davidlohr@hp.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Current timestamp shown for output is time relative to firt sample. This
patch adds an option to show the absolute perf_clock timestamp which is
useful when comparing output across commands (e.g., perf-trace to
perf-script).
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1378319865-55695-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus reported the following perf build system bug:
'Another annoyance during that make was that "make install" seems to
want to re-make the thing I just built. That's absolutely horrible, [...]'
The thing that got re-built were 'only' the (numerous) feature checks,
not the whole project - but still it was mighty annoying as the feature
checks took 9+ seconds even on reasonably fast boxes.
Even with the autodep patches where feature detection is much faster
it wastes resources, wastes screen real estate and confuses users if
we execute feature detection twice.
There were two sources for these unnecessary re-builds of the feature
checks:
- Unnecessary nested invocations of $(MAKE), apparently to be able
to do conditional compilation dependent on documentation tools
presence. Use straight dependencies instead, with no nesting.
- A direct invocation of $(MAKE) to rebuild the PERF-VERSION-FILE.
This is apparently done to be able to include it into the
Makefile:
-include $(OUTPUT)PERF-VERSION-FILE
but that's entirely pointless for two reasons: 1) the version file
gets regenerated by the initial build pass anyway, 2) including it
is futile, given its contents:
#define PERF_VERSION "3.12.rc3.g8510c7"
'make' will interpret that as a comment line...
So just remove this part of the doc-generation logic.
With these things fixed a 'make install' now rebuilds only what is needed.
A repeated 'make install' on an already built tree is super fast now,
it finishes in under 0.3 seconds:
#
# After the patch:
#
$ time make install
...
real 0m0.280s
user 0m0.162s
sys 0m0.054s
Prior all the autodep changes and prior this fix, a repeat 'make install'
took 24.1 seconds (!) on the same system:
#
# Before the patches:
#
$ time make install
...
real 0m24.109s
user 0m21.171s
sys 0m2.449s
Which almost entirely was caused by fixable build system fat.
We are now literally ~86 times faster.
A fresh rebuild and install now takes just 11.4 seconds:
#
# After the patch:
#
$ make clean
$ time make -j16 install
...
real 0m11.457s
user 1m43.411s
sys 0m7.610s
Without the patches it took 27.8 seconds:
#
# Before the patches:
#
$ make clean
$ time make -j16 install
...
real 0m27.801s
user 1m59.242s
sys 0m9.749s
So even in the complete rebuild case we are now ~2.5 times faster.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-x4qjnxjGrgxpribq8sdakfTp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for recording and displaying the transaction flags.
They are essentially a new sort key. Also display them
in a nice way to the user.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make perf record -j aware of the new in_tx,no_tx,abort_tx branch qualifiers.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Extend the perf branch sorting code to support sorting by in_tx
or abort_tx qualifiers. Also print out those qualifiers.
This also fixes up some of the existing sort key documentation.
We do not support no_tx here, because it's simply not showing
the in_tx flag.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support to perf stat to print the basic transactional execution statistics:
Total cycles, Cycles in Transaction, Cycles in aborted transsactions
using the in_tx and in_tx_checkpoint qualifiers.
Transaction Starts and Elision Starts, to compute the average transaction
length.
This is a reasonable overview over the success of the transactions.
Also support architectures that have a transaction aborted cycles
counter like POWER8. Since that is awkward to handle in the kernel
abstract handle both cases here.
Enable with a new --transaction / -T option.
This requires measuring these events in a group, since they depend on each
other.
This is implemented by using TM sysfs events exported by the kernel
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Arnaldo Carvalho de Melo <acme@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1377128846-977-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>