Commit Graph

5759 Commits

Author SHA1 Message Date
Dima Kogan f2f3096888 perf symbols: Fix type error when reading a build-id
This was benign, but wrong. The build-id should live in a char[], not a char*[]

Signed-off-by: Dima Kogan <dima@secretsauce.net>
Link: http://lkml.kernel.org/r/87si6pfwz4.fsf@secretsauce.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-28 10:02:00 -03:00
Arnaldo Carvalho de Melo f4efcce33d perf tools: Search for more options when passing args to -h
Recently 'perf <tool> -h' was made aware of arguments and would show
just the help for the arguments specified, but that required a strict
form, i.e.:

  $ perf -h --tui

worked, but:

  $ perf -h tui

didn't.

Make it support both cases and also look at the option help when neither
matches, so that he following examples works:

  $ perf report -h interface

   Usage: perf report [<options>]

    --gtk    Use the GTK2 interface
    --stdio  Use the stdio interface
    --tui    Use the TUI interface

  $ perf report -h stack

   Usage: perf report [<options>]

    -g, --call-graph <print_type,threshold[,print_limit],order,
                      sort_key[,branch]>
      Display call graph (stack chain/backtrace):

        print_type:  call graph printing style (graph|flat|fractal|none)
        threshold:   minimum call graph inclusion threshold (<percent>)
        print_limit: maximum number of call graph entry (<number>)
        order:       call graph order (caller|callee)
        sort_key:    call graph sort key (function|address)
        branch:      include last branch info to call graph (branch)

      Default: graph,0.5,caller,function
        --max-stack <n>   Set the maximum stack depth when parsing the
                          callchain, anything beyond the specified depth
                          will be ignored. Default: 127
  $

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-xzqvamzqv3cv0p6w3inhols3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-27 17:35:58 -03:00
Jiri Olsa 1e5a29318b perf stat: Cache aggregated map entries in extra cpumap
Currently any time we need to access socket or core id for given cpu, we
access the sysfs topology file.

Adding a cpus_aggr_map cpu_map to cache those entries.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-27 15:08:07 -03:00
Jiri Olsa 2322f573f8 perf cpu_map: Add cpu_map__empty_new function
Adding cpu_map__empty_new interface to create empty cpumap with given
size. The cpumap entries are initialized with -1.

It'll be used for caching cpu_map in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-27 15:05:36 -03:00
Jiri Olsa af33998174 perf evsel: Move id_offset out of struct perf_evsel union member
Because the 'perf stat record' patches will use the id_offset member
together with the priv pointer.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445784728-21732-29-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-27 15:04:29 -03:00
Namhyung Kim c711836972 perf tools: Introduce usage_with_options_msg()
Now usage_with_options() setup a pager before printing message so normal
printf() or pr_err() will not be shown.  The usage_with_options_msg()
can be used to print some help message before usage strings.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-27 09:28:44 -03:00
Namhyung Kim 01b19455c0 perf tools: Setup pager when printing usage and help
It's annoying to see error or help message when command has many options
like in perf record, report or top.  So setup pager when print parser
error or help message - it should be OK since no UI is enabled at the
parsing time.  The usage_with_options() already disables it by calling
exit_browser() anyway.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-26 14:08:48 -03:00
Namhyung Kim b272a59d83 perf report: Rename to --show-cpu-utilization
So that it can be more consistent with other --show-* options.  The old
name (--showcpuutilization) is provided only for compatibility.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-26 14:06:04 -03:00
Namhyung Kim a5f4a6932e perf tools: Improve ambiguous option help message
Currently if an option name is ambiguous it only prints first two
matched option names but no help.  It'd be better it could show all
possible names and help messages too.

Before:
  $ perf report --show
    Error: Ambiguous option: show (could be --show-total-period or
                                            --show-ref-call-graph)
   Usage: perf report [<options>]

After:
  $ perf report --show
    Error: Ambiguous option: show (could be --show-total-period or
                                            --show-ref-call-graph)
   Usage: perf report [<options>]

      -n, --show-nr-samples
                              Show a column with the number of samples
          --showcpuutilization
                              Show sample percentage for different cpu modes
      -I, --show-info         Display extended information about perf.data file
          --show-total-period
                              Show a column with the sum of periods
          --show-ref-call-graph
                              Show callgraph from reference event

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-26 13:59:06 -03:00
Arnaldo Carvalho de Melo 161d904178 perf tools: Provide help for subset of options
Some tools have a lot of options, so, providing a way to show help just
for some of them may come handy:

  $ perf report -h --tui

   Usage: perf report [<options>]

        --tui             Use the TUI interface

  $ perf report -h --tui --showcpuutilization -b -c

   Usage: perf report [<options>]

    -b, --branch-stack    use branch records for per branch histogram filling
    -c, --comms <comm[,comm...]>
                          only consider symbols in these comms
        --showcpuutilization
                          Show sample percentage for different cpu modes
        --tui             Use the TUI interface

  $

Using it with perf bash completion is also handy, just make sure you
source the needed file:

  $ . ~/git/linux/tools/perf/perf-completion.sh

Then press tab/tab after -- to see a list of options, put them after -h
and only the options chosen will have its help presented:

  $ perf report -h --
  --asm-raw              --demangle-kernel      --group
  --kallsyms             --pretty               --stdio
  --branch-history       --disassembler-style   --gtk
  --max-stack            --showcpuutilization   --symbol-filter
  --branch-stack         --dsos                 --header
  --mem-mode             --show-info            --symbols
  --call-graph           --dump-raw-trace       --header-only
  --modules              --show-nr-samples      --symfs
  --children             --exclude-other        --hide-unresolved
  --objdump              --show-ref-call-graph  --threads
  --column-widths        --fields               --ignore-callees
  --parent               --show-total-period    --tid
  --comms                --field-separator      --input
  --percentage           --socket-filter        --tui
  --cpu                  --force                --inverted
  --percent-limit        --sort                 --verbose
  --demangle             --full-source-path     --itrace
  --pid                  --source               --vmlinux
  $ perf report -h --socket-filter

   Usage: perf report [<options>]

      --socket-filter <n>
                  only show processor socket that match with this filter

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-83mcdd3wj0379jcgea8w0fxa@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-23 21:50:50 -03:00
Arnaldo Carvalho de Melo 869c55b0f4 perf tools: Show tool command line options ordered
When asking for a listing of the options, be it using -h or when an
unknown option is passed, order it by one-letter options, then the ones
having just long names.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-41qh68t35n4ehrpsuazp1dx8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-23 21:50:49 -03:00
Arnaldo Carvalho de Melo f06cff7c59 perf annotate: Don't die() when finding an invalid config option
The perf_config() infrastructure we inherited from git calls die() when
the provided config callback returns -1, meaning some key in a config
section is unexpected, that seems ok for a stdio based tool, but in
--tui we end up messing up the output, so just tell the user about the
error, wait for a keystroke and return 0, being more resilient and
proceeding with what we managed to parse.

That die() needs to die, tho :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-pqtsffh2kwr5mwm4qg9kgotu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 18:10:52 -03:00
Arnaldo Carvalho de Melo 464b01a48e perf ui tui: Register the error callbacks before initializing the widgets
I.e. we want to tell the user about errors found during, for instance,
the ui_browser initialization, so that a call to ui__warning() appears
as a window waiting for a key to be pressed.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ederrwizcl6mfz10vfobl5qq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 16:44:17 -03:00
Namhyung Kim 39ff7cdb5a perf annotate: Fix 'annotate.use_offset' config variable usage
The annotate__configs should be sorted so that it can use bsearch(3).

However commit 0c4a5bcea4 ("perf annotate: Display total number of
samples with --show-total-period") added a new config item at the end.
This resulted in the 'annotate.use_offset' config variable cannot be
found and perf terminated like below:

  $ perf report
  bad config file line 6 in ~/.perfconfig

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Taeung Song <treeze.taeung@gmail.com>
Fixes: 0c4a5bcea4 ("perf annotate: Display total number of samples with --show-total-period")
Link: http://lkml.kernel.org/r/1445396240-3428-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 16:34:29 -03:00
Namhyung Kim 76a26549eb perf tools: Improve call graph documents and help messages
The --call-graph option is complex so we should provide better guide for
users.  Also change help message to be consistent with config option
names.  Now perf top will show help like below:

  $ perf top --call-graph
    Error: option `call-graph' requires a value

   Usage: perf top [<options>]

      --call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
           setup and enables call-graph (stack chain/backtrace):

		record_mode:	call graph recording mode (fp|dwarf|lbr)
		record_size:	if record_mode is 'dwarf', max size of stack recording (<bytes>)
				default: 8192 (bytes)
		print_type:	call graph printing style (graph|flat|fractal|none)
		threshold:	minimum call graph inclusion threshold (<percent>)
		print_limit:	maximum number of call graph entry (<number>)
		order:		call graph order (caller|callee)
		sort_key:	call graph sort key (function|address)
		branch:		include last branch info to call graph (branch)

		Default: fp,graph,0.5,caller,function

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 16:23:19 -03:00
Namhyung Kim 792aeafa8e perf tools: Defaults to 'caller' callchain order only if --children is enabled
The caller callchain order is useful with --children option since it can
show 'overview' style output, but other commands which don't use
--children feature like 'perf script' or even 'perf report/top' without
--children are better to keep callee order.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445499946-29817-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 15:40:11 -03:00
Namhyung Kim a2c10d39af perf top: Support call-graph display options also
Currently 'perf top --call-graph' option is same as 'perf record'.  But
'perf top' also need to receive display options in 'perf report'.  To do
that, change parse_callchain_report_opt() to allow record options too.

Now perf top can receive display options like below:

  $ perf top --call-graph
    Error: option `call-graph' requires a value

   Usage: perf top [<options>]

        --call-graph
          <mode[,dump_size],output_type,min_percent[,print_limit],call_order[,branch]>
                     setup and enables call-graph (stack chain/backtrace)
                     recording: fp dwarf lbr, output_type (graph, flat,
		     fractal, or none), min percent threshold, optional
		     print limit, callchain order, key (function or
		     address), add branches

  $ perf top --call-graph callee,graph,fp

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445495330-25416-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 15:40:02 -03:00
Namhyung Kim 21cf62847d perf tools: Move callchain help messages to callchain.h
These messages will be used by 'perf top' in the next patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445495330-25416-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-22 15:39:51 -03:00
Arnaldo Carvalho de Melo e3d006ce81 perf annotate: Add debug message for out of bounds sample
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-q0lde9ajs84oi38nlyjcqbwg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-21 18:12:37 -03:00
Andi Kleen 8b8cde4958 perf evsel: Print branch filter state with -vv
Add a missing field to the perf_event_attr debug output.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1445366797-30894-4-git-send-email-andi@firstfloor.org
[ Print it between config2 and sample_regs_user (peterz)]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-21 18:12:29 -03:00
Kan Liang bc1d03687b perf cpu_map: Fix core dump caused by per-socket/core system-wide stat
Perf will core dump if --per-socket/core -a are applied for perf stat.

The root cause is that cpu_map__build_map set refcnt of evlist's cpu_map
to 1.  It should set refcnt for the newly created cpu_map, not evlist's
cpu_map.

Here is the example:

  # perf stat -e cycles --per-socket -a sleep 1

   Performance counter stats for 'system wide':

  S0       36         30,196,257      cycles
  S1       28         15,823,536      cycles

       1.001126828 seconds time elapsed

  *** Error in `./perf': corrupted double-linked list: 0x00000000021f9090 ***
  ======= Backtrace: =========
  /lib64/libc.so.6[0x3002e7bbe7]
  /lib64/libc.so.6[0x3002e7d2b5]
  ./perf(perf_evsel__delete+0x28)[0x485bdd]
  ./perf[0x4800e8]
  ./perf(perf_evlist__delete+0x5e)[0x482cd5]
  ./perf(cmd_stat+0xf25)[0x432328]
  ./perf[0x4768e0]
  ./perf[0x476ad6]
  ./perf[0x476b41]
  ./perf(main+0x1d0)[0x476db2]
  /lib64/libc.so.6(__libc_start_main+0xf5)[0x3002e21b45]
  ./perf[0x4202c5]

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1444388363-35936-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-20 15:54:20 -03:00
Stephane Eranian 43e41adc9e perf record: Add ability to sample call branches
This patch add a new branch type sampling filter to perf record.
It is named 'call' and maps to PERF_SAMPLE_BRANCH_CALL. It samples
direct call branches only, unlike 'any_call' which includes indirect
calls as well.

 $ perf record -j call -e cycles .....

The man page is updated accordingly.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:30:55 +02:00
Arnaldo Carvalho de Melo a4c6a3e8bb perf bench: Use named initializers in the trailer too
To avoid this splat with gcc 4.4.7:

  cc1: warnings being treated as errors
  bench/mem-functions.c:273: error: missing initializer
  bench/mem-functions.c:273: error: (near initialization for ‘memcpy_functions[4].desc’)
  bench/mem-functions.c:366: error: missing initializer
  bench/mem-functions.c:366: error: (near initialization for ‘memset_functions[4].desc’)

Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-0s8o6tgw1pdwvdv02llb9tkd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:17:25 -03:00
Jiri Olsa d2b5a315ae perf script: Check output fields only for samples
There's no need to check sampling output fields for events without
perf_event_attr::sample_type field set.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-51-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:05:59 -03:00
Jiri Olsa 1fe7a30028 perf cpu_map: Add data arg to cpu_map__build_map callback
Adding data arg to cpu_map__build_map callback, so we could pass data
along to the callback. It'll be needed in following patches to retrieve
topology info from perf.data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-41-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:04:01 -03:00
Jiri Olsa f1cbb8f357 perf cpu_map: Make cpu_map__build_map global
We'll need to call it from perf stat in the stat_script patchkit

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-40-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:03:03 -03:00
Jiri Olsa 208df99ed0 perf stat: Add AGGR_UNSET mode
Adding AGGR_UNSET mode, so we could distinguish unset aggr_mode in
following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-30-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:02:07 -03:00
Jiri Olsa 581cc8a2a2 perf stat: Rename perf_stat struct into perf_stat_evsel
It's used as the perf_evsel::priv data, so the name suits better. Also
we'll need the perf_stat name free for more generic struct.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444992092-17897-29-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 18:01:05 -03:00
Yunlong Song 3a134ae96c perf help: Change 'usage' to 'Usage' for consistency
Capitalize 'usage' to make it consistent with all the other 'Usage' in
the codes, e.g., usage_builtin.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Sriram Raghunathan <sriram.r@nokia.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1444894792-2338-3-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:51:44 -03:00
Ingo Molnar aa254af25c perf bench: Run benchmarks, don't test them
So right now we output this text:

        memcpy: Benchmark for memcpy() functions
        memset: Benchmark for memset() functions
           all: Test all memory access benchmarks

But the right verb to use with benchmarks is to 'run' them, not 'test'
them.

So change this (and all similar texts) to:

        memcpy: Benchmark for memcpy() functions
        memset: Benchmark for memset() functions
           all: Run all memory access benchmarks

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-15-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:10:25 -03:00
Ingo Molnar 2f211c84ad perf bench mem: Rename 'routine' to 'function'
So right now there's a somewhat inconsistent mess of the benchmarking
code and options sometimes calling benchmarked functions 'functions',
sometimes calling them 'routines'.

Name them 'functions' consistently.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-14-git-send-email-mingo@kernel.org
[ Updated perf-bench man page, pointed out by David Ahern ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:10:25 -03:00
Ingo Molnar b0d22e52e3 perf bench: Harmonize all the -l/--nr_loops options
We have three benchmarking subsystems that specify some sort of 'number
of loops' parameter - but all of them do it inconsistently:

 numa:              -l/--nr_loops
 sched messaging:   -l/--loops
 mem memset/memcpy: -i/--iterations

Harmonize them to -l/--nr_loops by picking the numa variant - which is
also the most likely one to have existing scripting which we don't want
to break.

Plus improve the parameter help texts to indicate the default value for
the nr_loops variable to keep users from guessing ...

Also propagate the naming to internal variables.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-13-git-send-email-mingo@kernel.org
[ Let the harmonisation reach the perf-bench man page as well ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:10:05 -03:00
Ingo Molnar 5dd93304a5 perf bench mem: Reorganize the code a bit
Reorder functions a bit, so that we synchronize the layout of the
memcpy() and memset() portions of the code.

This improves the code, especially after we'll add an strlcpy() variant
as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-12-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:07:19 -03:00
Ingo Molnar 13b1fdce8d perf bench mem: Improve user visible strings
- fix various typos in user visible output strings
 - make the output consistent (wrt. capitalization and spelling)
 - offer the list of routines to benchmark on '-r help'.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-11-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:07:18 -03:00
Ingo Molnar a69b4f7413 perf bench mem: Fix 'length' vs. 'size' naming confusion
So 'perf bench mem memcpy/memset' consistently uses 'len' and 'length'
for buffer sizes - while it's really a memory buffer size. (strings have
length.)

Rename all affected variables.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-10-git-send-email-mingo@kernel.org
[ Update perf-bench man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:07:11 -03:00
Ingo Molnar e815e32760 perf bench mem: Rename 'routine' to 'routine_str'
So bench/mem-functions.c has a 'routine' name for the routines parameter
string, but a 'length_str' name for the length parameter string.

We also have another entity named 'routine': 'struct routine'.

This is inconsistent and confusing: rename 'routine' to 'routine_str'.

Also fix typos in the --routine help text.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-9-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:05:27 -03:00
Ingo Molnar b14f2d3576 perf bench mem: Change 'cycle' to 'cycles'
So 'perf bench mem memset/memcpy' has a CPU cycles measurement method,
but calls it 'cycle' (singular) throughout the code, which makes it
harder to read.

Rename all related functions, variables and options to a plural 'cycles'
nomenclature.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-8-git-send-email-mingo@kernel.org
[ s/--cycle/--cycles/g in perf-bench man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:05:01 -03:00
Ingo Molnar 7a46a8fd13 perf bench: List output formatting options on 'perf bench -h'
So 'perf bench -h' is not very helpful when printing the help line
about the output formatting options:

    -f, --format <default>
                              Specify format style

There are two output format styles, 'default' and 'simple', so improve
the help text to:

    -f, --format <default|simple>
                              Specify the output formatting style

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-7-git-send-email-mingo@kernel.org
[ Removed leftovers from the mem-functions.c rename ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:03:53 -03:00
Ingo Molnar 6db175c733 perf bench: Remove the prefaulting complication from 'perf bench mem mem*'
So 'perf bench mem memcpy/memset' has elaborate code to measure
memcpy()/memset() performance both with freshly allocated buffers (which
includes initial page fault overhead) and with preallocated buffers.

But the thing is, the resulting bandwidth results are mostly
meaningless, because page faults dominate so much of the cost.

It might make sense to measure cache cold vs. cache hot performance, but
the code does not do this.

So remove this complication, and always prefault the ranges before using
them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-6-git-send-email-mingo@kernel.org
[ Remove --no-prefault, --only-prefault from docs, noticed by David Ahern ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 16:03:31 -03:00
Ingo Molnar 9b2fa7f3e7 perf bench: Rename 'mem-memcpy.c' => 'mem-functions.c'
So mem-memcpy.c started out as a simple memcpy() benchmark, then it grew
memset() functionality and now I plan to add string copy benchmarks as
well.

This makes the file name a misnomer: rename it to the more generic
mem-functions.c name.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-5-git-send-email-mingo@kernel.org
[ The "rename" was introducing __unused, wasn't removing the old file,
  and didn't update tools/perf/bench/Build, fix it ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 15:39:39 -03:00
Ingo Molnar 2946f59ac3 perf bench: Eliminate unused argument from bench_mem_common()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-4-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 15:29:08 -03:00
Ingo Molnar 2761974156 perf bench: Default to all routines in 'perf bench mem'
So few people know that the --routine option to 'perf bench memcpy/memset'
exists, and would not know that it's capable of testing the kernel's
memcpy/memset implementations.

Furthermore, 'perf bench mem all' will not run all routines:

	vega:~> perf bench mem all
	# Running mem/memcpy benchmark...
	Routine default (Default memcpy() provided by glibc)
	# Copying 1MB Bytes ...

	     894.454383 MB/Sec
	       3.844734 GB/Sec (with prefault)

	# Running mem/memset benchmark...
	Routine default (Default memset() provided by glibc)
	# Copying 1MB Bytes ...

	       1.220703 GB/Sec
	       9.042245 GB/Sec (with prefault)

Because misleadingly the 'all' refers to 'all sub-benchmarks', not 'all
sub-benchmarks and routines'.

Fix all this by making the memcpy/memset routine to default to 'all',
which results in all the benchmarks being run:

	triton:~> perf bench mem all
	# Running mem/memcpy benchmark...
	Routine default (Default memcpy() provided by glibc)
	# Copying 1MB Bytes ...

	       1.448906 GB/Sec
	       4.957170 GB/Sec (with prefault)
	Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
	# Copying 1MB Bytes ...

	       1.614153 GB/Sec
	       4.379204 GB/Sec (with prefault)
	Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
	# Copying 1MB Bytes ...

	       1.570036 GB/Sec
	       4.264465 GB/Sec (with prefault)
	Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
	# Copying 1MB Bytes ...

	       1.788576 GB/Sec
	       6.554111 GB/Sec (with prefault)

	# Running mem/memset benchmark...
	Routine default (Default memset() provided by glibc)
	# Copying 1MB Bytes ...

	       2.082223 GB/Sec
	       9.126752 GB/Sec (with prefault)
	Routine x86-64-unrolled (unrolled memset() in arch/x86/lib/memset_64.S)
	# Copying 1MB Bytes ...

	       5.710892 GB/Sec
	       8.346688 GB/Sec (with prefault)
	Routine x86-64-stosq (movsq-based memset() in arch/x86/lib/memset_64.S)
	# Copying 1MB Bytes ...

	       9.765625 GB/Sec
	      12.520032 GB/Sec (with prefault)
	Routine x86-64-stosb (movsb-based memset() in arch/x86/lib/memset_64.S)
	# Copying 1MB Bytes ...

	       9.668936 GB/Sec
	      12.682630 GB/Sec (with prefault)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-3-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 15:05:34 -03:00
Ingo Molnar 13839ec495 perf bench: Improve the 'perf bench mem memcpy' code readability
- improve the readability of initializations
 - fix unnecessary double negations
 - fix ugly line breaks
 - fix other small details

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-2-git-send-email-mingo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 15:05:00 -03:00
Namhyung Kim 2690c73093 perf test: Suppress libtraceevent warnings
Currently libtraceevent emits warning on unsupported event formats.
However it'd be better to see them only -v option is given.  To do that,
it needs to override the warning() function which is used in the
libtracevent.  Thus add set_warning_routine() same as set_die_routine()
and check the verbose flag in our warning routine.

Before:
  # perf test 5
   5: parse events tests                                       :
    Warning: [kvmmmu:kvm_mmu_get_page] bad op token {
    Warning: [kvmmmu:kvm_mmu_sync_page] bad op token {
    Warning: [kvmmmu:kvm_mmu_unsync_page] bad op token {
    Warning: [kvmmmu:kvm_mmu_prepare_zap_page] bad op token {
    Warning: [kvmmmu:fast_page_fault] function is_writable_pte not defined
    ...
   Ok

After:
  # perf test 5
   5: parse events tests                                       : Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445268229-1601-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 14:58:10 -03:00
Namhyung Kim 8719138318 perf test: Silence tracepoint event failures
Currently, when 'perf test' is run by a normal user, it'll fail to
access tracepoint events.  The output becomes somewhat messy because it
tries to be nice with long error messages and hints.

IMHO this is not needed for 'perf test' by default and AFAIK 'perf test'
uses pr_debug() rather than pr_err() for such messages so that one can
use -v option to see further details on failed testcases if needed.

Before:
  $ perf test
   1: vmlinux symtab matches kallsyms                          : FAILED!
   2: detect openat syscall event                              :Error:
  No permissions to read
  /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
  Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
  FAILED!
   3: detect openat syscall event on all cpus                  :Error:
  No permissions to read
  /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
  Hint:	Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
  FAILED!
   ...

After:
  $ perf test
   1: vmlinux symtab matches kallsyms                          : FAILED!
   2: detect openat syscall event                              : FAILED!
   3: detect openat syscall event on all cpus                  : FAILED!
   ...

  $ perf test -v 2
   2: detect openat syscall event                              :
  --- start ---
  test child forked, pid 30575
  Error:	    No permissions to read
  /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat
  Hint:  Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

  test child finished with -1
  ---- end ----
  detect openat syscall event: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445268229-1601-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-19 14:57:49 -03:00
Namhyung Kim 31eb436054 perf hists browser: Add 'm' key for context menu display
With horizontal scrolling, the left/right arrow keys are used to scroll
columns and ENTER/ESC keys are used to enter/exit menu.  However if
callchain is recorded, the ENTER key is used to toggle callchain
expansion so there's no way to display menu.  Use 'm' key to display the
menu for this case.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444694521-8136-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 23:29:14 -03:00
Rabin Vincent 186c6cfb32 perf callchains: Fix unw_word_t pointer casts
unw_word_t is uint64_t even on 32-bit MIPS.  Cast it to uintptr_t before
the cast to void *p to get rid of the following errors:

  util/unwind-libunwind.c: In function 'access_mem':
  util/unwind-libunwind.c:464:4: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  util/unwind-libunwind.c:475:2: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  cc1: all warnings being treated as errors
  make[3]: *** [util/unwind-libunwind.o] Error 1

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1443379079-29133-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 23:27:40 -03:00
Rabin Vincent 8eac1d5e92 perf callchain: Use debug_frame if eh_frame is unusable
When NO_LIBUNWIND_DEBUG_FRAME=0, use the .debug_frame if the .eh_frame
doesn't contain the approprate unwind tables.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rabin Vincent <rabinv@axis.com>
Link: http://lkml.kernel.org/r/1443379079-29133-3-git-send-email-rabin.vincent@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 23:25:25 -03:00
Arnaldo Carvalho de Melo 4aa8e454d3 perf hists browser: Inform how to reset the symbol filter
When in the hists browser, i.e. in 'perf report' or in 'perf top', it is
possible to press '/' and specify a substring to filter by symbol name.

Clarify how to remove a filter by making the prompt be:

   Please enter the name of symbol you want to see.
   To remove the filter later, press / + ENTER

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-vbq2b0kyufwy6p0ctkfswcoe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 14:02:29 -03:00
Arnaldo Carvalho de Melo 7727a92544 perf ui browsers: Remove help messages about use of right and arrow keys
They were repurposed for horizontal scrolling, so use just ENTER/ESC in
the help messages.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: c6c3c02dea ("perf hists browser: Implement horizontal scrolling")
Link: http://lkml.kernel.org/n/tip-n5ar4qg8fs12ax4vhr3rxhxj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 13:56:50 -03:00
Arnaldo Carvalho de Melo dc38218e8b perf symbols: Try the .debug/ DSO cache as a last resort
Not as the first attempt at finding a vmlinux for the running kernel,
this way we get a more informative filename to present in tools, it will
check that the build-id is the same as the one previously loaded in the
DSO in dso->build_id, reading from /sys/kernel/notes, for instance.

E.g. in the annotation TUI, going from 'perf top', for the scsi_sg_alloc
kernel function, in the first line:

Before:

scsi_sg_alloc  /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1

After:

scsi_sg_alloc  /lib/modules/4.3.0-rc1+/build/vmlinux

And:

  # ls -la /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1
lrwxrwxrwx. 1 root root 81 Sep 22 16:11 /root/.debug/.build-id/28/2777c262e6b3c0451375163c9a81c893218ab1 -> ../../home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
  # file ~/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
/root/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=282777c262e6b3c0451375163c9a81c893218ab1, not stripped
  #

The same as:

  # file /lib/modules/4.3.0-rc1+/build/vmlinux
/lib/modules/4.3.0-rc1+/build/vmlinux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=282777c262e6b3c0451375163c9a81c893218ab1, not stripped

Furthermore:

  # sha256sum /lib/modules/4.3.0-rc1+/build/vmlinux
  e7a789bbdc61029ec09140c228e1dd651271f38ef0b8416c0b7d5ff727b98be2  /lib/modules/4.3.0-rc1+/build/vmlinux
  # sha256sum ~/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
  e7a789bbdc61029ec09140c228e1dd651271f38ef0b8416c0b7d5ff727b98be2  /root/.debug/home/git/build/v4.3.0-rc1+/vmlinux/282777c262e6b3c0451375163c9a81c893218ab1
  [root@zoo new]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-9y42ikzq3jisiddoi6f07n8z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-12 13:52:27 -03:00
Arnaldo Carvalho de Melo ae93880244 perf python: Support the PERF_RECORD_SWITCH event
To test it check tools/perf/python/twatch.py, after following the
instructions there to enable context_switch, output looks like:

  [root@zoo linux]# tools/perf/python/twatch.py
  cpu: 1, pid: 31463, tid: 31463 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31463, switch_out: 0 }
  cpu: 2, pid: 31463, tid: 31496 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31496, switch_out: 0 }
  cpu: 2, pid: 31463, tid: 31496 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31496, switch_out: 1 }
  cpu: 3, pid: 31463, tid: 31527 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31527, switch_out: 0 }
  cpu: 1, pid: 31463, tid: 31463 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31463, switch_out: 1 }
  cpu: 3, pid: 31463, tid: 31527 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31527, switch_out: 1 }
  cpu: 1, pid: 31463, tid: 31463 { type: context_switch, next_prev_pid: 31463, next_prev_tid: 31463, switch_out: 0 }
  ^CTraceback (most recent call last):
    File "tools/perf/python/twatch.py", line 67, in <module>
      main(context_switch = 1, thread = 31463)
    File "tools/perf/python/twatch.py", line 40, in main
      evlist.poll(timeout = -1)
  KeyboardInterrupt
  [root@zoo linux]#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Guy Streeter <streeter@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1ukistmpamc5z717k80ctcp2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-07 19:41:50 -03:00
Andrzej Hajda 3834966538 perf tools: Fix handling read result using a signed variable
The function can return negative value, assigning it to unsigned
variable can cause memory corruption.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2038576

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/1444122017-16856-1-git-send-email-a.hajda@samsung.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-06 18:04:59 -03:00
Jiri Olsa 1178bfd41f perf tools: Use hpp_dimension__add_output to register hpp columns
The perf_hpp__init currently does not respect sorting dimensions and the
setup_sorting function could endup queueing same format twice. That
screwed up the perf_hpp__list and got stuck in loop within
perf_hpp__setup_output_field function.

  $ perf report -F +overhead

  0x00000000004c1355 in perf_hpp__is_sort_entry (format=format@entry=0x880440 <perf_hpp.format>) at util/sort.c:1506
  1506    {

     #0  0x00000000004c1355 in perf_hpp__is_sort_entry (format=format@entry=0x880440 <perf_hpp.format>) at util/sort.c:1506
     #1  0x00000000004c139d in perf_hpp__same_sort_entry (a=a@entry=0x880440 <perf_hpp.format>, b=b@entry=0x2bb2fe0) at util/sort.c:1380
     #2  0x00000000004f8d3c in perf_hpp__setup_output_field () at ui/hist.c:554
     #3  0x00000000004c1d1e in setup_sorting () at util/sort.c:1984
     #4  0x000000000042efbf in cmd_report (argc=0, argv=0x7ffea5a0e790, prefix=<optimized out>) at builtin-report.c:874
     #5  0x0000000000476f13 in run_builtin (p=p@entry=0x875628 <commands+168>, argc=argc@entry=3, argv=argv@entry=0x7ffea5a0e790) at perf.c:385
     #6  0x000000000047710b in handle_internal_command (argc=3, argv=0x7ffea5a0e790) at perf.c:445
     #7  0x0000000000477176 in run_argv (argcp=argcp@entry=0x7ffea5a0e5fc, argv=argv@entry=0x7ffea5a0e5f0) at perf.c:489
     #8  0x00000000004773e7 in main (argc=3, argv=0x7ffea5a0e790) at perf.c:606

Using hpp_dimension__add_output function to register the output column.
It will also mark the dimension as taken and omit above stuck.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444134312-29136-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-06 18:04:59 -03:00
Jiri Olsa beeaaeb368 perf tools: Introduce hpp_dimension__add_output function
This function will allow to register output column from ui code and
respect taken sort/output dimensions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444134312-29136-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-06 18:04:59 -03:00
Jiri Olsa 0974d2c971 perf tools: Get rid of superfluos call to reset_dimensions
There's no need to call reset_dimensions within __setup_output_field
function. It's already called in its caller setup_sorting right before
perf_hpp__init, which will be changed in following patch to respect
taken dimension.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444134312-29136-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-06 18:04:58 -03:00
Jiri Olsa 27bf90bf06 perf tools: Fail properly in case pattern matching fails to find tracepoint
Currently we dont fail properly when pattern matching fails to find any
tracepoint.

Current behaviour:

  $ perf record -e 'sched:krava*' sleep 1
  WARNING: event parser found nothinginvalid or unsupported event: 'sched:krava*'
  Run 'perf list' for a list of valid events

  usage: perf record [<options>] [<command>]
     or: perf record [<options>] -- <command> [<options>]

This patch change:

  $ perf record -e 'sched:krava*' sleep 1
  event syntax error: 'sched:krava*'
                       \___ unknown tracepoint

  Error:  File /sys/kernel/debug/tracing/events/sched/krava* not found.
  Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444073477-3181-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 17:59:50 -03:00
Arnaldo Carvalho de Melo c6c3c02dea perf hists browser: Implement horizontal scrolling
Do it using the recently introduced ui_brower scrolling mode, setting
ui_browser.columns to the number of sort columns and then, when
rendering each line, skipping as many initial columns as the user
pressed the right arrow.

As the user presses the left arrow, the ui_browser code will remove the
scrolling counter and the left scrolling takes place.

The right arrow key was an alias for ENTER, so people used to press it
may get a bit annoyed at first, sorry! Ditto for ESC and the left key.

Callchains can be left as is or we can, when rendering the Symbol
column, store the at what position on the screen it is and then
using ui_browser__gotorc() to print it from there, i.e. the callchain
would move around with the symbol.

Leaving it as is, i.e. at a fixed position, close to the left, saves
precious screen real state for it, so I'm inclined to leave it as is
now.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ccqq9sabgfge5dwbqjwh71ij@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 17:59:49 -03:00
Arnaldo Carvalho de Melo faae6f690e perf ui browser: Optional horizontal scrolling key binding
If the classes derived from ui_browser want to do some sort of
horizontal scrolling, they have just to set ui_browser->columns to
the number of columns available.

Those columns can be the number of characters on the screen, if what is
desired is to scroll character by character, or the number of columns in
a spreadsheet like table.

This is what the hist_browser will do, skipping ui_browser->horiz_scroll
columns when rendering each of its lines.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-q6a22bpmpgcr1awgzrmd4jrs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 17:59:49 -03:00
Arnaldo Carvalho de Melo def02db0d6 perf callchain: Switch default to 'graph,0.5,caller'
Which is the most common default found in other similar tools.

Requested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-v8lq36aispvdwgxdmt9p9jd9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 17:59:43 -03:00
Matt Fleming 035827e9f2 perf tests: Add Intel CQM test
Peter reports that it's possible to trigger a WARN_ON_ONCE() in the
Intel CQM code by combining a hardware event and an Intel CQM
(software) event into a group. Unfortunately, the perf tools are not
able to create this bundle and we need to manually construct a test
case.

For posterity, record Peter's proof of concept test case in tools/perf
so that it presents a model for how we can perform architecture
specific tests, or "arch tests", in perf in the future.

The particular issue triggered in the test case is that when the
counter for the hardware event overflows and triggers a PMI we'll read
both the hardware event and the software event counters.
Unfortunately, for CQM that involves performing an IPI to read the CQM
event counters on all sockets, which in NMI context triggers the
WARN_ON_ONCE().

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/1437490509-15373-1-git-send-email-matt@codeblueprint.co.uk
Link: http://lkml.kernel.org/n/tip-3p4ra0u8vzm7m289a1m799kf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:56:07 -03:00
Matt Fleming d8b167f9d8 perf tests: Move x86 tests into arch directory
Move out the x86-specific tests into tools/perf/arch/x86/tests and
define an 'arch_tests' array, which is the list of tests that only apply
to the build architecture.

We can also now begin to get rid of some of the #ifdef code that is
present in the generic perf tests.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/n/tip-9s68h4ptg06ah0lgnjz55mqn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:55:43 -03:00
Matt Fleming 31b6753f95 perf tests: Add arch tests
Tests that only make sense for some architectures currently live in
the same place as the generic tests. Move out the x86-specific tests
into tools/perf/arch/x86/tests and define an 'arch_tests' array, which
is the list of tests that only apply to the build architecture.

The main idea is to encourage developers to add arch tests to build
out perf's test coverage, without dumping everything in
tools/perf/tests.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vikas Shivappa <vikas.shivappa@intel.com>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/n/tip-p4uc1c15ssbj8xj7ku5slpa6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:55:38 -03:00
Jiri Olsa a1853e2c6f perf tools: Handle -h and -v options
Adding handling for '-h' and '-v' options to invoke help and version
command respectively.

Current behaviour is:

   $ perf -v
   Unknown option: -v

    Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
   $ perf -h
   Unknown option: -h

    Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

New behaviour:

  $ perf -h

   usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

   The most commonly used perf commands are:
     annotate        Read perf.data (created by perf record) and display annotated code
     archive         Create archive with object files with build-ids found in perf.data file
     bench           General framework for benchmark suites
   ...

  $ perf -v
  perf version 4.3.rc3.gc99e32

Updated man page.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:36:18 -03:00
Jiri Olsa b34b3bf079 perf tools: Setup proper width for symbol_iaddr field
We need to properly initialize column width for symbol_iaddr field, so
all symbols could fit in the column.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-9-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:33:41 -03:00
Don Zickus 28e6db205b perf tools: Add support for sorting on the iaddr
Sorting on 'symbol' gives to broad a resolution as it can cover a range
of IP address.  Use the iaddr instead to get proper sorting on IP
addresses.  Need to use the 'mem_sort' feature of perf record.

New sort option is: symbol_iaddr, header label is 'Code Symbol'.

  $ perf mem report --stdio -F +symbol_iaddr
  # Overhead       Samples  Code Symbol              Local Weight
  # ........  ............  ........................ ............
  #
      54.08%             1  [k] nmi_handle           192
       4.51%             1  [k] finish_task_switch   16
       3.66%             1  [.] malloc               13
       3.10%             1  [.] __strcoll_l          11

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-8-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:32:00 -03:00
Jiri Olsa ddd83c9717 perf tests: Add parsing test for 'P' modifier
We cant test 'P' modifier gets properly parsed, the functionality test
itself is beyond this suite.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:22:15 -03:00
Jiri Olsa 7f94af7a48 perf tools: Introduce 'P' modifier to request max precision
The 'P' will cause the event to get maximum possible detected precise
level.

Following record:
  $ perf record -e cycles:P ...

will detect maximum precise level for 'cycles' event and use it.

Commiter note:

Testing it:

  $ perf record -e cycles:P usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.013 MB perf.data (9 samples) ]
  $ perf evlist
  cycles:P
  $ perf evlist -v
  cycles:P: size: 112, { sample_period, sample_freq }: 4000, sample_type:
  IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
  enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, mmap2: 1,
  comm_exec: 1
  $

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:21:11 -03:00
Jiri Olsa 45cf6c33f9 perf tools: Export perf_event_attr__set_max_precise_ip()
It'll be used in following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:16:20 -03:00
Jiri Olsa 5ec4502d77 perf annotate: Fix sizeof_sym_hist overflow issue
The annotated_source::sizeof_sym_hist could easily overflow int size,
resulting in crash in __symbol__inc_addr_samples.

Changing its type int size_t as was probably intended from beginning
based on the initialization code in symbol__alloc_hist.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:15:38 -03:00
Jiri Olsa 84422592e5 perf evlist: Display DATA_SRC sample type bit
Adding DATA_SRC bit_name call to display sample_type properly.

   $ perf evlist -v
   cpu/mem-loads/pp: ...SNIP... sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|DATA_SRC, ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-05 16:15:10 -03:00
Kan Liang 19afd10410 perf stat: Reduce min --interval-print to 10ms
The --interval-print parameter was limited to 100ms. However, for
example, 10ms is required to do sophisticated bandwidth analysis using
uncore events.

The test shows that the overhead of the system-wide uncore monitoring
with 10ms interval is only ~2%. So this patch reduces the minimal
interval-print allowd to 10ms.

But 10ms may not work well for all cases. For example, when the
cpus/threads number is very large, for system-wide core event monitoring
the overhead could be high.

To handle this issue, a warning will be displayed when the
interval-print is set between 10ms to 100ms. So users can make a
decision according to their specific cases.

 # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1

 print interval < 100ms. The overhead percentage could be high in some
 cases. Please proceed with caution.
 #           time             counts unit events
      0.010200451               0.10 MiB  uncore_imc_1/cas_count_read/
      0.020475117               0.02 MiB  uncore_imc_1/cas_count_read/
      0.030692800               0.01 MiB  uncore_imc_1/cas_count_read/
      0.040948161               0.02 MiB  uncore_imc_1/cas_count_read/
      0.051159564               0.00 MiB  uncore_imc_1/cas_count_read/

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com
[ Added warning about overhead when using sub 100ms intervals to the man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 17:07:55 -03:00
Yang Shi 9f065194e2 perf record: Change 'record.samples' type to unsigned long long
When run "perf record -e", the number of samples showed up is wrong on some
32 bit systems, i.e. powerpc and arm.

For example, run the below commands on 32 bit powerpc:

  perf probe -x /lib/libc.so.6 malloc
  perf record -e probe_libc:malloc -a ls perf.data
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.036 MB perf.data (13829241621624967218 samples) ]

Actually, "perf script" just shows 21 samples. The number of samples is also
absurd since samples is long type, but it is printed as PRIu64.

Build test ran on x86-64, x86, aarch64, arm, mips, ppc and ppc64.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Link: http://lkml.kernel.org/r/1443563383-4064-1-git-send-email-yang.shi@linaro.org
[ Bumped the 'hits' var used together with record.samples to 'unsigned long long' too ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 16:11:08 -03:00
Masami Hiramatsu 1a8ac29cbf perf probe: Allow probing on kmodules without dwarf
Allow probing on kernel modules when 'perf' is built without debuginfo
support.

Currently perf-probe --module requires linking with libdw, but this
doesn't make sense.

E.g.
  ----
  # make NO_DWARF=1
  # ./perf probe -m pcspkr pcspkr_event%return
    Error: unknown switch `m'
  ----

With this patch
  ----
  # ./perf probe -m pcspkr pcspkr_event%return
  Added new event:
    probe:pcspkr_event   (on pcspkr_event%return in pcspkr)

  You can now use it in all perf tools, such as:

          perf record -e probe:pcspkr_event -aR sleep 1
  ----

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20151002125832.18617.78721.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 15:59:23 -03:00
Arnaldo Carvalho de Melo fa52ceabc2 perf list: Honour 'event_glob' whem printing selectable PMUs
Some PMUs, like the 'intel_bts' one can be used as an event name, i.e.:

	$ perf record -e intel_bts:// usleep 1

Is a valid event name.

But the code printing such PMUs was not honouring the 'event_glob'
parameter, so the following line was always appearing:

  $ intel_bts//                                        [Kernel PMU event]

Fix it:

  $ [acme@felicio linux]$ perf list data

  List of pre-defined events (to be used in -e):

    uncore_imc/data_reads/                             [Kernel PMU event]
    uncore_imc/data_writes/                            [Kernel PMU event]

  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ajb71858n7q7ao77b8pyy74w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 15:28:16 -03:00
Arnaldo Carvalho de Melo dbc67409fa perf list: Do event name substring search as last resort when no events found
Before:

  # perf list _alloc_ | head -10
  #

After:

  # perf list _alloc_ | head -10
    ext4:ext4_alloc_da_blocks                          [Tracepoint event]
    ext4:ext4_get_implied_cluster_alloc_exit           [Tracepoint event]
    kmem:kmem_cache_alloc_node                         [Tracepoint event]
    kmem:mm_page_alloc_extfrag                         [Tracepoint event]
    kmem:mm_page_alloc_zone_locked                     [Tracepoint event]
    xen:xen_mmu_alloc_ptpage                           [Tracepoint event]
  #

And it works for all types of events:

  # perf list br

  List of pre-defined events (to be used in -e):

    branch-instructions OR branches                    [Hardware event]
    branch-misses                                      [Hardware event]

    branch-load-misses                                 [Hardware cache event]
    branch-loads                                       [Hardware cache event]

    branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
    branch-misses OR cpu/branch-misses/                [Kernel PMU event]

    filelock:break_lease_block                         [Tracepoint event]
    filelock:break_lease_noblock                       [Tracepoint event]
    filelock:break_lease_unblock                       [Tracepoint event]
    syscalls:sys_enter_brk                             [Tracepoint event]
    syscalls:sys_exit_brk                              [Tracepoint event]

  #

Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qieivl18jdemoaghgndj36e6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 12:12:22 -03:00
Adrian Hunter 0edd453368 perf callchain: Allow for max_stack greater than PERF_MAX_STACK_DEPTH
Adjust the validation to allow for max_stack greater than
PERF_MAX_STACK_DEPTH.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-18-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 09:56:06 -03:00
Namhyung Kim 208e760745 perf report: Fix a bug on "--call-graph none" option
The patch f9db0d0f1b ("perf callchain: Allow disabling call graphs
per event") added an ability to enable/disable callchain recording per
event.  But it had a problem when the enablement setting is changed at
'perf report' time using -g/--call-graph option.

For example, the following scenario will get a segfault.

  $ perf record -ag sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.500 MB perf.data (2555 samples) ]

  $ perf report -g none
  perf: Segmentation fault
  -------- backtrace --------
  perf[0x53a98a]
  /usr/lib/libc.so.6(+0x335af)[0x7f4e91df95af]

This is because callchain_param.sort() callback was not set but it
tried to call the function as it had the PERF_SAMPLE_CALLCHAIN bit.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: f9db0d0f1b ("perf callchain: Allow disabling call graphs per event")
Link: http://lkml.kernel.org/r/1443587640-24242-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 09:54:33 -03:00
Namhyung Kim c53d138d41 perf top: Register idle thread
The perf top didn't add the idle/swapper thread to the machine's thread
list and its comm was displayed as ':0'.  Fix it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1443577526-3240-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 09:54:33 -03:00
Namhyung Kim 4b37af5957 perf top: Fix unresolved comm when -s comm is used
The perf top uses 'dso,symbol' sort keys by default so it overlooked a
problem in task's comm resolving.  When the sort key contains 'comm',
some task's comm is not shown properly.  This is because the
perf_top__mmap_read_idx() checks the cpumode value improperly.

The cpumode value of non-sample events are 0 (PERF_RECORD_MISC_CPUMODE_
UNKNOWN) so the events will be ignored by the switch statement.  This patch
allows it for non-sample events.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1443577526-3240-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 09:54:33 -03:00
Namhyung Kim e5bed56448 perf record: Allocate area for sample_id_hdr in a synthesized comm event
A previous patch added a synthesized comm event for forked child process
but it missed that the event should contain area for sample_id_hdr at
the end.  It worked by accident since the perf_event union contains
bigger event structs like mmap_events.

This patch fixes it by dynamically allocating event struct including
those area like in perf_event__synthesize_thread_map().

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1443577526-3240-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-01 09:54:33 -03:00
Arnaldo Carvalho de Melo 7f8d1ade1b perf tools: By default use the most precise "cycles" hw counter available
If the user doesn't specify any event, try the most precise "cycles"
available, i.e. start by "cycles:ppp" and go on removing "p" till it
works.

E.g.

  $ perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.017 MB perf.data (11 samples) ]
  $ perf evlist
  cycles:pp
  $ perf evlist -v
  cycles:pp: size: 112, { sample_period, sample_freq }: 4000, sample_type:
  IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
  enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1,
  exclude_guest: 1, mmap2: 1, comm_exec: 1
  $ grep 'model name' /proc/cpuinfo | head -1
  model name	: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
  $

When 'cycles' appears explicitely is specified this will not be tried,
i.e. the user has full control of the level of precision to be used:

  $ perf record -e cycles usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.016 MB perf.data (9 samples) ]
  $ perf evlist
  cycles
  $ perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
  IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1,
  enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2:
  1, comm_exec: 1
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-b1ywebmt22pi78vjxau01wth@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:39 -03:00
Arnaldo Carvalho de Melo dfc431cbdc perf list: Remove blank lines, headers when piping output
So that one can, for instance, use it with wc -l:

  # perf list *:*write* | wc -l
  60

Or to look for the "bio" tracepoints, without 'perf list' headers:

  # perf list *:*bio* | head
    block:block_bio_backmerge                          [Tracepoint event]
    block:block_bio_bounce                             [Tracepoint event]
    block:block_bio_complete                           [Tracepoint event]
    block:block_bio_frontmerge                         [Tracepoint event]
    block:block_bio_queue                              [Tracepoint event]
    block:block_bio_remap                              [Tracepoint event]
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ts7sc0x8u4io4cifzkup4j44@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:38 -03:00
Masami Hiramatsu 6cca13bdf5 perf probe: Improve error message when %return is on inlined function
perf probe shows more precisely message when it finds given
%return target function is inlined.

Without this fix:
  ----
  # ./perf probe -V getname_flags%return
  Return probe must be on the head of a real function.
  Debuginfo analysis failed.
    Error: Failed to show vars.
  ----

With this fix:
  ----
  # ./perf probe -V getname_flags%return
  Failed to find "getname_flags%return",
   because getname_flags is an inlined function and has no return point.
  Debuginfo analysis failed.
    Error: Failed to show vars.
  ----

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164137.3733.55055.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:37 -03:00
Masami Hiramatsu 20f49859c7 perf probe: Fix a segfault bug in debuginfo_cache
perf probe --list will get a segfault if the first kprobe event is on a
module and the second or latter one is on the kernel.

e.g.
  ----
  # ./perf probe -q -m pcspkr pcspkr_event
  # ./perf probe -q vfs_read
  # ./perf probe -l
  Segmentation fault (core dumped)
  ----

This is because the debuginfo_cache fails to handle NULL module name,
which causes segfault on strcmp. (Note that strcmp("something", NULL)
always causes segfault)

To fix this debuginfo_cache__open always translates the NULL module name
to "kernel" (this is correct, because NULL module name means opening the
debuginfo for the kernel)

  ----
  # ./perf probe -l
    probe:pcspkr_event   (on pcspkr_event@drivers/input/misc/pcspkr.c
    in pcspkr)
    probe:vfs_read       (on vfs_read@ksrc/linux-3/fs/read_write.c)
  ----

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164135.3733.23993.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:36 -03:00
Masami Hiramatsu 9b239a12bc perf probe: Show correct source lines of probes on kmodules
Perf probe always failed to find appropriate line numbers because of
failing to find .text start address offset from debuginfo.

e.g.
  ----
  # ./perf probe -m pcspkr pcspkr_event:5
  Added new events:
    probe:pcspkr_event   (on pcspkr_event:5 in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event:5 in pcspkr)

  You can now use it in all perf tools, such as:

          perf record -e probe:pcspkr_event_1 -aR sleep 1

  # ./perf probe -l
  Failed to find debug information for address ffffffffa031f006
  Failed to find debug information for address ffffffffa031f016
    probe:pcspkr_event   (on pcspkr_event+6 in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event+22 in pcspkr)
  ----

This fixes the above issue as below.
1. Get the relative address of the symbol in .text by using
   map->start.
2. Adjust the address by adding the offset of .text section
   in the kernel module binary.

With this fix, perf probe -l shows lines correctly.
  ----
  # ./perf probe -l
    probe:pcspkr_event   (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
    probe:pcspkr_event_1 (on pcspkr_event:5@drivers/input/misc/pcspkr.c in pcspkr)
  ----

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164132.3733.24643.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:35 -03:00
Masami Hiramatsu 9135949ddd perf probe: Begin and end libdwfl report session correctly
Fix a trival bug about libdwfl usage of the report session, it should
explicitly begin and end a report session around dwfl_report_offline().

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164128.3733.59876.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:34 -03:00
Masami Hiramatsu 663b1151f2 perf probe: Fix to remove dot suffix from second or latter events
Fix to remove dot suffix (e.g. .const, .isra) from the second or latter
events which has suffix numbers.

Since the previous commit 35a23ff928 ("perf probe: Cut off the gcc
optimization postfixes from function name") didn't care about the suffix
numbered events, therefore we'll have an error when we add additional
events on the same dot suffix functions.

e.g.
  ----
  # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
   -a get_sigframe.isra.2.constprop.3
  Failed to write event: Invalid argument
    Error: Failed to add events.
  ----

This fixes above issue as below:
  ----
  # ./perf probe -f -a get_sigframe.isra.2.constprop.3 \
   -a get_sigframe.isra.2.constprop.3
  Added new events:
    probe:get_sigframe   (on get_sigframe.isra.2.constprop.3)
    probe:get_sigframe_1 (on get_sigframe.isra.2.constprop.3)

  You can now use it in all perf tools, such as:

          perf record -e probe:get_sigframe_1 -aR sleep 1

  ----

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20150930164130.3733.26573.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:33 -03:00
Arnaldo Carvalho de Melo 8e947f1e84 tools lib symbol: Rename kallsyms2elf_type to kallsyms2elf_binding
It is about binding, not type, we have just a letter in kallsyms that
should map both for the ELF type (STT_FUNC, etc) and to the ELF
symbol binding (STB_WEAK, STB_GLOBAL, etc), so rename it now before
introducing kallsyms2_elf_type()

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uu5vj343ms1q2wm55690on6v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:30 -03:00
Arnaldo Carvalho de Melo a5e813c686 perf machine: Add method for common kernel_map(FUNCTION) operation
And it is also a step in the direction of killing the separation of data
and text maps in map_groups.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-rrds86kb3wx5wk8v38v56gw8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:29 -03:00
Arnaldo Carvalho de Melo 77e6597749 perf machine: Use machine__kernel_map() thoroughly
In places where we were using its open coded equivalent.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-khkdugcdoqy3tkszm3jdxgbe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:28 -03:00
Sukadev Bhattiprolu eb56db5432 perf tools: Fix build break on powerpc due to sample_reg_masks
The perf_regs.c file does not get built on Powerpc as CONFIG_PERF_REGS
is false.  So the weak definition for 'sample_regs_masks' doesn't get
picked up.

Adding perf_regs.o to util/Build unconditionally, exposes a redefinition
error for 'perf_reg_value()' function (due to the static inline version
in util/perf_regs.h). So use #ifdef HAVE_PERF_REGS_SUPPORT' around that
function.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20150930182836.GA27858@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:27 -03:00
Adrian Hunter 40862a7b79 perf report: Amend documentation about max_stack and synthesized callchains
The --max_stack option was added as an optimization to reduce processing time,
so people specifying --max-stack might get a increased processing time if
combined with synthesized callchains, but otherwise no real harm.

A warning about setting both --max_stack and the synthesized callchains max
depth seems like overkill.  Amend the documentation.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/560A5155.4060105@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:26 -03:00
Arnaldo Carvalho de Melo b7f9ff5654 perf maps: Introduce maps__find_symbol_by_name()
Out of map_groups__find_symbol_by_name(), so that we can turn this later
one first into a call to maps__find_symbol_by_name(MAP__FUNCTION) +
MAP__VARIABLE, and then to just one call, we'll merge MAP__FUNCTION with
MAP__VARIABLE maps, to simplify the code.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-pvkar0jacqn92g148u9sqttt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:25 -03:00
Jiri Olsa 272ed29a91 perf tools: Fix shadowed declaration in parse-events.c
The error variable breaks build on CentOS 6.7, due to a collision with a
global error symbol:

    CC       util/parse-events.o
  cc1: warnings being treated as errors
  util/parse-events.c:419: error: declaration of ‘error’ shadows a global
  declaration
  util/util.h:135: error: shadowed declaration is here
  util/parse-events.c: In function ‘add_tracepoint_multi_event’:
  ...

Using different argument names instead to fix it.

Reported-by: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-tip-commits@vger.kernel.org
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Raphael Beamonte <raphael.beamonte@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150929150531.GI27383@krava.redhat.com
[ Fix one more case, at line 770 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-30 18:34:23 -03:00
He Kuang e637d17757 perf tools: Enable event_config terms to tracepoint events
This patch enables config terms for tracepoint perf events. Valid terms
for tracepoint events are 'call-graph' and 'stack-size', so we can use
different callgraph settings for each event and eliminate unnecessary
overhead.

Here is an example for using different call-graph config for each
tracepoint.

  $ perf record -e syscalls:sys_enter_write/call-graph=fp/
                -e syscalls:sys_exit_write/call-graph=no/
                dd if=/dev/zero of=test bs=4k count=10

  $ perf report --stdio

  #
  # Total Lost Samples: 0
  #
  # Samples: 13  of event 'syscalls:sys_enter_write'
  # Event count (approx.): 13
  #
  # Children      Self  Command  Shared Object       Symbol
  # ........  ........  .......  ..................  ......................
  #
      76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
                   |
                   ---__write_nocancel

      23.08%    23.08%  dd       libc-2.20.so        [.] write
                   |
                   ---write
                      |
                      |--33.33%-- 0x2031342820736574
                      |
                      |--33.33%-- 0xa6e69207364726f
                      |
                       --33.33%-- 0x34202c7320393039
  ...

  # Samples: 13  of event 'syscalls:sys_exit_write'
  # Event count (approx.): 13
  #
  # Children      Self  Command  Shared Object       Symbol
  # ........  ........  .......  ..................  ......................
  #
      76.92%    76.92%  dd       libpthread-2.20.so  [.] __write_nocancel
      23.08%    23.08%  dd       libc-2.20.so        [.] write
       7.69%     0.00%  dd       [unknown]           [.] 0x0a6e69207364726f
       7.69%     0.00%  dd       [unknown]           [.] 0x2031342820736574
       7.69%     0.00%  dd       [unknown]           [.] 0x34202c7320393039

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-4-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28 17:30:07 -03:00
He Kuang 865582c3f4 perf tools: Adds the tracepoint name parsing support
Adds rules for parsing tracepoint names. Change rules of tracepoint which
derives from PE_NAMEs into tracepoint names directly, so adding more rules
based on tracepoint names will be easier.

Changes v2-v3:
   - Change __event_legacy_tracepoint label in bison file to tracepoint_name
   - Fix formats error.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-3-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28 17:29:38 -03:00
He Kuang ffeb883e56 perf tools: Show proper error message for wrong terms of hw/sw events
Show proper error message and show valid terms when wrong config terms
is specified for hw/sw type perf events.

This patch makes the original error format function formats_error_string()
more generic, which only outputs the static config terms for hw/sw perf
events, and prepends pmu formats for pmu events.

Before this patch:

  $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
  invalid or unsupported event: 'cpu-clock/freqx=200/'
  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events

After this patch:

  $ perf record -e 'cpu-clock/freqx=200/' -a sleep 1
  event syntax error: 'cpu-clock/freqx=200/'
                                 \___ unknown term

  valid terms: config,config1,config2,name,period,freq,branch_type,time,call-graph,stack-size

  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-2-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28 17:26:54 -03:00
He Kuang 0b8891a8e6 perf tools: Adds the config_term callback for different type events
Currently, function config_term() is used for checking config terms of
all types of events, while unknown terms is not reported as an error
because pmu events have valid terms in sysfs.

But this is wrong when unknown terms are specificed to hw/sw events.
This patch Adds the config_term callback so we can use separate check
routines for each type of events.

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1443412336-120050-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28 17:25:53 -03:00
Adrian Hunter ba11ba65e0 perf intel-pt: Add mispred-all config option to aid use with autofdo
autofdo incorrectly expects branch flags to include either mispred or
predicted.  In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.

To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-09-28 17:21:00 -03:00