linux/tools/perf/util/map.h

190 lines
5.1 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __PERF_MAP_H
#define __PERF_MAP_H
#include <linux/refcount.h>
#include <linux/compiler.h>
#include <linux/list.h>
#include <linux/rbtree.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <linux/types.h>
struct dso;
struct maps;
struct machine;
struct map {
union {
struct rb_node rb_node;
struct list_head node;
};
u64 start;
u64 end;
perf map: Use bitmap for booleans The map->priv and map->erange_warned are seldom used, the first only in tests/vmlinux-kallsyms.c, the later only when hist_entry__inc_addr_samples() returns -ERANGE in 'perf top', which are really rare occasions, so make them a bool bitfield. This will open up space for other members on the first cacheline. $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u32 flags; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ u64 pgoff; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 reloc; /* 64 8 */ u32 maj; /* 72 4 */ u32 min; /* 76 4 */ u64 ino; /* 80 8 */ u64 ino_generation; /* 88 8 */ u64 (*map_ip)(struct map *, u64); /* 96 8 */ u64 (*unmap_ip)(struct map *, u64); /* 104 8 */ struct dso * dso; /* 112 8 */ refcount_t refcnt; /* 120 4 */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 116, holes: 2, sum holes: 7 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* padding: 4 */ /* forced alignments: 1 */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g5545pcq4ff0wr17tfb1piqt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-19 03:26:29 +08:00
bool erange_warned:1;
bool priv:1;
u32 prot;
u64 pgoff;
u64 reloc;
perf annotate: Fix it for non-prelinked *.so The problem was we were incorrectly calculating objdump addresses for sym->start and sym->end, look: For simple ET_DYN type DSO (*.so) with one function, objdump -dS output is something like this: 000004ac <my_strlen>: int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { i.e. we have relative-to-dso-mapping IPs (=RIP) there. For ET_EXEC type and probably for prelinked libs as well (sorry can't test - I don't use prelink) objdump outputs absolute IPs, e.g. 08048604 <zz_strlen>: extern "C" int zz_strlen(const char *s) 8048604: 55 push %ebp 8048605: 89 e5 mov %esp,%ebp 8048607: 83 ec 10 sub $0x10,%esp { So, if sym->start is always relative to dso mapping(*), we'll have to unmap it for ET_EXEC like cases, and leave as is for ET_DYN cases. (*) and it is - we've explicitely made it relative. Look for adjust_symbols handling in dso__load_sym() Previously we were always unmapping sym->start and for ET_DYN dsos resulting addresses were wrong, and so objdump output was empty. The end result was that perf annotate output for symbols from non-prelinked *.so had always 0.00% percents only, which is wrong. To fix it, let's introduce a helper for converting rip to objdump address, and also let's document what map_ip() and unmap_ip() do -- I had to study sources for several hours to understand it. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 02:52:07 +08:00
/* ip -> dso rip */
u64 (*map_ip)(struct map *, u64);
perf annotate: Fix it for non-prelinked *.so The problem was we were incorrectly calculating objdump addresses for sym->start and sym->end, look: For simple ET_DYN type DSO (*.so) with one function, objdump -dS output is something like this: 000004ac <my_strlen>: int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { i.e. we have relative-to-dso-mapping IPs (=RIP) there. For ET_EXEC type and probably for prelinked libs as well (sorry can't test - I don't use prelink) objdump outputs absolute IPs, e.g. 08048604 <zz_strlen>: extern "C" int zz_strlen(const char *s) 8048604: 55 push %ebp 8048605: 89 e5 mov %esp,%ebp 8048607: 83 ec 10 sub $0x10,%esp { So, if sym->start is always relative to dso mapping(*), we'll have to unmap it for ET_EXEC like cases, and leave as is for ET_DYN cases. (*) and it is - we've explicitely made it relative. Look for adjust_symbols handling in dso__load_sym() Previously we were always unmapping sym->start and for ET_DYN dsos resulting addresses were wrong, and so objdump output was empty. The end result was that perf annotate output for symbols from non-prelinked *.so had always 0.00% percents only, which is wrong. To fix it, let's introduce a helper for converting rip to objdump address, and also let's document what map_ip() and unmap_ip() do -- I had to study sources for several hours to understand it. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 02:52:07 +08:00
/* dso rip -> ip */
u64 (*unmap_ip)(struct map *, u64);
perf annotate: Fix it for non-prelinked *.so The problem was we were incorrectly calculating objdump addresses for sym->start and sym->end, look: For simple ET_DYN type DSO (*.so) with one function, objdump -dS output is something like this: 000004ac <my_strlen>: int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { i.e. we have relative-to-dso-mapping IPs (=RIP) there. For ET_EXEC type and probably for prelinked libs as well (sorry can't test - I don't use prelink) objdump outputs absolute IPs, e.g. 08048604 <zz_strlen>: extern "C" int zz_strlen(const char *s) 8048604: 55 push %ebp 8048605: 89 e5 mov %esp,%ebp 8048607: 83 ec 10 sub $0x10,%esp { So, if sym->start is always relative to dso mapping(*), we'll have to unmap it for ET_EXEC like cases, and leave as is for ET_DYN cases. (*) and it is - we've explicitely made it relative. Look for adjust_symbols handling in dso__load_sym() Previously we were always unmapping sym->start and for ET_DYN dsos resulting addresses were wrong, and so objdump output was empty. The end result was that perf annotate output for symbols from non-prelinked *.so had always 0.00% percents only, which is wrong. To fix it, let's introduce a helper for converting rip to objdump address, and also let's document what map_ip() and unmap_ip() do -- I had to study sources for several hours to understand it. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 02:52:07 +08:00
struct dso *dso;
refcount_t refcnt;
perf map: Move seldom used ->flags field to second cacheline So we start with: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u32 flags; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ u64 pgoff; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 reloc; /* 64 8 */ u32 maj; /* 72 4 */ u32 min; /* 76 4 */ u64 ino; /* 80 8 */ u64 ino_generation; /* 88 8 */ u64 (*map_ip)(struct map *, u64); /* 96 8 */ u64 (*unmap_ip)(struct map *, u64); /* 104 8 */ struct dso * dso; /* 112 8 */ refcount_t refcnt; /* 120 4 */ /* size: 128, cachelines: 2, members: 17 */ /* sum members: 116, holes: 2, sum holes: 7 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* padding: 4 */ /* forced alignments: 1 */ } __attribute__((__aligned__(8))); $ and 'flags' is seldom used when printing details about the map or with the "cacheline" sort order, we can move them it to the second cacheline, that will allow combining it with 'refcnt', that is only four bytes: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u64 pgoff; /* 48 8 */ u64 reloc; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 maj; /* 64 4 */ u32 min; /* 68 4 */ u64 ino; /* 72 8 */ u64 ino_generation; /* 80 8 */ u64 (*map_ip)(struct map *, u64); /* 88 8 */ u64 (*unmap_ip)(struct map *, u64); /* 96 8 */ struct dso * dso; /* 104 8 */ refcount_t refcnt; /* 112 4 */ u32 flags; /* 116 4 */ /* size: 120, cachelines: 2, members: 17 */ /* sum members: 116, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* forced alignments: 1 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-2cdw3zlw1mkamaf7nqtdlxfi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-19 03:51:00 +08:00
u32 flags;
};
struct kmap;
struct kmap *__map__kmap(struct map *map);
struct kmap *map__kmap(struct map *map);
struct maps *map__kmaps(struct map *map);
static inline u64 map__map_ip(struct map *map, u64 ip)
{
return ip - map->start + map->pgoff;
}
static inline u64 map__unmap_ip(struct map *map, u64 ip)
{
return ip + map->start - map->pgoff;
}
perf tools: Use __maybe_used for unused variables perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-11 06:15:03 +08:00
static inline u64 identity__map_ip(struct map *map __maybe_unused, u64 ip)
{
return ip;
}
perf tools: Add a "dso_size" sort order Add DSO size to perf report/top sort output list. This includes adding a map__size fn to map.h, which is approximately equal to the DSO data file_size: DSO file size map (end-start) file / (end-start) libwebkit2gtk-4.0.so.37.24.9 43260072 41295872 95% libglib-2.0.so.0.5400.1 1125680 1118208 99% libc-2.26.so 1960656 1925120 101% libdbus-1.so.3.14.13 309456 303104 102% Sample output: $ ./perf report -s dso_size,dso Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340 Overhead DSO size Shared Object 90.62% unknown [unknown] 2.87% 1118208 libglib-2.0.so.0.5400.1 1.92% 303104 libdbus-1.so.3.14.13 1.42% 1925120 libc-2.26.so 0.77% 41295872 libwebkit2gtk-4.0.so.37.24.9 0.61% 335872 libgobject-2.0.so.0.5400.1 0.41% 1052672 libgdk-3.so.0.2200.25 0.36% 106496 libpthread-2.26.so 0.29% 221184 dbus-daemon 0.17% 159744 ld-2.26.so 0.13% 49152 libwayland-client.so.0.3.0 0.12% 1642496 libgio-2.0.so.0.5400.1 0.09% 7327744 libgtk-3.so.0.2200.25 0.09% 12324864 libmozjs-52.so.0.0.0 0.05% 4796416 perf 0.04% 843776 libgjs.so.0.0.0 0.03% 1409024 libmutter-clutter-1.so Committer testing: To sort by DSO size, use: # perf report -F dso_size,dso,overhead -s dso_size <SNIP> 3465216 libdns-export.so.174.0.1 0.00% 3522560 libgc.so.1.0.3 0.00% 3538944 libbfd-2.29-13.fc27.so 0.59% 3670016 libunistring.so.2.1.0 0.00% 3723264 libguile-2.0.so.22.8.1 0.00% 3776512 libgio-2.0.so.0.5400.3 0.00% 3891200 libc-2.26.so 0.96% 3944448 libmozjs-17.0.so 0.00% 4218880 libperl.so.5.26.1 0.18% 4452352 libpython2.7.so.1.0 0.02% 4472832 perf 0.02% 4603904 git 0.01% 4751360 libcrypto.so.1.1.0g 0.00% 5005312 libslang.so.2.3.1 0.00% 7315456 libgtk-3.so.0.2200.26 0.09% 8818688 i965_dri.so 2.46% 8818688 i965_dri.so (deleted) 1.26% 12414976 libmozjs-52.so.0.0.0 0.03% 23642112 cc1 2.02% 27889664 [kernel.kallsyms] 25.41% 80834560 libxul.so (deleted) 15.68% 98078720 chrome 32.03% 1056964608 [kernel.kallsyms] 1.59% # Signed-off-by: Kim Phillips <kim.phillips@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-27 19:09:56 +08:00
static inline size_t map__size(const struct map *map)
{
return map->end - map->start;
}
perf annotate: Fix it for non-prelinked *.so The problem was we were incorrectly calculating objdump addresses for sym->start and sym->end, look: For simple ET_DYN type DSO (*.so) with one function, objdump -dS output is something like this: 000004ac <my_strlen>: int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { i.e. we have relative-to-dso-mapping IPs (=RIP) there. For ET_EXEC type and probably for prelinked libs as well (sorry can't test - I don't use prelink) objdump outputs absolute IPs, e.g. 08048604 <zz_strlen>: extern "C" int zz_strlen(const char *s) 8048604: 55 push %ebp 8048605: 89 e5 mov %esp,%ebp 8048607: 83 ec 10 sub $0x10,%esp { So, if sym->start is always relative to dso mapping(*), we'll have to unmap it for ET_EXEC like cases, and leave as is for ET_DYN cases. (*) and it is - we've explicitely made it relative. Look for adjust_symbols handling in dso__load_sym() Previously we were always unmapping sym->start and for ET_DYN dsos resulting addresses were wrong, and so objdump output was empty. The end result was that perf annotate output for symbols from non-prelinked *.so had always 0.00% percents only, which is wrong. To fix it, let's introduce a helper for converting rip to objdump address, and also let's document what map_ip() and unmap_ip() do -- I had to study sources for several hours to understand it. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 02:52:07 +08:00
perf top: Fix annotate for userspace First, for programs and prelinked libraries, annotate code was fooled by objdump output IPs (src->eip in the code) being wrongly converted to absolute IPs. In such case there were no conversion needed, but in src->eip = strtoull(src->line, NULL, 16); src->eip = map->unmap_ip(map, src->eip); // = eip + map->start - map->pgoff we were reading absolute address from objdump (e.g. 8048604) and then almost doubling it, because eip & map->start are approximately close for small programs. Needless to say, that later, in record_precise_ip() there was no matching with real runtime IPs. And second, like with `perf annotate` the problem with non-prelinked *.so was that we were doing rip -> objdump address conversion wrong. Also, because unlike `perf annotate`, `perf top` code does annotation based on absolute IPs for performance reasons(*), new helper for mapping objdump addresse to IP is introduced. (*) we get samples info in absolute IPs, and since we do lots of hit-testing on absolute IPs at runtime in record_precise_ip(), it's better to convert objdump addresses to IPs once and do no conversion at runtime. I also had to fix how objdump output is parsed (with hardcoded 8/16 characters format, which was inappropriate for ET_DYN dsos with small addresses like '4ac') Also note, that not all objdump output lines has associtated IPs, e.g. look at source lines here: 000004ac <my_strlen>: extern "C" int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { int len = 0; 4b2: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp) 4b9: eb 08 jmp 4c3 <my_strlen+0x17> while (*s) { ++len; 4bb: 83 45 fc 01 addl $0x1,-0x4(%ebp) ++s; 4bf: 83 45 08 01 addl $0x1,0x8(%ebp) So we mark them with eip=0, and ignore such lines in annotate lookup code. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> [ Note: one hunk of this patch was applied by Mike in 57d8188 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265550376-12665-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-07 21:46:15 +08:00
/* rip/ip <-> addr suitable for passing to `objdump --start-address=` */
perf annotate: Fix it for non-prelinked *.so The problem was we were incorrectly calculating objdump addresses for sym->start and sym->end, look: For simple ET_DYN type DSO (*.so) with one function, objdump -dS output is something like this: 000004ac <my_strlen>: int my_strlen(const char *s) 4ac: 55 push %ebp 4ad: 89 e5 mov %esp,%ebp 4af: 83 ec 10 sub $0x10,%esp { i.e. we have relative-to-dso-mapping IPs (=RIP) there. For ET_EXEC type and probably for prelinked libs as well (sorry can't test - I don't use prelink) objdump outputs absolute IPs, e.g. 08048604 <zz_strlen>: extern "C" int zz_strlen(const char *s) 8048604: 55 push %ebp 8048605: 89 e5 mov %esp,%ebp 8048607: 83 ec 10 sub $0x10,%esp { So, if sym->start is always relative to dso mapping(*), we'll have to unmap it for ET_EXEC like cases, and leave as is for ET_DYN cases. (*) and it is - we've explicitely made it relative. Look for adjust_symbols handling in dso__load_sym() Previously we were always unmapping sym->start and for ET_DYN dsos resulting addresses were wrong, and so objdump output was empty. The end result was that perf annotate output for symbols from non-prelinked *.so had always 0.00% percents only, which is wrong. To fix it, let's introduce a helper for converting rip to objdump address, and also let's document what map_ip() and unmap_ip() do -- I had to study sources for several hours to understand it. Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1265223128-11786-8-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 02:52:07 +08:00
u64 map__rip_2objdump(struct map *map, u64 rip);
/* objdump address -> memory address */
u64 map__objdump_2mem(struct map *map, u64 ip);
struct symbol;
struct thread;
perf probe: Allow to add events on the local functions Allow to add events on the local functions without debuginfo. (With the debuginfo, we can add events even on inlined functions) Currently, probing on local functions requires debuginfo to locate actual address. It is also possible without debuginfo since we have symbol maps. Without this change; ---- # ./perf probe -a t_show Added new event: probe:t_show (on t_show) You can now use it in all perf tools, such as: perf record -e probe:t_show -aR sleep 1 # ./perf probe -x perf -a identity__map_ip no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package? Failed to load map. Error: Failed to add events. (-22) ---- As the above results, perf probe just put one event on the first found symbol for kprobe event. Moreover, for uprobe event, perf probe failed to find local functions. With this change; ---- # ./perf probe -a t_show Added new events: probe:t_show (on t_show) probe:t_show_1 (on t_show) probe:t_show_2 (on t_show) probe:t_show_3 (on t_show) You can now use it in all perf tools, such as: perf record -e probe:t_show_3 -aR sleep 1 # ./perf probe -x perf -a identity__map_ip Added new events: probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:identity__map_ip_3 -aR sleep 1 ---- Now we succeed to put events on every given local functions for both kprobes and uprobes. :) Note that this also introduces some symbol rbtree iteration macros; symbols__for_each, dso__for_each_symbol, and map__for_each_symbol. These are for walking through the symbol list in a map. Changes from v2: - Fix add_exec_to_probe_trace_events() not to convert address to tp->symbol any more. - Fix to set kernel probes based on ref_reloc_sym. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Ahern <dsahern@gmail.com> Cc: "David A. Long" <dave.long@linaro.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20140206053225.29635.15026.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-02-06 13:32:25 +08:00
/* map__for_each_symbol - iterate over the symbols in the given map
*
* @map: the 'struct map *' in which symbols itereated
* @pos: the 'struct symbol *' to use as a loop cursor
* @n: the 'struct rb_node *' to use as a temporary storage
* Note: caller must ensure map->dso is not NULL (map is loaded).
*/
#define map__for_each_symbol(map, pos, n) \
dso__for_each_symbol(map->dso, pos, n)
perf probe: Allow to add events on the local functions Allow to add events on the local functions without debuginfo. (With the debuginfo, we can add events even on inlined functions) Currently, probing on local functions requires debuginfo to locate actual address. It is also possible without debuginfo since we have symbol maps. Without this change; ---- # ./perf probe -a t_show Added new event: probe:t_show (on t_show) You can now use it in all perf tools, such as: perf record -e probe:t_show -aR sleep 1 # ./perf probe -x perf -a identity__map_ip no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package? Failed to load map. Error: Failed to add events. (-22) ---- As the above results, perf probe just put one event on the first found symbol for kprobe event. Moreover, for uprobe event, perf probe failed to find local functions. With this change; ---- # ./perf probe -a t_show Added new events: probe:t_show (on t_show) probe:t_show_1 (on t_show) probe:t_show_2 (on t_show) probe:t_show_3 (on t_show) You can now use it in all perf tools, such as: perf record -e probe:t_show_3 -aR sleep 1 # ./perf probe -x perf -a identity__map_ip Added new events: probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:identity__map_ip_3 -aR sleep 1 ---- Now we succeed to put events on every given local functions for both kprobes and uprobes. :) Note that this also introduces some symbol rbtree iteration macros; symbols__for_each, dso__for_each_symbol, and map__for_each_symbol. These are for walking through the symbol list in a map. Changes from v2: - Fix add_exec_to_probe_trace_events() not to convert address to tp->symbol any more. - Fix to set kernel probes based on ref_reloc_sym. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Ahern <dsahern@gmail.com> Cc: "David A. Long" <dave.long@linaro.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20140206053225.29635.15026.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-02-06 13:32:25 +08:00
/* map__for_each_symbol_with_name - iterate over the symbols in the given map
* that have the given name
*
* @map: the 'struct map *' in which symbols itereated
* @sym_name: the symbol name
* @pos: the 'struct symbol *' to use as a loop cursor
*/
#define __map__for_each_symbol_by_name(map, sym_name, pos) \
for (pos = map__find_symbol_by_name(map, sym_name); \
perf symbols: Allow user probes on versioned symbols Symbol versioning, as in glibc, results in symbols being defined as: <real symbol>@[@]<version> (Note that "@@" identifies a default symbol, if the symbol name is repeated.) perf is currently unable to deal with this, and is unable to create user probes at such symbols: -- $ nm /lib/powerpc64le-linux-gnu/libpthread.so.0 | grep pthread_create 0000000000008d30 t __pthread_create_2_1 0000000000008d30 T pthread_create@@GLIBC_2.17 $ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create probe-definition(0): pthread_create symbol:pthread_create file:(null) line:0 offset:0 return:0 lazy:(null) 0 arguments Open Debuginfo file: /usr/lib/debug/lib/powerpc64le-linux-gnu/libpthread-2.19.so Try to find probe point from debuginfo. Probe point 'pthread_create' not found. Error: Failed to add events. Reason: No such file or directory (Code: -2) -- One is not able to specify the fully versioned symbol, either, due to syntactic conflicts with other uses of "@" by perf: -- $ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create@@GLIBC_2.17 probe-definition(0): pthread_create@@GLIBC_2.17 Semantic error :SRC@SRC is not allowed. 0 arguments Error: Command Parse Error. Reason: Invalid argument (Code: -22) -- This patch ignores versioning for default symbols, thus allowing probes to be created for these symbols: -- $ /usr/bin/sudo ./perf probe -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create Added new event: probe_libpthread:pthread_create (on pthread_create in /lib/powerpc64le-linux-gnu/libpthread-2.19.so) You can now use it in all perf tools, such as: perf record -e probe_libpthread:pthread_create -aR sleep 1 $ /usr/bin/sudo ./perf record -e probe_libpthread:pthread_create -aR ./test 2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.052 MB perf.data (2 samples) ] $ /usr/bin/sudo ./perf script test 2915 [000] 19124.260729: probe_libpthread:pthread_create: (3fff99248d38) test 2916 [000] 19124.260962: probe_libpthread:pthread_create: (3fff99248d38) $ /usr/bin/sudo ./perf probe --del=probe_libpthread:pthread_create Removed event: probe_libpthread:pthread_create -- Committer note: Change the variable storing the result of strlen() to 'int', to fix the build on debian:experimental-x-mipsel, fedora:24-x-ARC-uClibc, ubuntu:16.04-x-arm, etc: util/symbol.c: In function 'symbol__match_symbol_name': util/symbol.c:422:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] if (len < versioning - name) ^ Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/c2b18d9c-17f8-9285-4868-f58b6359ccac@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-26 02:15:49 +08:00
pos && \
!symbol__match_symbol_name(pos->name, sym_name, \
SYMBOL_TAG_INCLUDE__DEFAULT_ONLY); \
pos = symbol__next_by_name(pos))
#define map__for_each_symbol_by_name(map, sym_name, pos) \
__map__for_each_symbol_by_name(map, sym_name, (pos))
void map__init(struct map *map,
u64 start, u64 end, u64 pgoff, struct dso *dso);
perf dso: Move dso_id from 'struct map' to 'struct dso' And take it into account when looking up DSOs when we have the dso_id fields obtained from somewhere, like from PERF_RECORD_MMAP2 records. Instances of struct map pointing to the same DSO pathname but with anything in dso_id different are in fact different DSOs, so better have different 'struct dso' instances to reflect that. At some point we may want to get copies of the contents of the different objects if we want to do correct annotation or other analysis. With this we get 'struct map' 24 bytes leaner: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u64 pgoff; /* 48 8 */ u64 reloc; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 (*map_ip)(struct map *, u64); /* 64 8 */ u64 (*unmap_ip)(struct map *, u64); /* 72 8 */ struct dso * dso; /* 80 8 */ refcount_t refcnt; /* 88 4 */ u32 flags; /* 92 4 */ /* size: 96, cachelines: 2, members: 13 */ /* sum members: 92, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-20 05:44:22 +08:00
struct dso_id;
struct build_id;
perf dso: Move dso_id from 'struct map' to 'struct dso' And take it into account when looking up DSOs when we have the dso_id fields obtained from somewhere, like from PERF_RECORD_MMAP2 records. Instances of struct map pointing to the same DSO pathname but with anything in dso_id different are in fact different DSOs, so better have different 'struct dso' instances to reflect that. At some point we may want to get copies of the contents of the different objects if we want to do correct annotation or other analysis. With this we get 'struct map' 24 bytes leaner: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */ struct list_head node; /* 0 16 */ } __attribute__((__aligned__(8))); /* 0 24 */ u64 start; /* 24 8 */ u64 end; /* 32 8 */ _Bool erange_warned:1; /* 40: 0 1 */ _Bool priv:1; /* 40: 1 1 */ /* XXX 6 bits hole, try to pack */ /* XXX 3 bytes hole, try to pack */ u32 prot; /* 44 4 */ u64 pgoff; /* 48 8 */ u64 reloc; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u64 (*map_ip)(struct map *, u64); /* 64 8 */ u64 (*unmap_ip)(struct map *, u64); /* 72 8 */ struct dso * dso; /* 80 8 */ refcount_t refcnt; /* 88 4 */ u32 flags; /* 92 4 */ /* size: 96, cachelines: 2, members: 13 */ /* sum members: 92, holes: 1, sum holes: 3 */ /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-20 05:44:22 +08:00
struct map *map__new(struct machine *machine, u64 start, u64 len,
u64 pgoff, struct dso_id *id, u32 prot, u32 flags,
struct build_id *bid, char *filename, struct thread *thread);
struct map *map__new2(u64 start, struct dso *dso);
void map__delete(struct map *map);
struct map *map__clone(struct map *map);
static inline struct map *map__get(struct map *map)
{
if (map)
refcount_inc(&map->refcnt);
return map;
}
void map__put(struct map *map);
static inline void __map__zput(struct map **map)
{
map__put(*map);
*map = NULL;
}
#define map__zput(map) __map__zput(&map)
size_t map__fprintf(struct map *map, FILE *fp);
size_t map__fprintf_dsoname(struct map *map, FILE *fp);
char *map__srcline(struct map *map, u64 addr, struct symbol *sym);
perf script: Add an option to print the source line number Add field 'srcline' that displays the source file name and line number associated with the sample ip. The information displayed is the same as from addr2line. $ perf script -f comm,tid,pid,time,ip,sym,dso,symoff,srcline grep 10701/10701 2497321.421013: ffffffff81043ffa native_write_msr_safe+0xa ([kernel.kallsyms]) /usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/arch/x86/include/asm/msr.h:95 grep 10701/10701 2497321.421984: ffffffff8165b6b3 _raw_spin_lock+0x13 ([kernel.kallsyms]) /usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/arch/x86/include/asm/spinlock.h:54 grep 10701/10701 2497321.421990: ffffffff810b64b3 tick_sched_timer+0x53 ([kernel.kallsyms]) /usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/kernel/time/tick-sched.c:840 grep 10701/10701 2497321.421992: ffffffff8106f63f run_timer_softirq+0x2f ([kernel.kallsyms]) /usr/src/debug/kernel-3.9.fc17/linux-3.9.10-100.fc17.x86_64/kernel/timer.c:1372 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1386315778-11633-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-12-06 15:42:57 +08:00
int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix,
FILE *fp);
int map__load(struct map *map);
struct symbol *map__find_symbol(struct map *map, u64 addr);
struct symbol *map__find_symbol_by_name(struct map *map, const char *name);
void map__fixup_start(struct map *map);
void map__fixup_end(struct map *map);
int map__set_kallsyms_ref_reloc_sym(struct map *map, const char *symbol_name,
u64 addr);
bool __map__is_kernel(const struct map *map);
bool __map__is_extra_kernel_map(const struct map *map);
bool __map__is_bpf_prog(const struct map *map);
bool __map__is_bpf_image(const struct map *map);
bool __map__is_ool(const struct map *map);
static inline bool __map__is_kmodule(const struct map *map)
{
return !__map__is_kernel(map) && !__map__is_extra_kernel_map(map) &&
!__map__is_bpf_prog(map) && !__map__is_ool(map) &&
!__map__is_bpf_image(map);
}
bool map__has_symbols(const struct map *map);
#define ENTRY_TRAMPOLINE_NAME "__entry_SYSCALL_64_trampoline"
static inline bool is_entry_trampoline(const char *name)
{
return !strcmp(name, ENTRY_TRAMPOLINE_NAME);
}
static inline bool is_bpf_image(const char *name)
{
return strncmp(name, "bpf_trampoline_", sizeof("bpf_trampoline_") - 1) == 0 ||
strncmp(name, "bpf_dispatcher_", sizeof("bpf_dispatcher_") - 1) == 0;
}
perf inject: Do not load map/dso when injecting build-id No need to load symbols in a DSO when injecting build-id. I guess the reason was to check the DSO is a special file like anon files. Use some helper functions in map.c to check them before reading build-id. Also pass sample event's cpumode to a new build-id event. It brought a speedup in the benchmark of 25 -> 21 msec on my laptop. Also the memory usage (Max RSS) went down by ~200 KB. # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.389 msec (+- 0.138 msec) Average time per event: 2.097 usec (+- 0.014 usec) Average memory usage: 8225 KB (+- 0 KB) Committer notes: Before: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,020.56 msec task-clock:u # 1.271 CPUs utilized ( +- 0.74% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 123,354 page-faults:u # 0.031 M/sec ( +- 0.81% ) 7,119,951,568 cycles:u # 1.771 GHz ( +- 1.74% ) (83.27%) 230,086,969 stalled-cycles-frontend:u # 3.23% frontend cycles idle ( +- 1.97% ) (83.41%) 1,168,298,765 stalled-cycles-backend:u # 16.41% backend cycles idle ( +- 1.13% ) (83.44%) 11,173,083,669 instructions:u # 1.57 insn per cycle # 0.10 stalled cycles per insn ( +- 1.58% ) (83.31%) 2,413,908,936 branches:u # 600.392 M/sec ( +- 1.69% ) (83.26%) 46,576,289 branch-misses:u # 1.93% of all branches ( +- 2.20% ) (83.31%) 3.1638 +- 0.0309 seconds time elapsed ( +- 0.98% ) $ After: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 2,379.94 msec task-clock:u # 1.473 CPUs utilized ( +- 0.18% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 62,584 page-faults:u # 0.026 M/sec ( +- 0.07% ) 2,372,389,668 cycles:u # 0.997 GHz ( +- 0.29% ) (83.14%) 106,937,862 stalled-cycles-frontend:u # 4.51% frontend cycles idle ( +- 4.89% ) (83.20%) 581,697,915 stalled-cycles-backend:u # 24.52% backend cycles idle ( +- 0.71% ) (83.47%) 3,659,692,199 instructions:u # 1.54 insn per cycle # 0.16 stalled cycles per insn ( +- 0.10% ) (83.63%) 791,372,961 branches:u # 332.518 M/sec ( +- 0.27% ) (83.39%) 10,648,083 branch-misses:u # 1.35% of all branches ( +- 0.22% ) (83.16%) 1.61570 +- 0.00172 seconds time elapsed ( +- 0.11% ) $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-12 15:02:12 +08:00
static inline int is_anon_memory(const char *filename)
{
return !strcmp(filename, "//anon") ||
!strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) ||
!strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1);
}
static inline int is_no_dso_memory(const char *filename)
{
return !strncmp(filename, "[stack", 6) ||
!strncmp(filename, "/SYSV", 5) ||
!strcmp(filename, "[heap]");
}
#endif /* __PERF_MAP_H */