
Merge tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "perf stat:

   - Add support for hybrid PMUs on systems such as Intel Alderlake,
     with its big/little mix of core and atom CPUs.

   - Introduce 'bperf' to share hardware PMCs with BPF.

   - New --iostat option to collect and present IO stats on Intel
     hardware.

     This functionality is based on recently introduced sysfs attributes
     for Intel® Xeon® Scalable processor family (code name Skylake-SP)
     in commit bb42b3d397 ("perf/x86/intel/uncore: Expose an Uncore
     unit to IIO PMON mapping")

     It is intended to provide four I/O performance metrics in MB for
     each PCIe root port:

       - Inbound Read: I/O devices below root port read from the host memory
       - Inbound Write: I/O devices below root port write to the host memory
       - Outbound Read: CPU reads from I/O devices below root port
       - Outbound Write: CPU writes to I/O devices below root port

   - Align CSV output for summary.

   - Clarify --null use cases: Assess raw overhead of 'perf stat' or
     measure just wall clock time.

   - Improve readability of shadow stats.

  perf record:

   - Change the COMM when starting the workload so that --exclude-perf
     no longer appears to be ignored.

   - Improve 'Workload failed' message printing events + what was
     exec'ed.

   - Fix cross-arch support for TIME_CONV.

  perf report:

   - Add option to disable raw event ordering.

   - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.

   - Improve --stat output, which shows information about PERF_RECORD_
     events.

   - Preserve identifier id in OCaml demangler.

  perf annotate:

   - Show full source location with 'l' hotkey in the 'perf annotate'
     TUI.

   - Add line numbers (as in the TUI) and source locations at EOL to
     'perf annotate' --stdio mode.

   - Add --demangle and --demangle-kernel to 'perf annotate'.

   - Allow configuring annotate.demangle{,_kernel} in 'perf config'.

   - Fix sample events lost in stdio mode.

  perf data:

   - Allow converting a perf.data file to JSON.

  libperf:

   - Add support for user space counter access.

   - Update topdown documentation to permit rdpmc calls.

  perf test:

   - Add 'perf test' for 'perf stat' CSV output.

   - Add 'perf test' entries to test the hybrid PMU support.

   - Clean up 'perf test daemon' if its 'perf test' run is interrupted.

   - Handle metric reuse in pmu-events parsing 'perf test' entry.

   - Add test for PE executable support.

   - Add a timeout when waiting for daemon start in its 'perf test' entries.

  Build:

   - Enable libtraceevent dynamic linking.

   - Improve feature detection output.

   - Fix caching of feature checks.

   - First round of updates for tools copies of kernel headers.

   - Enable warnings when compiling BPF programs.

  Vendor specific events:

   - Intel:
      - Add missing Skylake & Icelake model numbers.

   - arm64:
      - Add Hisi hip08 L1, L2 and L3 metrics.
      - Add Fujitsu A64FX PMU events.

   - PowerPC:
      - Initial JSON/events list for power10 platform.
      - Remove unsupported power9 metrics.

   - AMD:
      - Add Zen3 events.
      - Fix broken L2 Cache Hits from L2 HWPF metric.
      - Use lowercase for all the event codes and umasks.

  Hardware tracing:

   - arm64:
      - Update CoreSight ETM metadata format.
      - Fix bitmap for CS-ETM option.
      - Support PID tracing in config.
      - Detect pid in VMID for kernel running at EL2.

  Arch specific updates:

   - MIPS:
      - Support MIPS unwinding and dwarf-regs.
      - Generate mips syscalls_n64.c syscall table.

   - PowerPC:
      - Add support for PERF_SAMPLE_WEIGHT_STRUCT on PowerPC.
      - Support pipeline stage cycles for powerpc.

  libbeauty:

   - Fix fsconfig generator"

* tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits)
  perf build: Defer printing detected features to the end of all feature checks
  tools build: Allow deferring printing the results of feature detection
  perf build: Regenerate the FEATURE_DUMP file after extra feature checks
  perf session: Dump PERF_RECORD_TIME_CONV event
  perf session: Add swap operation for event TIME_CONV
  perf jit: Let convert_timestamp() to be backwards-compatible
  perf tools: Change fields type in perf_record_time_conv
  perf tools: Enable libtraceevent dynamic linking
  perf Documentation: Document intel-hybrid support
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf record: Uniquify hybrid event name
  perf stat: Warn group events from different hybrid PMU
  perf stat: Filter out unmatched aggregation for hybrid event
  ...
This commit is contained in:
Linus Torvalds 2021-05-01 12:22:38 -07:00
commit 10a3efd0fe
244 changed files with 9952 additions and 883 deletions


@ -14290,8 +14290,10 @@ R: Mark Rutland <mark.rutland@arm.com>
R: Alexander Shishkin <alexander.shishkin@linux.intel.com>
R: Jiri Olsa <jolsa@redhat.com>
R: Namhyung Kim <namhyung@kernel.org>
L: linux-perf-users@vger.kernel.org
L: linux-kernel@vger.kernel.org
S: Supported
W: https://perf.wiki.kernel.org/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
F: arch/*/events/*
F: arch/*/events/*/*


@ -52,6 +52,7 @@ FEATURE_TESTS_BASIC := \
libpython-version \
libslang \
libslang-include-subdir \
libtraceevent \
libcrypto \
libunwind \
pthread-attr-setaffinity-np \
@ -239,17 +240,24 @@ ifeq ($(VF),1)
feature_verbose := 1
endif
ifeq ($(feature_display),1)
$(info )
$(info Auto-detecting system features:)
$(foreach feat,$(FEATURE_DISPLAY),$(call feature_print_status,$(feat),))
ifneq ($(feature_verbose),1)
feature_display_entries = $(eval $(feature_display_entries_code))
define feature_display_entries_code
ifeq ($(feature_display),1)
$(info )
$(info Auto-detecting system features:)
$(foreach feat,$(FEATURE_DISPLAY),$(call feature_print_status,$(feat),))
ifneq ($(feature_verbose),1)
$(info )
endif
endif
ifeq ($(feature_verbose),1)
TMP := $(filter-out $(FEATURE_DISPLAY),$(FEATURE_TESTS))
$(foreach feat,$(TMP),$(call feature_print_status,$(feat),))
$(info )
endif
endif
endef
ifeq ($(feature_verbose),1)
TMP := $(filter-out $(FEATURE_DISPLAY),$(FEATURE_TESTS))
$(foreach feat,$(TMP),$(call feature_print_status,$(feat),))
$(info )
ifeq ($(FEATURE_DISPLAY_DEFERRED),)
$(call feature_display_entries)
endif


@ -36,6 +36,7 @@ FILES= \
test-libpython-version.bin \
test-libslang.bin \
test-libslang-include-subdir.bin \
test-libtraceevent.bin \
test-libcrypto.bin \
test-libunwind.bin \
test-libunwind-debug-frame.bin \
@ -196,6 +197,9 @@ $(OUTPUT)test-libslang.bin:
$(OUTPUT)test-libslang-include-subdir.bin:
$(BUILD) -lslang
$(OUTPUT)test-libtraceevent.bin:
$(BUILD) -ltraceevent
$(OUTPUT)test-libcrypto.bin:
$(BUILD) -lcrypto


@ -0,0 +1,12 @@
// SPDX-License-Identifier: GPL-2.0
#include <traceevent/trace-seq.h>
int main(void)
{
int rv = 0;
struct trace_seq s;
trace_seq_init(&s);
rv += !(s.state == TRACE_SEQ__GOOD);
trace_seq_destroy(&s);
return rv;
}


@ -0,0 +1,75 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_MATH64_H
#define _LINUX_MATH64_H
#include <linux/types.h>
#ifdef __x86_64__
static inline u64 mul_u64_u64_div64(u64 a, u64 b, u64 c)
{
u64 q;
asm ("mulq %2; divq %3" : "=a" (q)
: "a" (a), "rm" (b), "rm" (c)
: "rdx");
return q;
}
#define mul_u64_u64_div64 mul_u64_u64_div64
#endif
#ifdef __SIZEOF_INT128__
static inline u64 mul_u64_u32_shr(u64 a, u32 b, unsigned int shift)
{
return (u64)(((unsigned __int128)a * b) >> shift);
}
#else
#ifdef __i386__
static inline u64 mul_u32_u32(u32 a, u32 b)
{
u32 high, low;
asm ("mull %[b]" : "=a" (low), "=d" (high)
: [a] "a" (a), [b] "rm" (b) );
return low | ((u64)high) << 32;
}
#else
static inline u64 mul_u32_u32(u32 a, u32 b)
{
return (u64)a * b;
}
#endif
static inline u64 mul_u64_u32_shr(u64 a, u32 b, unsigned int shift)
{
u32 ah, al;
u64 ret;
al = a;
ah = a >> 32;
ret = mul_u32_u32(al, b) >> shift;
if (ah)
ret += mul_u32_u32(ah, b) << (32 - shift);
return ret;
}
#endif /* __SIZEOF_INT128__ */
#ifndef mul_u64_u64_div64
static inline u64 mul_u64_u64_div64(u64 a, u64 b, u64 c)
{
u64 quot, rem;
quot = a / c;
rem = a % c;
return quot * b + (rem * b) / c;
}
#endif
#endif /* _LINUX_MATH64_H */


@ -61,6 +61,9 @@ typedef __u32 __bitwise __be32;
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;
typedef __u16 __bitwise __sum16;
typedef __u32 __bitwise __wsum;
typedef struct {
int counter;
} atomic_t;


@ -37,6 +37,21 @@ enum perf_type_id {
PERF_TYPE_MAX, /* non-ABI */
};
/*
* attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
* PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
* AA: hardware event ID
* EEEEEEEE: PMU type ID
* PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
* BB: hardware cache ID
* CC: hardware cache op ID
* DD: hardware cache op result ID
* EEEEEEEE: PMU type ID
* If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
*/
#define PERF_PMU_TYPE_SHIFT 32
#define PERF_HW_EVENT_MASK 0xffffffff
/*
* Generalized performance event event_id types, used by the
* attr.event_id parameter of the sys_perf_event_open()


@ -136,6 +136,9 @@ SYNOPSIS
struct perf_thread_map *threads);
void perf_evsel__close(struct perf_evsel *evsel);
void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
void perf_evsel__munmap(struct perf_evsel *evsel);
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
struct perf_counts_values *count);
int perf_evsel__enable(struct perf_evsel *evsel);


@ -11,10 +11,12 @@
#include <stdlib.h>
#include <internal/xyarray.h>
#include <internal/cpumap.h>
#include <internal/mmap.h>
#include <internal/threadmap.h>
#include <internal/lib.h>
#include <linux/string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr)
{
@ -38,6 +40,7 @@ void perf_evsel__delete(struct perf_evsel *evsel)
}
#define FD(e, x, y) (*(int *) xyarray__entry(e->fd, x, y))
#define MMAP(e, x, y) (e->mmap ? ((struct perf_mmap *) xyarray__entry(e->mmap, x, y)) : NULL)
int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
{
@ -55,6 +58,13 @@ int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
return evsel->fd != NULL ? 0 : -ENOMEM;
}
static int perf_evsel__alloc_mmap(struct perf_evsel *evsel, int ncpus, int nthreads)
{
evsel->mmap = xyarray__new(ncpus, nthreads, sizeof(struct perf_mmap));
return evsel->mmap != NULL ? 0 : -ENOMEM;
}
static int
sys_perf_event_open(struct perf_event_attr *attr,
pid_t pid, int cpu, int group_fd,
@ -156,6 +166,72 @@ void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu)
perf_evsel__close_fd_cpu(evsel, cpu);
}
void perf_evsel__munmap(struct perf_evsel *evsel)
{
int cpu, thread;
if (evsel->fd == NULL || evsel->mmap == NULL)
return;
for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
int fd = FD(evsel, cpu, thread);
struct perf_mmap *map = MMAP(evsel, cpu, thread);
if (fd < 0)
continue;
perf_mmap__munmap(map);
}
}
xyarray__delete(evsel->mmap);
evsel->mmap = NULL;
}
int perf_evsel__mmap(struct perf_evsel *evsel, int pages)
{
int ret, cpu, thread;
struct perf_mmap_param mp = {
.prot = PROT_READ | PROT_WRITE,
.mask = (pages * page_size) - 1,
};
if (evsel->fd == NULL || evsel->mmap)
return -EINVAL;
if (perf_evsel__alloc_mmap(evsel, xyarray__max_x(evsel->fd), xyarray__max_y(evsel->fd)) < 0)
return -ENOMEM;
for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
int fd = FD(evsel, cpu, thread);
struct perf_mmap *map = MMAP(evsel, cpu, thread);
if (fd < 0)
continue;
perf_mmap__init(map, NULL, false, NULL);
ret = perf_mmap__mmap(map, &mp, fd, cpu);
if (ret) {
perf_evsel__munmap(evsel);
return ret;
}
}
}
return 0;
}
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread)
{
if (FD(evsel, cpu, thread) < 0 || MMAP(evsel, cpu, thread) == NULL)
return NULL;
return MMAP(evsel, cpu, thread)->base;
}
int perf_evsel__read_size(struct perf_evsel *evsel)
{
u64 read_format = evsel->attr.read_format;
@ -191,6 +267,10 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
if (FD(evsel, cpu, thread) < 0)
return -EINVAL;
if (MMAP(evsel, cpu, thread) &&
!perf_mmap__read_self(MMAP(evsel, cpu, thread), count))
return 0;
if (readn(FD(evsel, cpu, thread), count->values, size) <= 0)
return -errno;


@ -41,6 +41,7 @@ struct perf_evsel {
struct perf_cpu_map *own_cpus;
struct perf_thread_map *threads;
struct xyarray *fd;
struct xyarray *mmap;
struct xyarray *sample_id;
u64 *id;
u32 ids;


@ -11,6 +11,7 @@
#define PERF_SAMPLE_MAX_SIZE (1 << 16)
struct perf_mmap;
struct perf_counts_values;
typedef void (*libperf_unmap_cb_t)(struct perf_mmap *map);
@ -52,4 +53,6 @@ void perf_mmap__put(struct perf_mmap *map);
u64 perf_mmap__read_head(struct perf_mmap *map);
int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count);
#endif /* __LIBPERF_INTERNAL_MMAP_H */


@ -3,11 +3,32 @@
#define __LIBPERF_INTERNAL_TESTS_H
#include <stdio.h>
#include <unistd.h>
int tests_failed;
int tests_verbose;
static inline int get_verbose(char **argv, int argc)
{
int c;
int verbose = 0;
while ((c = getopt(argc, argv, "v")) != -1) {
switch (c)
{
case 'v':
verbose = 1;
break;
default:
break;
}
}
return verbose;
}
#define __T_START \
do { \
tests_verbose = get_verbose(argv, argc); \
fprintf(stdout, "- running %s...", __FILE__); \
fflush(NULL); \
tests_failed = 0; \
@ -30,4 +51,15 @@ do {
} \
} while (0)
#define __T_VERBOSE(...) \
do { \
if (tests_verbose) { \
if (tests_verbose == 1) { \
fputc('\n', stderr); \
tests_verbose++; \
} \
fprintf(stderr, ##__VA_ARGS__); \
} \
} while (0)
#endif /* __LIBPERF_INTERNAL_TESTS_H */


@ -18,11 +18,18 @@ struct xyarray *xyarray__new(int xlen, int ylen, size_t entry_size);
void xyarray__delete(struct xyarray *xy);
void xyarray__reset(struct xyarray *xy);
static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
static inline void *__xyarray__entry(struct xyarray *xy, int x, int y)
{
return &xy->contents[x * xy->row_size + y * xy->entry_size];
}
static inline void *xyarray__entry(struct xyarray *xy, size_t x, size_t y)
{
if (x >= xy->max_x || y >= xy->max_y)
return NULL;
return __xyarray__entry(xy, x, y);
}
static inline int xyarray__max_y(struct xyarray *xy)
{
return xy->max_y;


@ -0,0 +1,31 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __LIBPERF_BPF_PERF_H
#define __LIBPERF_BPF_PERF_H
#include <linux/types.h> /* for __u32 */
/*
* bpf_perf uses a hashmap, the attr_map, to track all the leader programs.
* The hashmap is pinned in bpffs. flock() on this file is used to ensure
* no concurrent access to the attr_map. The key of attr_map is struct
* perf_event_attr, and the value is struct perf_event_attr_map_entry.
*
* struct perf_event_attr_map_entry contains two __u32 IDs, bpf_link of the
* leader prog, and the diff_map. Each perf-stat session holds a reference
* to the bpf_link to make sure the leader prog is attached to sched_switch
* tracepoint.
*
* Since the hashmap only contains IDs of the bpf_link and diff_map, it
* does not hold any references to the leader program. Once all perf-stat
* sessions of these events exit, the leader prog, its maps, and the
* perf_events will be freed.
*/
struct perf_event_attr_map_entry {
__u32 link_id;
__u32 diff_map_id;
};
/* default attr_map name */
#define BPF_PERF_DEFAULT_ATTR_MAP_PATH "perf_attr_map"
#endif /* __LIBPERF_BPF_PERF_H */


@ -8,6 +8,8 @@
#include <linux/bpf.h>
#include <sys/types.h> /* pid_t */
#define event_contains(obj, mem) ((obj).header.size > offsetof(typeof(obj), mem))
struct perf_record_mmap {
struct perf_event_header header;
__u32 pid, tid;
@ -346,8 +348,9 @@ struct perf_record_time_conv {
__u64 time_zero;
__u64 time_cycles;
__u64 time_mask;
bool cap_user_time_zero;
bool cap_user_time_short;
__u8 cap_user_time_zero;
__u8 cap_user_time_short;
__u8 reserved[6]; /* For alignment */
};
struct perf_record_header_feature {


@ -27,6 +27,9 @@ LIBPERF_API int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *
struct perf_thread_map *threads);
LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel);
LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
LIBPERF_API int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
LIBPERF_API void perf_evsel__munmap(struct perf_evsel *evsel);
LIBPERF_API void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
struct perf_counts_values *count);
LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);


@ -23,6 +23,9 @@ LIBPERF_0.0.1 {
perf_evsel__disable;
perf_evsel__open;
perf_evsel__close;
perf_evsel__mmap;
perf_evsel__munmap;
perf_evsel__mmap_base;
perf_evsel__read;
perf_evsel__cpus;
perf_evsel__threads;


@ -8,9 +8,11 @@
#include <linux/perf_event.h>
#include <perf/mmap.h>
#include <perf/event.h>
#include <perf/evsel.h>
#include <internal/mmap.h>
#include <internal/lib.h>
#include <linux/kernel.h>
#include <linux/math64.h>
#include "internal.h"
void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev,
@ -273,3 +275,89 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map)
return event;
}
#if defined(__i386__) || defined(__x86_64__)
static u64 read_perf_counter(unsigned int counter)
{
unsigned int low, high;
asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (counter));
return low | ((u64)high) << 32;
}
static u64 read_timestamp(void)
{
unsigned int low, high;
asm volatile("rdtsc" : "=a" (low), "=d" (high));
return low | ((u64)high) << 32;
}
#else
static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
static u64 read_timestamp(void) { return 0; }
#endif
int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count)
{
struct perf_event_mmap_page *pc = map->base;
u32 seq, idx, time_mult = 0, time_shift = 0;
u64 cnt, cyc = 0, time_offset = 0, time_cycles = 0, time_mask = ~0ULL;
if (!pc || !pc->cap_user_rdpmc)
return -1;
do {
seq = READ_ONCE(pc->lock);
barrier();
count->ena = READ_ONCE(pc->time_enabled);
count->run = READ_ONCE(pc->time_running);
if (pc->cap_user_time && count->ena != count->run) {
cyc = read_timestamp();
time_mult = READ_ONCE(pc->time_mult);
time_shift = READ_ONCE(pc->time_shift);
time_offset = READ_ONCE(pc->time_offset);
if (pc->cap_user_time_short) {
time_cycles = READ_ONCE(pc->time_cycles);
time_mask = READ_ONCE(pc->time_mask);
}
}
idx = READ_ONCE(pc->index);
cnt = READ_ONCE(pc->offset);
if (pc->cap_user_rdpmc && idx) {
s64 evcnt = read_perf_counter(idx - 1);
u16 width = READ_ONCE(pc->pmc_width);
evcnt <<= 64 - width;
evcnt >>= 64 - width;
cnt += evcnt;
} else
return -1;
barrier();
} while (READ_ONCE(pc->lock) != seq);
if (count->ena != count->run) {
u64 delta;
/* Adjust for cap_usr_time_short, a nop if not */
cyc = time_cycles + ((cyc - time_cycles) & time_mask);
delta = time_offset + mul_u64_u32_shr(cyc, time_mult, time_shift);
count->ena += delta;
if (idx)
count->run += delta;
cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
}
count->val = cnt;
return 0;
}


@ -5,6 +5,8 @@ TESTS = test-cpumap test-threadmap test-evlist test-evsel
TESTS_SO := $(addsuffix -so,$(TESTS))
TESTS_A := $(addsuffix -a,$(TESTS))
TEST_ARGS := $(if $(V),-v)
# Set compile option CFLAGS
ifdef EXTRA_CFLAGS
CFLAGS := $(EXTRA_CFLAGS)
@ -28,9 +30,9 @@ all: $(TESTS_A) $(TESTS_SO)
run:
@echo "running static:"
@for i in $(TESTS_A); do ./$$i; done
@for i in $(TESTS_A); do ./$$i $(TEST_ARGS); done
@echo "running dynamic:"
@for i in $(TESTS_SO); do LD_LIBRARY_PATH=../ ./$$i; done
@for i in $(TESTS_SO); do LD_LIBRARY_PATH=../ ./$$i $(TEST_ARGS); done
clean:
$(call QUIET_CLEAN, tests)$(RM) $(TESTS_A) $(TESTS_SO)


@ -120,6 +120,70 @@ static int test_stat_thread_enable(void)
return 0;
}
static int test_stat_user_read(int event)
{
struct perf_counts_values counts = { .val = 0 };
struct perf_thread_map *threads;
struct perf_evsel *evsel;
struct perf_event_mmap_page *pc;
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = event,
};
int err, i;
threads = perf_thread_map__new_dummy();
__T("failed to create threads", threads);
perf_thread_map__set_pid(threads, 0, 0);
evsel = perf_evsel__new(&attr);
__T("failed to create evsel", evsel);
err = perf_evsel__open(evsel, NULL, threads);
__T("failed to open evsel", err == 0);
err = perf_evsel__mmap(evsel, 0);
__T("failed to mmap evsel", err == 0);
pc = perf_evsel__mmap_base(evsel, 0, 0);
#if defined(__i386__) || defined(__x86_64__)
__T("userspace counter access not supported", pc->cap_user_rdpmc);
__T("userspace counter access not enabled", pc->index);
__T("userspace counter width not set", pc->pmc_width >= 32);
#endif
perf_evsel__read(evsel, 0, 0, &counts);
__T("failed to read value for evsel", counts.val != 0);
for (i = 0; i < 5; i++) {
volatile int count = 0x10000 << i;
__u64 start, end, last = 0;
__T_VERBOSE("\tloop = %u, ", count);
perf_evsel__read(evsel, 0, 0, &counts);
start = counts.val;
while (count--) ;
perf_evsel__read(evsel, 0, 0, &counts);
end = counts.val;
__T("invalid counter data", (end - start) > last);
last = end - start;
__T_VERBOSE("count = %llu\n", end - start);
}
perf_evsel__munmap(evsel);
perf_evsel__close(evsel);
perf_evsel__delete(evsel);
perf_thread_map__put(threads);
return 0;
}
int main(int argc, char **argv)
{
__T_START;
@ -129,6 +193,8 @@ int main(int argc, char **argv)
test_stat_cpu();
test_stat_thread();
test_stat_thread_enable();
test_stat_user_read(PERF_COUNT_HW_INSTRUCTIONS);
test_stat_user_read(PERF_COUNT_HW_CPU_CYCLES);
__T_END;
return tests_failed == 0 ? 0 : -1;


@ -20,6 +20,7 @@ perf.data.old
output.svg
perf-archive
perf-with-kcore
perf-iostat
tags
TAGS
cscope*


@ -0,0 +1,214 @@
Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.
Some Intel platforms, such as AlderLake, are hybrid: they consist of
atom CPUs and core CPUs, and each CPU type has a dedicated event list.
Some events are available only on core CPUs, some only on atom CPUs,
and some on both.
The kernel exports two new cpu PMUs via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom
A 'cpus' file is created under each directory. For example,
cat /sys/devices/cpu_core/cpus
0-15
cat /sys/devices/cpu_atom/cpus
16-23
This indicates that cpu0-cpu15 are core CPUs and cpu16-cpu23 are atom CPUs.
Quickstart
List hybrid event
-----------------
As before, use perf-list to list the symbolic event.
perf list
inst_retired.any
[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]
The 'Unit: xxx' suffix is added to the brief description to indicate
which PMU the event belongs to. The same event name can be supported
on more than one PMU.
Enable hybrid event with a specific pmu
---------------------------------------
To enable a core-only or atom-only event, the following syntax is supported:
cpu_core/<event name>/
or
cpu_atom/<event name>/
For example, count the 'cycles' event on core cpus.
perf stat -e cpu_core/cycles/
Create two events for one hardware event automatically
------------------------------------------------------
When one event is created and that event is available on both atom and
core, two events are created automatically: one for atom, the other
for core. Most hardware events and cache events are available on both
cpu_core and cpu_atom.
Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
hybrid platform, the kernel needs to know where the event comes from
(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
carry PMU information, so this type is extended to be PMU-aware: the
PMU type ID is stored at attr.config[63:32].
The PMU type ID is retrieved from sysfs:
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type
The new attr.config layout for PERF_TYPE_HARDWARE:
PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
AA: hardware event ID
EEEEEEEE: PMU type ID
Cache events are similar. The type PERF_TYPE_HW_CACHE is likewise
extended to be PMU-aware, with the PMU type ID stored at
attr.config[63:32].
The new attr.config layout for PERF_TYPE_HW_CACHE:
PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
BB: hardware cache ID
CC: hardware cache op ID
DD: hardware cache op result ID
EEEEEEEE: PMU type ID
When enabling a hardware event without a specified PMU, such as
'perf stat -e cycles -a' (system-wide in this example), two events
are created automatically.
------------------------------------------------------------
perf_event_attr:
size 120
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
and
------------------------------------------------------------
perf_event_attr:
size 120
config 0x800000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------
Type 0 is PERF_TYPE_HARDWARE. The 0x4 in 0x400000000 indicates the
cpu_core PMU; the 0x8 in 0x800000000 indicates the cpu_atom PMU (the
atom PMU type ID is not fixed). The kernel creates 'cycles'
(0x400000000) on cpu0-cpu15 (core CPUs), and creates 'cycles'
(0x800000000) on cpu16-cpu23 (atom CPUs).
For perf-stat result, it displays two events:
Performance counter stats for 'system wide':
6,744,979 cpu_core/cycles/
1,965,552 cpu_atom/cycles/
The first 'cycles' is the core event, the second is the atom event.
Thread mode example:
--------------------
perf-stat reports the scaled counts for hybrid events, with a
percentage displayed. The percentage is the event's running time /
enabling time.
In one example, 'triad_loop' runs on cpu16 (an atom CPU), and the
scaled value for core cycles is 160,444,092 with a percentage of 0.47%.
perf stat -e cycles -- taskset -c 16 ./triad_loop
As before, two events are created.
------------------------------------------------------------
perf_event_attr:
size 120
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
and
------------------------------------------------------------
perf_event_attr:
size 120
config 0x800000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
Performance counter stats for 'taskset -c 16 ./triad_loop':
233,066,666 cpu_core/cycles/ (0.43%)
604,097,080 cpu_atom/cycles/ (99.57%)
perf-record:
------------
If no '-e' is specified with perf record on a hybrid platform, two
default 'cycles' events are created and added to the event list: one
for core, the other for atom.
perf-stat:
----------
If no '-e' is specified with perf stat on a hybrid platform, then
besides the software events, the following events are created and
added to the event list, in order:
cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/
Of course, both perf-stat and perf-record support enabling a
hybrid event on a specific PMU.
e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'
But '{cpu_core/cycles/,cpu_atom/instructions/}' will produce a
warning and disable grouping, because the PMUs in the group do
not match (cpu_core vs. cpu_atom).
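The grouping rule can be illustrated with a small checker (hypothetical helper names; perf's real event parser lives in util/parse-events.c):

```python
def pmu_of(event):
    """Return the PMU prefix of an event spec like 'cpu_core/cycles/'."""
    return event.split("/", 1)[0]

def group_is_valid(group):
    """A hybrid event group is only kept if all members share one PMU."""
    return len({pmu_of(e) for e in group}) == 1

assert group_is_valid(["cpu_core/cycles/", "cpu_core/instructions/"])
# Mixed PMUs: perf warns and falls back to non-grouped events.
assert not group_is_valid(["cpu_core/cycles/", "cpu_atom/instructions/"])
```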


@ -124,6 +124,13 @@ OPTIONS
--group::
Show event group information together
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
--demangle-kernel::
Demangle kernel symbol names to human readable form (for C++ kernels).
--percent-type::
Set annotation percent type from following choices:
global-period, local-period, global-hits, local-hits


@ -57,7 +57,7 @@ OPTIONS
-u::
--update=::
Update specified file of the cache. Note that this doesn't remove
older entires since those may be still needed for annotating old
older entries since those may be still needed for annotating old
(or remote) perf.data. Only if there is already a cache which has
exactly same build-id, that is replaced by new one. It can be used
to update kallsyms and kernel dso to vmlinux in order to support


@ -123,6 +123,7 @@ Given a $HOME/.perfconfig like this:
queue-size = 0
children = true
group = true
skip-empty = true
[llvm]
dump-obj = true
@ -393,6 +394,12 @@ annotate.*::
This option works with tui, stdio2 browsers.
annotate.demangle::
Demangle symbol names to human readable form. Default is 'true'.
annotate.demangle_kernel::
Demangle kernel symbol names to human readable form. Default is 'true'.
hist.*::
hist.percentage::
This option control the way to calculate overhead of filtered entries -
@ -525,6 +532,10 @@ report.*::
0.07% 0.00% noploop ld-2.15.so [.] strcmp
0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del
report.skip-empty::
This option changes the default stat behavior for empty results.
If it is set to true, 'perf report --stat' will not show 0 stats.
top.*::
top.children::
Same as 'report.children'. So if it is enabled, the output of 'top'


@ -17,7 +17,7 @@ Data file related processing.
COMMANDS
--------
convert::
Converts perf data file into another format (only CTF [1] format is support by now).
Converts perf data file into another format.
It's possible to set data-convert debug variable to get debug messages from conversion,
like:
perf --debug data-convert data convert ...
@ -27,6 +27,9 @@ OPTIONS for 'convert'
--to-ctf::
Triggers the CTF conversion, specify the path of CTF data directory.
--to-json::
Triggers JSON conversion. Specify the JSON filename to output.
--tod::
Convert time to wall clock time.


@ -0,0 +1,88 @@
perf-iostat(1)
===============
NAME
----
perf-iostat - Show I/O performance metrics
SYNOPSIS
--------
[verse]
'perf iostat' list
'perf iostat' <ports> -- <command> [<options>]
DESCRIPTION
-----------
Mode is intended to provide four I/O performance metrics per PCIe root port:
- Inbound Read - I/O devices below root port read from the host memory, in MB
- Inbound Write - I/O devices below root port write to the host memory, in MB
- Outbound Read - CPU reads from I/O devices below root port, in MB
- Outbound Write - CPU writes to I/O devices below root port, in MB
OPTIONS
-------
<command>...::
Any command you can specify in a shell.
list::
List all PCIe root ports.
<ports>::
Select the root ports for monitoring. Comma-separated list is supported.
EXAMPLES
--------
1. List all PCIe root ports (example for 2-S platform):
$ perf iostat list
S0-uncore_iio_0<0000:00>
S1-uncore_iio_0<0000:80>
S0-uncore_iio_1<0000:17>
S1-uncore_iio_1<0000:85>
S0-uncore_iio_2<0000:3a>
S1-uncore_iio_2<0000:ae>
S0-uncore_iio_3<0000:5d>
S1-uncore_iio_3<0000:d7>
2. Collect metrics for all PCIe root ports:
$ perf iostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
357708+0 records in
357707+0 records out
375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s
Performance counter stats for 'system wide':
port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB)
0000:00 1 0 2 3
0000:80 0 0 0 0
0000:17 352552 43 0 21
0000:85 0 0 0 0
0000:3a 3 0 0 0
0000:ae 0 0 0 0
0000:5d 0 0 0 0
0000:d7 0 0 0 0
3. Collect metrics for comma-separated list of PCIe root ports:
$ perf iostat 0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
357708+0 records in
357707+0 records out
375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s
Performance counter stats for 'system wide':
port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB)
0000:17 358559 44 0 22
0000:3a 3 2 0 0
197.081983474 seconds time elapsed
SEE ALSO
--------
linkperf:perf-stat[1]


@ -695,6 +695,7 @@ measurements:
wait -n ${perf_pid}
exit $?
include::intel-hybrid.txt[]
SEE ALSO
--------


@ -112,6 +112,8 @@ OPTIONS
- ins_lat: Instruction latency in core cycles. This is the global instruction
latency
- local_ins_lat: Local instruction latency version
- p_stage_cyc: On powerpc, this presents the number of cycles spent in a
pipeline stage. Currently supported only on powerpc.
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
@ -224,6 +226,9 @@ OPTIONS
--dump-raw-trace::
Dump raw trace in ASCII.
--disable-order::
Disable raw trace ordering.
-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
Display call chains using type, min percent threshold, print limit,
@ -472,7 +477,7 @@ OPTIONS
but probably we'll make the default not to show the switch-on/off events
on the --group mode and if there is only one event besides the off/on ones,
go straight to the histogram browser, just like 'perf report' with no events
explicitely specified does.
explicitly specified does.
--itrace::
Options for decoding instruction tracing data. The options are:
@ -566,6 +571,9 @@ include::itrace.txt[]
sampled cycles
'Avg Cycles' - block average sampled cycles
--skip-empty::
Do not print 0 results in the --stat output.
include::callchain-overhead-calculation.txt[]
SEE ALSO


@ -93,6 +93,19 @@ report::
1.102235068 seconds time elapsed
--bpf-counters::
Use BPF programs to aggregate readings from perf_events. This
allows multiple perf-stat sessions that are counting the same metric (cycles,
instructions, etc.) to share hardware counters.
To use BPF programs on common events by default, use
"perf config stat.bpf-counter-events=<list_of_events>".
--bpf-attr-map::
With option "--bpf-counters", different perf-stat sessions share
information about shared BPF programs and maps via a pinned hashmap.
Use "--bpf-attr-map" to specify the path of this pinned hashmap.
The default path is /sys/fs/bpf/perf_attr_map.
ifdef::HAVE_LIBPFM[]
--pfm-events events::
Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
@ -142,7 +155,10 @@ Do not aggregate counts across all monitored CPUs.
-n::
--null::
null run - don't start any counters
null run - Don't start any counters.
This can be useful to measure just elapsed wall-clock time - or to assess the
raw overhead of perf stat itself, without running any counters.
-v::
--verbose::
@ -468,6 +484,15 @@ convenient for post processing.
--summary::
Print summary for interval mode (-I).
--no-csv-summary::
Don't print 'summary' in the first column for CSV summary output.
This option must be used with -x and --summary.
This option can be enabled in perf config by setting the variable
'stat.no-csv-summary'.
$ perf config stat.no-csv-summary=true
EXAMPLES
--------
@ -527,6 +552,8 @@ The fields are in this order:
Additional metrics may be printed with all earlier fields being empty.
include::intel-hybrid.txt[]
SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]


@ -317,7 +317,7 @@ Default is to monitor all CPUS.
but probably we'll make the default not to show the switch-on/off events
on the --group mode and if there is only one event besides the off/on ones,
go straight to the histogram browser, just like 'perf top' with no events
explicitely specified does.
explicitly specified does.
--stitch-lbr::
Show callgraph with stitched LBRs, which may have more complete


@ -76,3 +76,15 @@ SEE ALSO
linkperf:perf-stat[1], linkperf:perf-top[1],
linkperf:perf-record[1], linkperf:perf-report[1],
linkperf:perf-list[1]
linkperf:perf-annotate[1],linkperf:perf-archive[1],
linkperf:perf-bench[1], linkperf:perf-buildid-cache[1],
linkperf:perf-buildid-list[1], linkperf:perf-c2c[1],
linkperf:perf-config[1], linkperf:perf-data[1], linkperf:perf-diff[1],
linkperf:perf-evlist[1], linkperf:perf-ftrace[1],
linkperf:perf-help[1], linkperf:perf-inject[1],
linkperf:perf-intel-pt[1], linkperf:perf-kallsyms[1],
linkperf:perf-kmem[1], linkperf:perf-kvm[1], linkperf:perf-lock[1],
linkperf:perf-mem[1], linkperf:perf-probe[1], linkperf:perf-sched[1],
linkperf:perf-script[1], linkperf:perf-test[1],
linkperf:perf-trace[1], linkperf:perf-version[1]


@ -72,6 +72,7 @@ For example, the perf_event_attr structure can be initialized with
The Fixed counter 3 must be the leader of the group.
#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
@ -95,6 +96,11 @@ int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);
if (slots_fd < 0)
... error ...
/* Memory mapping the fd permits _rdpmc calls from userspace */
void *slots_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, slots_fd, 0);
if (!slots_p)
... error ...
/*
* Open metrics event file descriptor for current task.
* Set slots event as the leader of the group.
@ -110,6 +116,14 @@ int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);
if (metrics_fd < 0)
... error ...
/* Memory mapping the fd permits _rdpmc calls from userspace */
void *metrics_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, metrics_fd, 0);
if (!metrics_p)
... error ...
Note: the file descriptors returned by the perf_event_open calls must be memory
mapped to permit calls to the rdpmc instruction. Permission may also be granted
by writing to the /sys/devices/cpu/rdpmc sysfs node.
The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
to read slots and the topdown metrics at different points of the program:
@ -141,6 +155,10 @@ as the parallelism and overlap in the CPU program execution will
cause too much measurement inaccuracy. For example instrumenting
individual basic blocks is definitely too fine grained.
_rdpmc calls should not be mixed with reading the metrics and slots counters
through system calls, as the kernel will reset these counters after each system
call.
Decoding metrics values
=======================


@ -100,7 +100,10 @@ clean:
# make -C tools/perf -f tests/make
#
build-test:
@$(MAKE) SHUF=1 -f tests/make REUSE_FEATURES_DUMP=1 MK=Makefile SET_PARALLEL=1 --no-print-directory tarpkg out
@$(MAKE) SHUF=1 -f tests/make REUSE_FEATURES_DUMP=1 MK=Makefile SET_PARALLEL=1 --no-print-directory tarpkg make_static make_with_gtk2 out
build-test-tarball:
@$(MAKE) -f tests/make REUSE_FEATURES_DUMP=1 MK=Makefile SET_PARALLEL=1 --no-print-directory out
#
# All other targets get passed through:


@ -32,7 +32,7 @@ ifneq ($(NO_SYSCALL_TABLE),1)
NO_SYSCALL_TABLE := 0
endif
else
ifeq ($(SRCARCH),$(filter $(SRCARCH),powerpc arm64 s390))
ifeq ($(SRCARCH),$(filter $(SRCARCH),powerpc arm64 s390 mips))
NO_SYSCALL_TABLE := 0
endif
endif
@ -87,6 +87,13 @@ ifeq ($(ARCH),s390)
CFLAGS += -fPIC -I$(OUTPUT)arch/s390/include/generated
endif
ifeq ($(ARCH),mips)
NO_PERF_REGS := 0
CFLAGS += -I$(OUTPUT)arch/mips/include/generated
CFLAGS += -I../../arch/mips/include/uapi -I../../arch/mips/include/generated/uapi
LIBUNWIND_LIBS = -lunwind -lunwind-mips
endif
ifeq ($(NO_PERF_REGS),0)
$(call detected,CONFIG_PERF_REGS)
endif
@ -292,6 +299,9 @@ ifneq ($(TCMALLOC),)
endif
ifeq ($(FEATURES_DUMP),)
# We will display at the end of this Makefile.config, using $(call feature_display_entries)
# As we may retry some feature detection here, see the disassembler-four-args case, for instance
FEATURE_DISPLAY_DEFERRED := 1
include $(srctree)/tools/build/Makefile.feature
else
include $(FEATURES_DUMP)
@ -1072,6 +1082,15 @@ ifdef LIBPFM4
endif
endif
ifdef LIBTRACEEVENT_DYNAMIC
$(call feature_check,libtraceevent)
ifeq ($(feature-libtraceevent), 1)
EXTLIBS += -ltraceevent
else
dummy := $(error Error: No libtraceevent devel library found, please install libtraceevent-devel);
endif
endif
# Among the variables below, these:
# perfexecdir
# perf_include_dir
@ -1208,3 +1227,13 @@ $(call detected_var,LIBDIR)
$(call detected_var,GTK_CFLAGS)
$(call detected_var,PERL_EMBED_CCOPTS)
$(call detected_var,PYTHON_EMBED_CCOPTS)
# re-generate FEATURE-DUMP as we may have called feature_check, found out
# extra libraries to add to LDFLAGS of some other test and then redo those
# tests, see the block about libbfd, disassembler-four-args, for instance.
$(shell rm -f $(FEATURE_DUMP_FILENAME))
$(foreach feat,$(FEATURE_TESTS),$(shell echo "$(call feature_assign,$(feat))" >> $(FEATURE_DUMP_FILENAME)))
ifeq ($(feature_display),1)
$(call feature_display_entries)
endif


@ -128,6 +128,8 @@ include ../scripts/utilities.mak
#
# Define BUILD_BPF_SKEL to enable BPF skeletons
#
# Define LIBTRACEEVENT_DYNAMIC to enable libtraceevent dynamic linking
#
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
@ -283,6 +285,7 @@ SCRIPT_SH =
SCRIPT_SH += perf-archive.sh
SCRIPT_SH += perf-with-kcore.sh
SCRIPT_SH += perf-iostat.sh
grep-libs = $(filter -l%,$(1))
strip-libs = $(filter-out -l%,$(1))
@ -309,7 +312,6 @@ endif
LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
export LIBTRACEEVENT
LIBTRACEEVENT_DYNAMIC_LIST = $(PLUGINS_PATH)libtraceevent-dynamic-list
#
@ -374,12 +376,15 @@ endif
export PERL_PATH
PERFLIBS = $(LIBAPI) $(LIBTRACEEVENT) $(LIBSUBCMD) $(LIBPERF)
PERFLIBS = $(LIBAPI) $(LIBSUBCMD) $(LIBPERF)
ifndef NO_LIBBPF
ifndef LIBBPF_DYNAMIC
PERFLIBS += $(LIBBPF)
endif
endif
ifndef LIBTRACEEVENT_DYNAMIC
PERFLIBS += $(LIBTRACEEVENT)
endif
# We choose to avoid "if .. else if .. else .. endif endif"
# because maintaining the nesting to match is a pain. If
@ -948,6 +953,8 @@ endif
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-with-kcore) \
$(INSTALL) $(OUTPUT)perf-with-kcore -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-iostat) \
$(INSTALL) $(OUTPUT)perf-iostat -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
ifndef NO_LIBAUDIT
$(call QUIET_INSTALL, strace/groups) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'; \
@ -1007,6 +1014,7 @@ python-clean:
SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
ifdef BUILD_BPF_SKEL
BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
@ -1021,7 +1029,7 @@ $(BPFTOOL): | $(SKEL_TMP_OUT)
OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf -Wall -Werror $(BPF_INCLUDE) \
-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
@ -1041,7 +1049,7 @@ bpf-skel-clean:
$(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean fixdep-clean python-clean bpf-skel-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(OUTPUT)perf-iostat $(LANG_BINDINGS)
$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32 $(OUTPUT)pmu-events/jevents $(OUTPUT)$(LIBJVMTI).so


@ -67,6 +67,7 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
char path[PATH_MAX];
int err = -EINVAL;
u32 val;
u64 contextid;
ptr = container_of(itr, struct cs_etm_recording, itr);
cs_etm_pmu = ptr->cs_etm_pmu;
@ -86,25 +87,59 @@ static int cs_etm_set_context_id(struct auxtrace_record *itr,
goto out;
}
/* User has configured for PID tracing, respect it. */
contextid = evsel->core.attr.config &
(BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_CTXTID2));
/*
* TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID tracing
* is supported:
* 0b00000 Context ID tracing is not supported.
* 0b00100 Maximum of 32-bit Context ID size.
* All other values are reserved.
* If user doesn't configure the contextid format, parse PMU format and
* enable PID tracing according to the "contextid" format bits:
*
* If bit ETM_OPT_CTXTID is set, trace CONTEXTIDR_EL1;
* If bit ETM_OPT_CTXTID2 is set, trace CONTEXTIDR_EL2.
*/
val = BMVAL(val, 5, 9);
if (!val || val != 0x4) {
err = -EINVAL;
goto out;
if (!contextid)
contextid = perf_pmu__format_bits(&cs_etm_pmu->format,
"contextid");
if (contextid & BIT(ETM_OPT_CTXTID)) {
/*
* TRCIDR2.CIDSIZE, bit [9-5], indicates whether contextID
* tracing is supported:
* 0b00000 Context ID tracing is not supported.
* 0b00100 Maximum of 32-bit Context ID size.
* All other values are reserved.
*/
val = BMVAL(val, 5, 9);
if (!val || val != 0x4) {
pr_err("%s: CONTEXTIDR_EL1 isn't supported\n",
CORESIGHT_ETM_PMU_NAME);
err = -EINVAL;
goto out;
}
}
if (contextid & BIT(ETM_OPT_CTXTID2)) {
/*
* TRCIDR2.VMIDOPT[30:29] != 0 and
* TRCIDR2.VMIDSIZE[14:10] == 0b00100 (32bit virtual contextid)
* We can't support CONTEXTIDR in VMID if the size of the
* virtual context id is < 32bit.
* Any value of VMIDSIZE >= 4 (i.e, > 32bit) is fine for us.
*/
if (!BMVAL(val, 29, 30) || BMVAL(val, 10, 14) < 4) {
pr_err("%s: CONTEXTIDR_EL2 isn't supported\n",
CORESIGHT_ETM_PMU_NAME);
err = -EINVAL;
goto out;
}
}
/* All good, let the kernel know */
evsel->core.attr.config |= (1 << ETM_OPT_CTXTID);
evsel->core.attr.config |= contextid;
err = 0;
out:
return err;
}
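The TRCIDR2 field checks in cs_etm_set_context_id() above can be modeled compactly. This is an illustrative Python sketch of the bit-field logic only (BMVAL extracts bits [msb:lsb]; the constants mirror the comments, the helpers are not the tool's code):

```python
def bmval(val, lsb, msb):
    """Extract bits [msb:lsb] of val, like the BMVAL() macro."""
    return (val >> lsb) & ((1 << (msb - lsb + 1)) - 1)

def ctxtid1_supported(trcidr2):
    # TRCIDR2.CIDSIZE, bits [9:5]: only 0b00100 (32-bit Context ID) is valid.
    return bmval(trcidr2, 5, 9) == 0x4

def ctxtid2_supported(trcidr2):
    # TRCIDR2.VMIDOPT, bits [30:29], must be non-zero, and
    # TRCIDR2.VMIDSIZE, bits [14:10], must be >= 4 (32-bit virtual contextid).
    return bmval(trcidr2, 29, 30) != 0 and bmval(trcidr2, 10, 14) >= 4

# CIDSIZE = 0b00100, VMIDSIZE = 0b00100, VMIDOPT = 1:
trcidr2 = (0x4 << 5) | (0x4 << 10) | (0x1 << 29)
assert ctxtid1_supported(trcidr2)
assert ctxtid2_supported(trcidr2)
assert not ctxtid1_supported(0)
```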
@ -173,17 +208,17 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
!cpu_map__has(online_cpus, i))
continue;
if (option & ETM_SET_OPT_CTXTID) {
if (option & BIT(ETM_OPT_CTXTID)) {
err = cs_etm_set_context_id(itr, evsel, i);
if (err)
goto out;
}
if (option & ETM_SET_OPT_TS) {
if (option & BIT(ETM_OPT_TS)) {
err = cs_etm_set_timestamp(itr, evsel, i);
if (err)
goto out;
}
if (option & ~(ETM_SET_OPT_MASK))
if (option & ~(BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS)))
/* Nothing else is currently supported */
goto out;
}
@ -343,7 +378,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
}
/* Snapshost size can't be bigger than the auxtrace area */
/* Snapshot size can't be bigger than the auxtrace area */
if (opts->auxtrace_snapshot_size >
opts->auxtrace_mmap_pages * (size_t)page_size) {
pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
@ -410,7 +445,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
evsel__set_sample_bit(cs_etm_evsel, CPU);
err = cs_etm_set_option(itr, cs_etm_evsel,
ETM_SET_OPT_CTXTID | ETM_SET_OPT_TS);
BIT(ETM_OPT_CTXTID) | BIT(ETM_OPT_TS));
if (err)
goto out;
}
@ -489,7 +524,9 @@ static u64 cs_etmv4_get_config(struct auxtrace_record *itr)
config |= BIT(ETM4_CFG_BIT_TS);
if (config_opts & BIT(ETM_OPT_RETSTK))
config |= BIT(ETM4_CFG_BIT_RETSTK);
if (config_opts & BIT(ETM_OPT_CTXTID2))
config |= BIT(ETM4_CFG_BIT_VMID) |
BIT(ETM4_CFG_BIT_VMID_OPT);
return config;
}
@ -576,7 +613,7 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
struct auxtrace_record *itr,
struct perf_record_auxtrace_info *info)
{
u32 increment;
u32 increment, nr_trc_params;
u64 magic;
struct cs_etm_recording *ptr =
container_of(itr, struct cs_etm_recording, itr);
@ -611,6 +648,7 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
/* How much space was used */
increment = CS_ETMV4_PRIV_MAX;
nr_trc_params = CS_ETMV4_PRIV_MAX - CS_ETMV4_TRCCONFIGR;
} else {
magic = __perf_cs_etmv3_magic;
/* Get configuration register */
@ -628,11 +666,13 @@ static void cs_etm_get_metadata(int cpu, u32 *offset,
/* How much space was used */
increment = CS_ETM_PRIV_MAX;
nr_trc_params = CS_ETM_PRIV_MAX - CS_ETM_ETMCR;
}
/* Build generic header portion */
info->priv[*offset + CS_ETM_MAGIC] = magic;
info->priv[*offset + CS_ETM_CPU] = cpu;
info->priv[*offset + CS_ETM_NR_TRC_PARAMS] = nr_trc_params;
/* Where the next CPU entry should start from */
*offset += increment;
}
@ -678,7 +718,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
/* First fill out the session header */
info->type = PERF_AUXTRACE_CS_ETM;
info->priv[CS_HEADER_VERSION_0] = 0;
info->priv[CS_HEADER_VERSION] = CS_HEADER_CURRENT_VERSION;
info->priv[CS_PMU_TYPE_CPUS] = type << 32;
info->priv[CS_PMU_TYPE_CPUS] |= nr_cpu;
info->priv[CS_ETM_SNAPSHOT] = ptr->snapshot_mode;


@ -2,6 +2,7 @@ perf-y += header.o
perf-y += machine.o
perf-y += perf_regs.o
perf-y += tsc.o
perf-y += pmu.o
perf-y += kvm-stat.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o


@ -1,8 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <memory.h>
#include "../../util/evsel.h"
#include "../../util/kvm-stat.h"
#include "../../../util/evsel.h"
#include "../../../util/kvm-stat.h"
#include "arm64_exception_types.h"
#include "debug.h"


@ -6,11 +6,11 @@
#include "debug.h"
#include "symbol.h"
/* On arm64, kernel text segment start at high memory address,
/* On arm64, kernel text segment starts at high memory address,
* for example 0xffff 0000 8xxx xxxx. Modules start at a low memory
* address, like 0xffff 0000 00ax xxxx. When only samll amount of
* address, like 0xffff 0000 00ax xxxx. When only small amount of
* memory is used by modules, gap between end of module's text segment
* and start of kernel text segment may be reach 2G.
* and start of kernel text segment may reach 2G.
* Therefore do not fill this gap and do not assign it to the kernel dso map.
*/


@ -108,7 +108,7 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
/* [sp], [sp, NUM] or [sp,NUM] */
new_len = 7; /* + ( % s p ) NULL */
/* If the arugment is [sp], need to fill offset '0' */
/* If the argument is [sp], need to fill offset '0' */
if (rm[2].rm_so == -1)
new_len += 1;
else


@ -0,0 +1,25 @@
// SPDX-License-Identifier: GPL-2.0
#include "../../../util/cpumap.h"
#include "../../../util/pmu.h"
struct pmu_events_map *pmu_events_map__find(void)
{
struct perf_pmu *pmu = NULL;
while ((pmu = perf_pmu__scan(pmu))) {
if (!is_pmu_core(pmu->name))
continue;
/*
* The cpumap should cover all CPUs. Otherwise, some CPUs may
* not support some events or have different event IDs.
*/
if (pmu->cpus->nr != cpu__max_cpu())
return NULL;
return perf_pmu__find_map(pmu);
}
return NULL;
}


@ -4,9 +4,9 @@
#ifndef REMOTE_UNWIND_LIBUNWIND
#include <libunwind.h>
#include "perf_regs.h"
#include "../../util/unwind.h"
#include "../../../util/unwind.h"
#endif
#include "../../util/debug.h"
#include "../../../util/debug.h"
int LIBUNWIND__ARCH_REG_ID(int regnum)
{


@ -0,0 +1,22 @@
# SPDX-License-Identifier: GPL-2.0
ifndef NO_DWARF
PERF_HAVE_DWARF_REGS := 1
endif
# Syscall table generation for perf
out := $(OUTPUT)arch/mips/include/generated/asm
header := $(out)/syscalls_n64.c
sysprf := $(srctree)/tools/perf/arch/mips/entry/syscalls
sysdef := $(sysprf)/syscall_n64.tbl
systbl := $(sysprf)/mksyscalltbl
# Create output directory if not already present
_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
$(header): $(sysdef) $(systbl)
$(Q)$(SHELL) '$(systbl)' $(sysdef) > $@
clean::
$(call QUIET_CLEAN, mips) $(RM) $(header)
archheaders: $(header)


@ -0,0 +1,32 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
#
# Generate system call table for perf. Derived from
# s390 script.
#
# Author(s): Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
# Changed by: Tiezhu Yang <yangtiezhu@loongson.cn>
SYSCALL_TBL=$1
if ! test -r $SYSCALL_TBL; then
echo "Could not read input file" >&2
exit 1
fi
create_table()
{
local max_nr nr abi sc discard
echo 'static const char *syscalltbl_mips_n64[] = {'
while read nr abi sc discard; do
printf '\t[%d] = "%s",\n' $nr $sc
max_nr=$nr
done
echo '};'
echo "#define SYSCALLTBL_MIPS_N64_MAX_ID $max_nr"
}
grep -E "^[[:digit:]]+[[:space:]]+(n64)" $SYSCALL_TBL \
|sort -k1 -n \
|create_table
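For reference, the same table-generation logic as the shell create_table() above, sketched in Python (illustrative only; the build uses the shell script):

```python
def create_table(lines):
    """Turn 'nr abi name entry' rows into the generated C table,
    mirroring the shell create_table() function above."""
    out = ["static const char *syscalltbl_mips_n64[] = {"]
    max_nr = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        nr, _abi, name = fields[0], fields[1], fields[2]
        out.append('\t[%d] = "%s",' % (int(nr), name))
        max_nr = nr
    out.append("};")
    out.append("#define SYSCALLTBL_MIPS_N64_MAX_ID %s" % max_nr)
    return "\n".join(out)

table = create_table(["0 n64 read sys_read", "1 n64 write sys_write"])
assert '[0] = "read",' in table
```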


@ -0,0 +1,358 @@
# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
#
# system call numbers and entry vectors for mips
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The <abi> is always "n64" for this file.
#
0 n64 read sys_read
1 n64 write sys_write
2 n64 open sys_open
3 n64 close sys_close
4 n64 stat sys_newstat
5 n64 fstat sys_newfstat
6 n64 lstat sys_newlstat
7 n64 poll sys_poll
8 n64 lseek sys_lseek
9 n64 mmap sys_mips_mmap
10 n64 mprotect sys_mprotect
11 n64 munmap sys_munmap
12 n64 brk sys_brk
13 n64 rt_sigaction sys_rt_sigaction
14 n64 rt_sigprocmask sys_rt_sigprocmask
15 n64 ioctl sys_ioctl
16 n64 pread64 sys_pread64
17 n64 pwrite64 sys_pwrite64
18 n64 readv sys_readv
19 n64 writev sys_writev
20 n64 access sys_access
21 n64 pipe sysm_pipe
22 n64 _newselect sys_select
23 n64 sched_yield sys_sched_yield
24 n64 mremap sys_mremap
25 n64 msync sys_msync
26 n64 mincore sys_mincore
27 n64 madvise sys_madvise
28 n64 shmget sys_shmget
29 n64 shmat sys_shmat
30 n64 shmctl sys_old_shmctl
31 n64 dup sys_dup
32 n64 dup2 sys_dup2
33 n64 pause sys_pause
34 n64 nanosleep sys_nanosleep
35 n64 getitimer sys_getitimer
36 n64 setitimer sys_setitimer
37 n64 alarm sys_alarm
38 n64 getpid sys_getpid
39 n64 sendfile sys_sendfile64
40 n64 socket sys_socket
41 n64 connect sys_connect
42 n64 accept sys_accept
43 n64 sendto sys_sendto
44 n64 recvfrom sys_recvfrom
45 n64 sendmsg sys_sendmsg
46 n64 recvmsg sys_recvmsg
47 n64 shutdown sys_shutdown
48 n64 bind sys_bind
49 n64 listen sys_listen
50 n64 getsockname sys_getsockname
51 n64 getpeername sys_getpeername
52 n64 socketpair sys_socketpair
53 n64 setsockopt sys_setsockopt
54 n64 getsockopt sys_getsockopt
55 n64 clone __sys_clone
56 n64 fork __sys_fork
57 n64 execve sys_execve
58 n64 exit sys_exit
59 n64 wait4 sys_wait4
60 n64 kill sys_kill
61 n64 uname sys_newuname
62 n64 semget sys_semget
63 n64 semop sys_semop
64 n64 semctl sys_old_semctl
65 n64 shmdt sys_shmdt
66 n64 msgget sys_msgget
67 n64 msgsnd sys_msgsnd
68 n64 msgrcv sys_msgrcv
69 n64 msgctl sys_old_msgctl
70 n64 fcntl sys_fcntl
71 n64 flock sys_flock
72 n64 fsync sys_fsync
73 n64 fdatasync sys_fdatasync
74 n64 truncate sys_truncate
75 n64 ftruncate sys_ftruncate
76 n64 getdents sys_getdents
77 n64 getcwd sys_getcwd
78 n64 chdir sys_chdir
79 n64 fchdir sys_fchdir
80 n64 rename sys_rename
81 n64 mkdir sys_mkdir
82 n64 rmdir sys_rmdir
83 n64 creat sys_creat
84 n64 link sys_link
85 n64 unlink sys_unlink
86 n64 symlink sys_symlink
87 n64 readlink sys_readlink
88 n64 chmod sys_chmod
89 n64 fchmod sys_fchmod
90 n64 chown sys_chown
91 n64 fchown sys_fchown
92 n64 lchown sys_lchown
93 n64 umask sys_umask
94 n64 gettimeofday sys_gettimeofday
95 n64 getrlimit sys_getrlimit
96 n64 getrusage sys_getrusage
97 n64 sysinfo sys_sysinfo
98 n64 times sys_times
99 n64 ptrace sys_ptrace
100 n64 getuid sys_getuid
101 n64 syslog sys_syslog
102 n64 getgid sys_getgid
103 n64 setuid sys_setuid
104 n64 setgid sys_setgid
105 n64 geteuid sys_geteuid
106 n64 getegid sys_getegid
107 n64 setpgid sys_setpgid
108 n64 getppid sys_getppid
109 n64 getpgrp sys_getpgrp
110 n64 setsid sys_setsid
111 n64 setreuid sys_setreuid
112 n64 setregid sys_setregid
113 n64 getgroups sys_getgroups
114 n64 setgroups sys_setgroups
115 n64 setresuid sys_setresuid
116 n64 getresuid sys_getresuid
117 n64 setresgid sys_setresgid
118 n64 getresgid sys_getresgid
119 n64 getpgid sys_getpgid
120 n64 setfsuid sys_setfsuid
121 n64 setfsgid sys_setfsgid
122 n64 getsid sys_getsid
123 n64 capget sys_capget
124 n64 capset sys_capset
125 n64 rt_sigpending sys_rt_sigpending
126 n64 rt_sigtimedwait sys_rt_sigtimedwait
127 n64 rt_sigqueueinfo sys_rt_sigqueueinfo
128 n64 rt_sigsuspend sys_rt_sigsuspend
129 n64 sigaltstack sys_sigaltstack
130 n64 utime sys_utime
131 n64 mknod sys_mknod
132 n64 personality sys_personality
133 n64 ustat sys_ustat
134 n64 statfs sys_statfs
135 n64 fstatfs sys_fstatfs
136 n64 sysfs sys_sysfs
137 n64 getpriority sys_getpriority
138 n64 setpriority sys_setpriority
139 n64 sched_setparam sys_sched_setparam
140 n64 sched_getparam sys_sched_getparam
141 n64 sched_setscheduler sys_sched_setscheduler
142 n64 sched_getscheduler sys_sched_getscheduler
143 n64 sched_get_priority_max sys_sched_get_priority_max
144 n64 sched_get_priority_min sys_sched_get_priority_min
145 n64 sched_rr_get_interval sys_sched_rr_get_interval
146 n64 mlock sys_mlock
147 n64 munlock sys_munlock
148 n64 mlockall sys_mlockall
149 n64 munlockall sys_munlockall
150 n64 vhangup sys_vhangup
151 n64 pivot_root sys_pivot_root
152 n64 _sysctl sys_ni_syscall
153 n64 prctl sys_prctl
154 n64 adjtimex sys_adjtimex
155 n64 setrlimit sys_setrlimit
156 n64 chroot sys_chroot
157 n64 sync sys_sync
158 n64 acct sys_acct
159 n64 settimeofday sys_settimeofday
160 n64 mount sys_mount
161 n64 umount2 sys_umount
162 n64 swapon sys_swapon
163 n64 swapoff sys_swapoff
164 n64 reboot sys_reboot
165 n64 sethostname sys_sethostname
166 n64 setdomainname sys_setdomainname
167 n64 create_module sys_ni_syscall
168 n64 init_module sys_init_module
169 n64 delete_module sys_delete_module
170 n64 get_kernel_syms sys_ni_syscall
171 n64 query_module sys_ni_syscall
172 n64 quotactl sys_quotactl
173 n64 nfsservctl sys_ni_syscall
174 n64 getpmsg sys_ni_syscall
175 n64 putpmsg sys_ni_syscall
176 n64 afs_syscall sys_ni_syscall
# 177 reserved for security
177 n64 reserved177 sys_ni_syscall
178 n64 gettid sys_gettid
179 n64 readahead sys_readahead
180 n64 setxattr sys_setxattr
181 n64 lsetxattr sys_lsetxattr
182 n64 fsetxattr sys_fsetxattr
183 n64 getxattr sys_getxattr
184 n64 lgetxattr sys_lgetxattr
185 n64 fgetxattr sys_fgetxattr
186 n64 listxattr sys_listxattr
187 n64 llistxattr sys_llistxattr
188 n64 flistxattr sys_flistxattr
189 n64 removexattr sys_removexattr
190 n64 lremovexattr sys_lremovexattr
191 n64 fremovexattr sys_fremovexattr
192 n64 tkill sys_tkill
193 n64 reserved193 sys_ni_syscall
194 n64 futex sys_futex
195 n64 sched_setaffinity sys_sched_setaffinity
196 n64 sched_getaffinity sys_sched_getaffinity
197 n64 cacheflush sys_cacheflush
198 n64 cachectl sys_cachectl
199 n64 sysmips __sys_sysmips
200 n64 io_setup sys_io_setup
201 n64 io_destroy sys_io_destroy
202 n64 io_getevents sys_io_getevents
203 n64 io_submit sys_io_submit
204 n64 io_cancel sys_io_cancel
205 n64 exit_group sys_exit_group
206 n64 lookup_dcookie sys_lookup_dcookie
207 n64 epoll_create sys_epoll_create
208 n64 epoll_ctl sys_epoll_ctl
209 n64 epoll_wait sys_epoll_wait
210 n64 remap_file_pages sys_remap_file_pages
211 n64 rt_sigreturn sys_rt_sigreturn
212 n64 set_tid_address sys_set_tid_address
213 n64 restart_syscall sys_restart_syscall
214 n64 semtimedop sys_semtimedop
215 n64 fadvise64 sys_fadvise64_64
216 n64 timer_create sys_timer_create
217 n64 timer_settime sys_timer_settime
218 n64 timer_gettime sys_timer_gettime
219 n64 timer_getoverrun sys_timer_getoverrun
220 n64 timer_delete sys_timer_delete
221 n64 clock_settime sys_clock_settime
222 n64 clock_gettime sys_clock_gettime
223 n64 clock_getres sys_clock_getres
224 n64 clock_nanosleep sys_clock_nanosleep
225 n64 tgkill sys_tgkill
226 n64 utimes sys_utimes
227 n64 mbind sys_mbind
228 n64 get_mempolicy sys_get_mempolicy
229 n64 set_mempolicy sys_set_mempolicy
230 n64 mq_open sys_mq_open
231 n64 mq_unlink sys_mq_unlink
232 n64 mq_timedsend sys_mq_timedsend
233 n64 mq_timedreceive sys_mq_timedreceive
234 n64 mq_notify sys_mq_notify
235 n64 mq_getsetattr sys_mq_getsetattr
236 n64 vserver sys_ni_syscall
237 n64 waitid sys_waitid
# 238 was sys_setaltroot
239 n64 add_key sys_add_key
240 n64 request_key sys_request_key
241 n64 keyctl sys_keyctl
242 n64 set_thread_area sys_set_thread_area
243 n64 inotify_init sys_inotify_init
244 n64 inotify_add_watch sys_inotify_add_watch
245 n64 inotify_rm_watch sys_inotify_rm_watch
246 n64 migrate_pages sys_migrate_pages
247 n64 openat sys_openat
248 n64 mkdirat sys_mkdirat
249 n64 mknodat sys_mknodat
250 n64 fchownat sys_fchownat
251 n64 futimesat sys_futimesat
252 n64 newfstatat sys_newfstatat
253 n64 unlinkat sys_unlinkat
254 n64 renameat sys_renameat
255 n64 linkat sys_linkat
256 n64 symlinkat sys_symlinkat
257 n64 readlinkat sys_readlinkat
258 n64 fchmodat sys_fchmodat
259 n64 faccessat sys_faccessat
260 n64 pselect6 sys_pselect6
261 n64 ppoll sys_ppoll
262 n64 unshare sys_unshare
263 n64 splice sys_splice
264 n64 sync_file_range sys_sync_file_range
265 n64 tee sys_tee
266 n64 vmsplice sys_vmsplice
267 n64 move_pages sys_move_pages
268 n64 set_robust_list sys_set_robust_list
269 n64 get_robust_list sys_get_robust_list
270 n64 kexec_load sys_kexec_load
271 n64 getcpu sys_getcpu
272 n64 epoll_pwait sys_epoll_pwait
273 n64 ioprio_set sys_ioprio_set
274 n64 ioprio_get sys_ioprio_get
275 n64 utimensat sys_utimensat
276 n64 signalfd sys_signalfd
277 n64 timerfd sys_ni_syscall
278 n64 eventfd sys_eventfd
279 n64 fallocate sys_fallocate
280 n64 timerfd_create sys_timerfd_create
281 n64 timerfd_gettime sys_timerfd_gettime
282 n64 timerfd_settime sys_timerfd_settime
283 n64 signalfd4 sys_signalfd4
284 n64 eventfd2 sys_eventfd2
285 n64 epoll_create1 sys_epoll_create1
286 n64 dup3 sys_dup3
287 n64 pipe2 sys_pipe2
288 n64 inotify_init1 sys_inotify_init1
289 n64 preadv sys_preadv
290 n64 pwritev sys_pwritev
291 n64 rt_tgsigqueueinfo sys_rt_tgsigqueueinfo
292 n64 perf_event_open sys_perf_event_open
293 n64 accept4 sys_accept4
294 n64 recvmmsg sys_recvmmsg
295 n64 fanotify_init sys_fanotify_init
296 n64 fanotify_mark sys_fanotify_mark
297 n64 prlimit64 sys_prlimit64
298 n64 name_to_handle_at sys_name_to_handle_at
299 n64 open_by_handle_at sys_open_by_handle_at
300 n64 clock_adjtime sys_clock_adjtime
301 n64 syncfs sys_syncfs
302 n64 sendmmsg sys_sendmmsg
303 n64 setns sys_setns
304 n64 process_vm_readv sys_process_vm_readv
305 n64 process_vm_writev sys_process_vm_writev
306 n64 kcmp sys_kcmp
307 n64 finit_module sys_finit_module
308 n64 getdents64 sys_getdents64
309 n64 sched_setattr sys_sched_setattr
310 n64 sched_getattr sys_sched_getattr
311 n64 renameat2 sys_renameat2
312 n64 seccomp sys_seccomp
313 n64 getrandom sys_getrandom
314 n64 memfd_create sys_memfd_create
315 n64 bpf sys_bpf
316 n64 execveat sys_execveat
317 n64 userfaultfd sys_userfaultfd
318 n64 membarrier sys_membarrier
319 n64 mlock2 sys_mlock2
320 n64 copy_file_range sys_copy_file_range
321 n64 preadv2 sys_preadv2
322 n64 pwritev2 sys_pwritev2
323 n64 pkey_mprotect sys_pkey_mprotect
324 n64 pkey_alloc sys_pkey_alloc
325 n64 pkey_free sys_pkey_free
326 n64 statx sys_statx
327 n64 rseq sys_rseq
328 n64 io_pgetevents sys_io_pgetevents
# 329 through 423 are reserved to sync up with other architectures
424 n64 pidfd_send_signal sys_pidfd_send_signal
425 n64 io_uring_setup sys_io_uring_setup
426 n64 io_uring_enter sys_io_uring_enter
427 n64 io_uring_register sys_io_uring_register
428 n64 open_tree sys_open_tree
429 n64 move_mount sys_move_mount
430 n64 fsopen sys_fsopen
431 n64 fsconfig sys_fsconfig
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
434 n64 pidfd_open sys_pidfd_open
435 n64 clone3 __sys_clone3
436 n64 close_range sys_close_range
437 n64 openat2 sys_openat2
438 n64 pidfd_getfd sys_pidfd_getfd
439 n64 faccessat2 sys_faccessat2
440 n64 process_madvise sys_process_madvise
441 n64 epoll_pwait2 sys_epoll_pwait2

@@ -0,0 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* dwarf-regs-table.h : Mapping of DWARF debug register numbers into
* register names.
*
* Copyright (C) 2013 Cavium, Inc.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifdef DEFINE_DWARF_REGSTR_TABLE
#undef REG_DWARFNUM_NAME
#define REG_DWARFNUM_NAME(reg, idx) [idx] = "$" #reg
static const char * const mips_regstr_tbl[] = {
"$0", "$1", "$2", "$3", "$4", "$5", "$6", "$7", "$8", "$9",
"$10", "$11", "$12", "$13", "$14", "$15", "$16", "$17", "$18", "$19",
"$20", "$21", "$22", "$23", "$24", "$25", "$26", "$27", "$28", "$29",
"$30", "$31",
REG_DWARFNUM_NAME(hi, 64),
REG_DWARFNUM_NAME(lo, 65),
};
#endif

@@ -0,0 +1,84 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef ARCH_PERF_REGS_H
#define ARCH_PERF_REGS_H
#include <stdlib.h>
#include <linux/types.h>
#include <asm/perf_regs.h>
#define PERF_REGS_MAX PERF_REG_MIPS_MAX
#define PERF_REG_IP PERF_REG_MIPS_PC
#define PERF_REG_SP PERF_REG_MIPS_R29
#define PERF_REGS_MASK ((1ULL << PERF_REG_MIPS_MAX) - 1)
static inline const char *__perf_reg_name(int id)
{
switch (id) {
case PERF_REG_MIPS_PC:
return "PC";
case PERF_REG_MIPS_R1:
return "$1";
case PERF_REG_MIPS_R2:
return "$2";
case PERF_REG_MIPS_R3:
return "$3";
case PERF_REG_MIPS_R4:
return "$4";
case PERF_REG_MIPS_R5:
return "$5";
case PERF_REG_MIPS_R6:
return "$6";
case PERF_REG_MIPS_R7:
return "$7";
case PERF_REG_MIPS_R8:
return "$8";
case PERF_REG_MIPS_R9:
return "$9";
case PERF_REG_MIPS_R10:
return "$10";
case PERF_REG_MIPS_R11:
return "$11";
case PERF_REG_MIPS_R12:
return "$12";
case PERF_REG_MIPS_R13:
return "$13";
case PERF_REG_MIPS_R14:
return "$14";
case PERF_REG_MIPS_R15:
return "$15";
case PERF_REG_MIPS_R16:
return "$16";
case PERF_REG_MIPS_R17:
return "$17";
case PERF_REG_MIPS_R18:
return "$18";
case PERF_REG_MIPS_R19:
return "$19";
case PERF_REG_MIPS_R20:
return "$20";
case PERF_REG_MIPS_R21:
return "$21";
case PERF_REG_MIPS_R22:
return "$22";
case PERF_REG_MIPS_R23:
return "$23";
case PERF_REG_MIPS_R24:
return "$24";
case PERF_REG_MIPS_R25:
return "$25";
case PERF_REG_MIPS_R28:
return "$28";
case PERF_REG_MIPS_R29:
return "$29";
case PERF_REG_MIPS_R30:
return "$30";
case PERF_REG_MIPS_R31:
return "$31";
default:
break;
}
return NULL;
}
#endif /* ARCH_PERF_REGS_H */

@@ -0,0 +1,3 @@
perf-y += perf_regs.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o

@@ -0,0 +1,38 @@
// SPDX-License-Identifier: GPL-2.0
/*
* dwarf-regs.c : Mapping of DWARF debug register numbers into register names.
*
* Copyright (C) 2013 Cavium, Inc.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#include <stdio.h>
#include <dwarf-regs.h>
static const char *mips_gpr_names[32] = {
"$0", "$1", "$2", "$3", "$4", "$5", "$6", "$7", "$8", "$9",
"$10", "$11", "$12", "$13", "$14", "$15", "$16", "$17", "$18", "$19",
"$20", "$21", "$22", "$23", "$24", "$25", "$26", "$27", "$28", "$29",
"$30", "$31"
};
const char *get_arch_regstr(unsigned int n)
{
if (n < 32)
return mips_gpr_names[n];
if (n == 64)
return "hi";
if (n == 65)
return "lo";
return NULL;
}

@@ -0,0 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include "../../util/perf_regs.h"
const struct sample_reg sample_reg_masks[] = {
SMPL_REG_END
};

@@ -0,0 +1,22 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <libunwind.h>
#include "perf_regs.h"
#include "../../util/unwind.h"
#include "util/debug.h"
int libunwind__arch_reg_id(int regnum)
{
switch (regnum) {
case UNW_MIPS_R1 ... UNW_MIPS_R25:
return regnum - UNW_MIPS_R1 + PERF_REG_MIPS_R1;
case UNW_MIPS_R28 ... UNW_MIPS_R31:
return regnum - UNW_MIPS_R28 + PERF_REG_MIPS_R28;
case UNW_MIPS_PC:
return PERF_REG_MIPS_PC;
default:
pr_err("unwind: invalid reg id %d\n", regnum);
return -EINVAL;
}
}

@@ -4,6 +4,8 @@ perf-y += kvm-stat.o
perf-y += perf_regs.o
perf-y += mem-events.o
perf-y += sym-handling.o
perf-y += evsel.o
perf-y += event.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_DWARF) += skip-callchain-idx.o

@@ -0,0 +1,53 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/types.h>
#include <linux/string.h>
#include <linux/zalloc.h>
#include "../../../util/event.h"
#include "../../../util/synthetic-events.h"
#include "../../../util/machine.h"
#include "../../../util/tool.h"
#include "../../../util/map.h"
#include "../../../util/debug.h"
void arch_perf_parse_sample_weight(struct perf_sample *data,
const __u64 *array, u64 type)
{
union perf_sample_weight weight;
weight.full = *array;
if (type & PERF_SAMPLE_WEIGHT)
data->weight = weight.full;
else {
data->weight = weight.var1_dw;
data->ins_lat = weight.var2_w;
data->p_stage_cyc = weight.var3_w;
}
}
void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
__u64 *array, u64 type)
{
*array = data->weight;
if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
*array &= 0xffffffff;
*array |= ((u64)data->ins_lat << 32);
}
}
const char *arch_perf_header_entry(const char *se_header)
{
if (!strcmp(se_header, "Local INSTR Latency"))
return "Finish Cyc";
else if (!strcmp(se_header, "Pipeline Stage Cycle"))
return "Dispatch Cyc";
return se_header;
}
int arch_support_sort_key(const char *sort_key)
{
if (!strcmp(sort_key, "p_stage_cyc"))
return 1;
return 0;
}

@@ -0,0 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include "util/evsel.h"
void arch_evsel__set_sample_weight(struct evsel *evsel)
{
evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
}

@@ -176,7 +176,7 @@ int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused)
}
/*
* Incase of powerpc architecture, pmu registers are programmable
* In case of powerpc architecture, pmu registers are programmable
* by guest kernel. So monitoring guest via host may not provide
* valid samples with default 'cycles' event. It is better to use
* 'trace_imc/trace_cycles' event for guest profiling, since it

@@ -10,6 +10,6 @@
#define SPRN_PVR 0x11F /* Processor Version Register */
#define PVR_VER(pvr) (((pvr) >> 16) & 0xFFFF) /* Version field */
#define PVR_REV(pvr) (((pvr) >> 0) & 0xFFFF) /* Revison field */
#define PVR_REV(pvr) (((pvr) >> 0) & 0xFFFF) /* Revision field */
#endif /* __PERF_UTIL_HEADER_H */

@@ -73,7 +73,7 @@ static int bp_modify1(void)
/*
* The parent does following steps:
* - creates a new breakpoint (id 0) for bp_2 function
* - changes that breakponit to bp_1 function
* - changes that breakpoint to bp_1 function
* - waits for the breakpoint to hit and checks
* it has proper rip of bp_1 function
* - detaches the child

@@ -9,6 +9,7 @@ perf-y += event.o
perf-y += evlist.o
perf-y += mem-events.o
perf-y += evsel.o
perf-y += iostat.o
perf-$(CONFIG_DWARF) += dwarf-regs.o
perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o

@@ -0,0 +1,470 @@
// SPDX-License-Identifier: GPL-2.0
/*
* perf iostat
*
* Copyright (C) 2020, Intel Corporation
*
* Authors: Alexander Antonov <alexander.antonov@linux.intel.com>
*/
#include <api/fs/fs.h>
#include <linux/kernel.h>
#include <linux/err.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>
#include <unistd.h>
#include <stdlib.h>
#include <regex.h>
#include "util/cpumap.h"
#include "util/debug.h"
#include "util/iostat.h"
#include "util/counts.h"
#include "path.h"
#ifndef MAX_PATH
#define MAX_PATH 1024
#endif
#define UNCORE_IIO_PMU_PATH "devices/uncore_iio_%d"
#define SYSFS_UNCORE_PMU_PATH "%s/"UNCORE_IIO_PMU_PATH
#define PLATFORM_MAPPING_PATH UNCORE_IIO_PMU_PATH"/die%d"
/*
* Each metric requires one IIO event which increments at every 4B transfer
* in corresponding direction. The formulas to compute metrics are generic:
* #EventCount * 4B / (1024 * 1024)
*/
static const char * const iostat_metrics[] = {
"Inbound Read(MB)",
"Inbound Write(MB)",
"Outbound Read(MB)",
"Outbound Write(MB)",
};
static inline int iostat_metrics_count(void)
{
return sizeof(iostat_metrics) / sizeof(char *);
}
static const char *iostat_metric_by_idx(int idx)
{
return *(iostat_metrics + idx % iostat_metrics_count());
}
struct iio_root_port {
u32 domain;
u8 bus;
u8 die;
u8 pmu_idx;
int idx;
};
struct iio_root_ports_list {
struct iio_root_port **rps;
int nr_entries;
};
static struct iio_root_ports_list *root_ports;
static void iio_root_port_show(FILE *output,
const struct iio_root_port * const rp)
{
if (output && rp)
fprintf(output, "S%d-uncore_iio_%d<%04x:%02x>\n",
rp->die, rp->pmu_idx, rp->domain, rp->bus);
}
static struct iio_root_port *iio_root_port_new(u32 domain, u8 bus,
u8 die, u8 pmu_idx)
{
struct iio_root_port *p = calloc(1, sizeof(*p));
if (p) {
p->domain = domain;
p->bus = bus;
p->die = die;
p->pmu_idx = pmu_idx;
}
return p;
}
static void iio_root_ports_list_free(struct iio_root_ports_list *list)
{
int idx;
if (list) {
for (idx = 0; idx < list->nr_entries; idx++)
free(list->rps[idx]);
free(list->rps);
free(list);
}
}
static struct iio_root_port *iio_root_port_find_by_notation(
const struct iio_root_ports_list * const list, u32 domain, u8 bus)
{
int idx;
struct iio_root_port *rp;
if (list) {
for (idx = 0; idx < list->nr_entries; idx++) {
rp = list->rps[idx];
if (rp && rp->domain == domain && rp->bus == bus)
return rp;
}
}
return NULL;
}
static int iio_root_ports_list_insert(struct iio_root_ports_list *list,
struct iio_root_port * const rp)
{
struct iio_root_port **tmp_buf;
if (list && rp) {
rp->idx = list->nr_entries++;
tmp_buf = realloc(list->rps,
list->nr_entries * sizeof(*list->rps));
if (!tmp_buf) {
pr_err("Failed to realloc memory\n");
return -ENOMEM;
}
tmp_buf[rp->idx] = rp;
list->rps = tmp_buf;
}
return 0;
}
static int iio_mapping(u8 pmu_idx, struct iio_root_ports_list * const list)
{
char *buf;
char path[MAX_PATH];
u32 domain;
u8 bus;
struct iio_root_port *rp;
size_t size;
int ret;
for (int die = 0; die < cpu__max_node(); die++) {
scnprintf(path, MAX_PATH, PLATFORM_MAPPING_PATH, pmu_idx, die);
if (sysfs__read_str(path, &buf, &size) < 0) {
if (pmu_idx)
goto out;
pr_err("Mode iostat is not supported\n");
return -1;
}
ret = sscanf(buf, "%04x:%02hhx", &domain, &bus);
free(buf);
if (ret != 2) {
pr_err("Invalid mapping data: iio_%d; die%d\n",
pmu_idx, die);
return -1;
}
rp = iio_root_port_new(domain, bus, die, pmu_idx);
if (!rp || iio_root_ports_list_insert(list, rp)) {
free(rp);
return -ENOMEM;
}
}
out:
return 0;
}
static u8 iio_pmu_count(void)
{
u8 pmu_idx = 0;
char path[MAX_PATH];
const char *sysfs = sysfs__mountpoint();
if (sysfs) {
for (;; pmu_idx++) {
snprintf(path, sizeof(path), SYSFS_UNCORE_PMU_PATH,
sysfs, pmu_idx);
if (access(path, F_OK) != 0)
break;
}
}
return pmu_idx;
}
static int iio_root_ports_scan(struct iio_root_ports_list **list)
{
int ret = -ENOMEM;
struct iio_root_ports_list *tmp_list;
u8 pmu_count = iio_pmu_count();
if (!pmu_count) {
pr_err("Unsupported uncore pmu configuration\n");
return -1;
}
tmp_list = calloc(1, sizeof(*tmp_list));
if (!tmp_list)
goto err;
for (u8 pmu_idx = 0; pmu_idx < pmu_count; pmu_idx++) {
ret = iio_mapping(pmu_idx, tmp_list);
if (ret)
break;
}
err:
if (!ret)
*list = tmp_list;
else
iio_root_ports_list_free(tmp_list);
return ret;
}
static int iio_root_port_parse_str(u32 *domain, u8 *bus, char *str)
{
int ret;
regex_t regex;
/*
* Expected format domain:bus:
* Valid domain range [0:ffff]
* Valid bus range [0:ff]
* Example: 0000:af, 0:3d, 01:7
*/
regcomp(&regex, "^([a-f0-9A-F]{1,}):([a-f0-9A-F]{1,2})", REG_EXTENDED);
ret = regexec(&regex, str, 0, NULL, 0);
if (ret || sscanf(str, "%08x:%02hhx", domain, bus) != 2)
pr_warning("Unrecognized root port format: %s\n"
"Please use the following format:\n"
"\t [domain]:[bus]\n"
"\t for example: 0000:3d\n", str);
regfree(&regex);
return ret;
}
static int iio_root_ports_list_filter(struct iio_root_ports_list **list,
const char *filter)
{
char *tok, *tmp, *filter_copy = NULL;
struct iio_root_port *rp;
u32 domain;
u8 bus;
int ret = -ENOMEM;
struct iio_root_ports_list *tmp_list = calloc(1, sizeof(*tmp_list));
if (!tmp_list)
goto err;
filter_copy = strdup(filter);
if (!filter_copy)
goto err;
for (tok = strtok_r(filter_copy, ",", &tmp); tok;
tok = strtok_r(NULL, ",", &tmp)) {
if (!iio_root_port_parse_str(&domain, &bus, tok)) {
rp = iio_root_port_find_by_notation(*list, domain, bus);
if (rp) {
(*list)->rps[rp->idx] = NULL;
ret = iio_root_ports_list_insert(tmp_list, rp);
if (ret) {
free(rp);
goto err;
}
} else if (!iio_root_port_find_by_notation(tmp_list,
domain, bus))
pr_warning("Root port %04x:%02x was not found\n",
domain, bus);
}
}
if (tmp_list->nr_entries == 0) {
pr_err("Requested root ports were not found\n");
ret = -EINVAL;
}
err:
iio_root_ports_list_free(*list);
if (ret)
iio_root_ports_list_free(tmp_list);
else
*list = tmp_list;
free(filter_copy);
return ret;
}
static int iostat_event_group(struct evlist *evl,
struct iio_root_ports_list *list)
{
int ret;
int idx;
const char *iostat_cmd_template =
"{uncore_iio_%x/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/,\
uncore_iio_%x/event=0x83,umask=0x01,ch_mask=0xF,fc_mask=0x07/,\
uncore_iio_%x/event=0xc0,umask=0x04,ch_mask=0xF,fc_mask=0x07/,\
uncore_iio_%x/event=0xc0,umask=0x01,ch_mask=0xF,fc_mask=0x07/}";
const int len_template = strlen(iostat_cmd_template) + 1;
struct evsel *evsel = NULL;
int metrics_count = iostat_metrics_count();
char *iostat_cmd = calloc(len_template, 1);
if (!iostat_cmd)
return -ENOMEM;
for (idx = 0; idx < list->nr_entries; idx++) {
sprintf(iostat_cmd, iostat_cmd_template,
list->rps[idx]->pmu_idx, list->rps[idx]->pmu_idx,
list->rps[idx]->pmu_idx, list->rps[idx]->pmu_idx);
ret = parse_events(evl, iostat_cmd, NULL);
if (ret)
goto err;
}
evlist__for_each_entry(evl, evsel) {
evsel->priv = list->rps[evsel->idx / metrics_count];
}
list->nr_entries = 0;
err:
iio_root_ports_list_free(list);
free(iostat_cmd);
return ret;
}
int iostat_prepare(struct evlist *evlist, struct perf_stat_config *config)
{
if (evlist->core.nr_entries > 0) {
pr_warning("The -e and -M options are not supported. "
"All chosen events/metrics will be dropped\n");
evlist__delete(evlist);
evlist = evlist__new();
if (!evlist)
return -ENOMEM;
}
config->metric_only = true;
config->aggr_mode = AGGR_GLOBAL;
return iostat_event_group(evlist, root_ports);
}
int iostat_parse(const struct option *opt, const char *str,
int unset __maybe_unused)
{
int ret;
struct perf_stat_config *config = (struct perf_stat_config *)opt->data;
ret = iio_root_ports_scan(&root_ports);
if (!ret) {
config->iostat_run = true;
if (!str)
iostat_mode = IOSTAT_RUN;
else if (!strcmp(str, "list"))
iostat_mode = IOSTAT_LIST;
else {
iostat_mode = IOSTAT_RUN;
ret = iio_root_ports_list_filter(&root_ports, str);
}
}
return ret;
}
void iostat_list(struct evlist *evlist, struct perf_stat_config *config)
{
struct evsel *evsel;
struct iio_root_port *rp = NULL;
evlist__for_each_entry(evlist, evsel) {
if (rp != evsel->priv) {
rp = evsel->priv;
iio_root_port_show(config->output, rp);
}
}
}
void iostat_release(struct evlist *evlist)
{
struct evsel *evsel;
struct iio_root_port *rp = NULL;
evlist__for_each_entry(evlist, evsel) {
if (rp != evsel->priv) {
rp = evsel->priv;
free(evsel->priv);
}
}
}
void iostat_prefix(struct evlist *evlist,
struct perf_stat_config *config,
char *prefix, struct timespec *ts)
{
struct iio_root_port *rp = evlist->selected->priv;
if (rp) {
if (ts)
sprintf(prefix, "%6lu.%09lu%s%04x:%02x%s",
ts->tv_sec, ts->tv_nsec,
config->csv_sep, rp->domain, rp->bus,
config->csv_sep);
else
sprintf(prefix, "%04x:%02x%s", rp->domain, rp->bus,
config->csv_sep);
}
}
void iostat_print_header_prefix(struct perf_stat_config *config)
{
if (config->csv_output)
fputs("port,", config->output);
else if (config->interval)
fprintf(config->output, "# time port ");
else
fprintf(config->output, " port ");
}
void iostat_print_metric(struct perf_stat_config *config, struct evsel *evsel,
struct perf_stat_output_ctx *out)
{
double iostat_value = 0;
u64 prev_count_val = 0;
const char *iostat_metric = iostat_metric_by_idx(evsel->idx);
u8 die = ((struct iio_root_port *)evsel->priv)->die;
struct perf_counts_values *count = perf_counts(evsel->counts, die, 0);
if (count->run && count->ena) {
if (evsel->prev_raw_counts && !out->force_header) {
struct perf_counts_values *prev_count =
perf_counts(evsel->prev_raw_counts, die, 0);
prev_count_val = prev_count->val;
prev_count->val = count->val;
}
iostat_value = (count->val - prev_count_val) /
((double) count->run / count->ena);
}
out->print_metric(config, out->ctx, NULL, "%8.0f", iostat_metric,
iostat_value / (256 * 1024));
}
void iostat_print_counters(struct evlist *evlist,
struct perf_stat_config *config, struct timespec *ts,
char *prefix, iostat_print_counter_t print_cnt_cb)
{
void *perf_device = NULL;
struct evsel *counter = evlist__first(evlist);
evlist__set_selected(evlist, counter);
iostat_prefix(evlist, config, prefix, ts);
fprintf(config->output, "%s", prefix);
evlist__for_each_entry(evlist, counter) {
perf_device = evlist->selected->priv;
if (perf_device && perf_device != counter->priv) {
evlist__set_selected(evlist, counter);
iostat_prefix(evlist, config, prefix, ts);
fprintf(config->output, "\n%s", prefix);
}
print_cnt_cb(config, counter, prefix);
}
fputc('\n', config->output);
}

@@ -165,7 +165,7 @@ static int sdt_init_op_regex(void)
/*
* Max x86 register name length is 5(ex: %r15d). So, 6th char
* should always contain NULL. This helps to find register name
* length using strlen, insted of maintaing one more variable.
* length using strlen, instead of maintaining one more variable.
*/
#define SDT_REG_NAME_SIZE 6
@@ -207,7 +207,7 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
* and displacement 0 (Both sign and displacement 0 are
* optional so it may be empty). Use one more character
* to hold last NULL so that strlen can be used to find
* prefix length, instead of maintaing one more variable.
* prefix length, instead of maintaining one more variable.
*/
char prefix[3] = {0};

@@ -17,7 +17,7 @@
* While the second model, enabled via --multiq option, uses multiple
* queueing (which refers to one epoll instance per worker). For example,
* short lived tcp connections in a high throughput httpd server will
* ditribute the accept()'ing connections across CPUs. In this case each
* distribute the accept()'ing connections across CPUs. In this case each
* worker does a limited amount of processing.
*
* [queue A] ---> [worker]
@@ -198,7 +198,7 @@ static void *workerfn(void *arg)
do {
/*
* Block undefinitely waiting for the IN event.
* Block indefinitely waiting for the IN event.
* In order to stress the epoll_wait(2) syscall,
* call it event per event, instead of a larger
* batch (max)limit.

@@ -372,7 +372,7 @@ static int inject_build_id(struct bench_data *data, u64 *max_rss)
len += synthesize_flush(data);
}
/* tihs makes the child to finish */
/* this makes the child to finish */
close(data->input_pipe[1]);
wait4(data->pid, &status, 0, &rusage);

@@ -42,7 +42,7 @@
#endif
/*
* Regular printout to the terminal, supressed if -q is specified:
* Regular printout to the terminal, suppressed if -q is specified:
*/
#define tprintf(x...) do { if (g && g->p.show_details >= 0) printf(x); } while (0)

@@ -239,7 +239,7 @@ static int evsel__add_sample(struct evsel *evsel, struct perf_sample *sample,
}
/*
* XXX filtered samples can still have branch entires pointing into our
* XXX filtered samples can still have branch entries pointing into our
* symbol and are missed.
*/
process_branch_stack(sample->branch_stack, al, sample);
@@ -374,13 +374,6 @@ static void hists__find_annotations(struct hists *hists,
} else {
hist_entry__tty_annotate(he, evsel, ann);
nd = rb_next(nd);
/*
* Since we have a hist_entry per IP for the same
* symbol, free he->ms.sym->src to signal we already
* processed this symbol.
*/
zfree(&notes->src->cycles_hist);
zfree(&notes->src);
}
}
}
@@ -411,8 +404,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
goto out;
if (dump_trace) {
perf_session__fprintf_nr_events(session, stdout);
evlist__fprintf_nr_events(session->evlist, stdout);
perf_session__fprintf_nr_events(session, stdout, false);
evlist__fprintf_nr_events(session->evlist, stdout, false);
goto out;
}
@@ -425,7 +418,7 @@ static int __cmd_annotate(struct perf_annotate *ann)
total_nr_samples = 0;
evlist__for_each_entry(session->evlist, pos) {
struct hists *hists = evsel__hists(pos);
u32 nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
u32 nr_samples = hists->stats.nr_samples;
if (nr_samples > 0) {
total_nr_samples += nr_samples;
@@ -538,6 +531,10 @@ int cmd_annotate(int argc, const char **argv)
"Strip first N entries of source file path name in programs (with --prefix)"),
OPT_STRING(0, "objdump", &annotate.opts.objdump_path, "path",
"objdump binary to use for disassembly and annotations"),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
"Enable symbol demangling"),
OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
"Show event group information together"),
OPT_BOOLEAN(0, "show-total-period", &symbol_conf.show_total_period,
@@ -619,14 +616,22 @@ int cmd_annotate(int argc, const char **argv)
setup_browser(true);
if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
/*
* Events of different processes may correspond to the same
* symbol, we do not care about the processes in annotate,
* set sort order to avoid repeated output.
*/
sort_order = "dso,symbol";
/*
* Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
* if branch info is in perf data in TUI mode.
*/
if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
sort__mode = SORT_MODE__BRANCH;
if (setup_sorting(annotate.session->evlist) < 0)
usage_with_options(annotate_usage, options);
} else {
if (setup_sorting(NULL) < 0)
usage_with_options(annotate_usage, options);
}
if (setup_sorting(NULL) < 0)
usage_with_options(annotate_usage, options);
ret = __cmd_annotate(&annotate);

@@ -6,7 +6,6 @@
#include <linux/zalloc.h>
#include <linux/string.h>
#include <linux/limits.h>
#include <linux/string.h>
#include <string.h>
#include <sys/file.h>
#include <signal.h>
@@ -24,8 +23,6 @@
#include <sys/signalfd.h>
#include <sys/wait.h>
#include <poll.h>
#include <sys/stat.h>
#include <time.h>
#include "builtin.h"
#include "perf.h"
#include "debug.h"

@@ -7,7 +7,6 @@
#include "debug.h"
#include <subcmd/parse-options.h>
#include "data-convert.h"
#include "data-convert-bt.h"
typedef int (*data_cmd_fn_t)(int argc, const char **argv);
@@ -55,7 +54,8 @@ static const char * const data_convert_usage[] = {
static int cmd_data_convert(int argc, const char **argv)
{
const char *to_ctf = NULL;
const char *to_json = NULL;
const char *to_ctf = NULL;
struct perf_data_convert_opts opts = {
.force = false,
.all = false,
@@ -63,6 +63,7 @@ static int cmd_data_convert(int argc, const char **argv)
const struct option options[] = {
OPT_INCR('v', "verbose", &verbose, "be more verbose"),
OPT_STRING('i', "input", &input_name, "file", "input file name"),
OPT_STRING(0, "to-json", &to_json, NULL, "Convert to JSON format"),
#ifdef HAVE_LIBBABELTRACE_SUPPORT
OPT_STRING(0, "to-ctf", &to_ctf, NULL, "Convert to CTF format"),
OPT_BOOLEAN(0, "tod", &opts.tod, "Convert time to wall clock time"),
@@ -72,11 +73,6 @@ static int cmd_data_convert(int argc, const char **argv)
OPT_END()
};
#ifndef HAVE_LIBBABELTRACE_SUPPORT
pr_err("No conversion support compiled in. perf should be compiled with environment variables LIBBABELTRACE=1 and LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
return -1;
#endif
argc = parse_options(argc, argv, options,
data_convert_usage, 0);
if (argc) {
@@ -84,11 +80,25 @@ static int cmd_data_convert(int argc, const char **argv)
return -1;
}
if (to_json && to_ctf) {
pr_err("You cannot specify both --to-ctf and --to-json.\n");
return -1;
}
if (!to_json && !to_ctf) {
pr_err("You must specify one of --to-ctf or --to-json.\n");
return -1;
}
if (to_json)
return bt_convert__perf2json(input_name, to_json, &opts);
if (to_ctf) {
#ifdef HAVE_LIBBABELTRACE_SUPPORT
return bt_convert__perf2ctf(input_name, to_ctf, &opts);
#else
pr_err("The libbabeltrace support is not compiled in.\n");
pr_err("The libbabeltrace support is not compiled in. perf should be "
"compiled with environment variables LIBBABELTRACE=1 and "
"LIBBABELTRACE_DIR=/path/to/libbabeltrace/\n");
return -1;
#endif
}

@@ -1796,7 +1796,7 @@ static int ui_init(void)
data__for_each_file(i, d) {
/*
* Baseline or compute realted columns:
* Baseline or compute related columns:
*
* PERF_HPP_DIFF__BASELINE
* PERF_HPP_DIFF__DELTA

@@ -49,7 +49,7 @@ struct lock_stat {
/*
* FIXME: evsel__intval() returns u64,
* so address of lockdep_map should be dealed as 64bit.
* so address of lockdep_map should be treated as 64bit.
* Is there more better solution?
*/
void *addr; /* address of lockdep_map, used as ID */

@@ -47,6 +47,8 @@
#include "util/util.h"
#include "util/pfm.h"
#include "util/clockid.h"
#include "util/pmu-hybrid.h"
#include "util/evlist-hybrid.h"
#include "asm/bug.h"
#include "perf.h"
@@ -1603,6 +1605,32 @@ static void hit_auxtrace_snapshot_trigger(struct record *rec)
}
}
static void record__uniquify_name(struct record *rec)
{
struct evsel *pos;
struct evlist *evlist = rec->evlist;
char *new_name;
int ret;
if (!perf_pmu__has_hybrid())
return;
evlist__for_each_entry(evlist, pos) {
if (!evsel__is_hybrid(pos))
continue;
if (strchr(pos->name, '/'))
continue;
ret = asprintf(&new_name, "%s/%s/",
pos->pmu_name, pos->name);
if (ret) {
free(pos->name);
pos->name = new_name;
}
}
}
static int __cmd_record(struct record *rec, int argc, const char **argv)
{
int err;
@@ -1707,6 +1735,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (data->is_pipe && rec->evlist->core.nr_entries == 1)
rec->opts.sample_id = true;
record__uniquify_name(rec);
if (record__open(rec) != 0) {
err = -1;
goto out_child;
@@ -1977,9 +2007,13 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
record__auxtrace_snapshot_exit(rec);
if (forks && workload_exec_errno) {
char msg[STRERR_BUFSIZE];
char msg[STRERR_BUFSIZE], strevsels[2048];
const char *emsg = str_error_r(workload_exec_errno, msg, sizeof(msg));
pr_err("Workload failed: %s\n", emsg);
evlist__scnprintf_evsels(rec->evlist, sizeof(strevsels), strevsels);
pr_err("Failed to collect '%s' for the '%s' workload: %s\n",
strevsels, argv[0], emsg);
err = -1;
goto out_child;
}
@@ -2786,10 +2820,19 @@ int cmd_record(int argc, const char **argv)
if (record.opts.overwrite)
record.opts.tail_synthesize = true;
if (rec->evlist->core.nr_entries == 0 &&
__evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
pr_err("Not enough memory for event selector list\n");
goto out;
if (rec->evlist->core.nr_entries == 0) {
if (perf_pmu__has_hybrid()) {
err = evlist__add_default_hybrid(rec->evlist,
!record.opts.no_samples);
} else {
err = __evlist__add_default(rec->evlist,
!record.opts.no_samples);
}
if (err < 0) {
pr_err("Not enough memory for event selector list\n");
goto out;
}
}
if (rec->opts.target.tid && !rec->opts.no_inherit_set)


@@ -84,6 +84,8 @@ struct report {
bool nonany_branch_mode;
bool group_set;
bool stitch_lbr;
bool disable_order;
bool skip_empty;
int max_stack;
struct perf_read_values show_threads_values;
struct annotation_options annotation_opts;
@@ -134,6 +136,11 @@ static int report__config(const char *var, const char *value, void *cb)
return 0;
}
if (!strcmp(var, "report.skip-empty")) {
rep->skip_empty = perf_config_bool(var, value);
return 0;
}
return 0;
}
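Because the new report.skip-empty knob above is read through perf_config_bool(), it can also be set persistently rather than per invocation. A plausible ~/.perfconfig fragment (the option defaults to true per the cmd_report() initializer below) would be:

```ini
# ~/.perfconfig -- show empty (or dummy) events again in 'perf report'
[report]
	skip-empty = false
```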
@@ -435,7 +442,7 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
{
size_t ret;
char unit;
unsigned long nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
unsigned long nr_samples = hists->stats.nr_samples;
u64 nr_events = hists->stats.total_period;
struct evsel *evsel = hists_to_evsel(hists);
char buf[512];
@@ -463,7 +470,7 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
nr_samples += pos_hists->stats.nr_non_filtered_samples;
nr_events += pos_hists->stats.total_non_filtered_period;
} else {
nr_samples += pos_hists->stats.nr_events[PERF_RECORD_SAMPLE];
nr_samples += pos_hists->stats.nr_samples;
nr_events += pos_hists->stats.total_period;
}
}
@@ -529,6 +536,9 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
if (symbol_conf.event_group && !evsel__is_group_leader(pos))
continue;
if (rep->skip_empty && !hists->stats.nr_samples)
continue;
hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
if (rep->total_cycles_mode) {
@@ -707,9 +717,22 @@ static void report__output_resort(struct report *rep)
ui_progress__finish();
}
static int count_sample_event(struct perf_tool *tool __maybe_unused,
union perf_event *event __maybe_unused,
struct perf_sample *sample __maybe_unused,
struct evsel *evsel,
struct machine *machine __maybe_unused)
{
struct hists *hists = evsel__hists(evsel);
hists__inc_nr_events(hists);
return 0;
}
static void stats_setup(struct report *rep)
{
memset(&rep->tool, 0, sizeof(rep->tool));
rep->tool.sample = count_sample_event;
rep->tool.no_warn = true;
}
@@ -717,7 +740,8 @@ static int stats_print(struct report *rep)
{
struct perf_session *session = rep->session;
perf_session__fprintf_nr_events(session, stdout);
perf_session__fprintf_nr_events(session, stdout, rep->skip_empty);
evlist__fprintf_nr_events(session->evlist, stdout, rep->skip_empty);
return 0;
}
@@ -929,8 +953,10 @@ static int __cmd_report(struct report *rep)
perf_session__fprintf_dsos(session, stdout);
if (dump_trace) {
perf_session__fprintf_nr_events(session, stdout);
evlist__fprintf_nr_events(session->evlist, stdout);
perf_session__fprintf_nr_events(session, stdout,
rep->skip_empty);
evlist__fprintf_nr_events(session->evlist, stdout,
rep->skip_empty);
return 0;
}
}
@@ -1139,6 +1165,7 @@ int cmd_report(int argc, const char **argv)
.pretty_printing_style = "normal",
.socket_filter = -1,
.annotation_opts = annotation__default_options,
.skip_empty = true,
};
const struct option options[] = {
OPT_STRING('i', "input", &input_name, "file",
@@ -1296,6 +1323,10 @@ int cmd_report(int argc, const char **argv)
OPTS_EVSWITCH(&report.evswitch),
OPT_BOOLEAN(0, "total-cycles", &report.total_cycles_mode,
"Sort all blocks by 'Sampled Cycles%'"),
OPT_BOOLEAN(0, "disable-order", &report.disable_order,
"Disable raw trace ordering"),
OPT_BOOLEAN(0, "skip-empty", &report.skip_empty,
"Do not display empty (or dummy) events in the output"),
OPT_END()
};
struct perf_data data = {
@@ -1329,7 +1360,7 @@ int cmd_report(int argc, const char **argv)
if (report.mmaps_mode)
report.tasks_mode = true;
if (dump_trace)
if (dump_trace && report.disable_order)
report.tool.ordered_events = false;
if (quiet)


@@ -1712,7 +1712,7 @@ static int perf_sched__process_fork_event(struct perf_tool *tool,
{
struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
/* run the fork event through the perf machineruy */
/* run the fork event through the perf machinery */
perf_event__process_fork(tool, event, sample, machine);
/* and then run additional processing needed for this command */


@@ -314,8 +314,7 @@ static inline struct evsel_script *evsel_script(struct evsel *evsel)
return (struct evsel_script *)evsel->priv;
}
static struct evsel_script *perf_evsel_script__new(struct evsel *evsel,
struct perf_data *data)
static struct evsel_script *evsel_script__new(struct evsel *evsel, struct perf_data *data)
{
struct evsel_script *es = zalloc(sizeof(*es));
@@ -335,7 +334,7 @@ static struct evsel_script *perf_evsel_script__new(struct evsel *evsel,
return NULL;
}
static void perf_evsel_script__delete(struct evsel_script *es)
static void evsel_script__delete(struct evsel_script *es)
{
zfree(&es->filename);
fclose(es->fp);
@@ -343,7 +342,7 @@ static void perf_evsel_script__delete(struct evsel_script *es)
free(es);
}
static int perf_evsel_script__fprintf(struct evsel_script *es, FILE *fp)
static int evsel_script__fprintf(struct evsel_script *es, FILE *fp)
{
struct stat st;
@@ -2219,8 +2218,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
if (!evsel->priv) {
if (scr->per_event_dump) {
evsel->priv = perf_evsel_script__new(evsel,
scr->session->data);
evsel->priv = evsel_script__new(evsel, scr->session->data);
} else {
es = zalloc(sizeof(*es));
if (!es)
@@ -2475,7 +2473,7 @@ static void perf_script__fclose_per_event_dump(struct perf_script *script)
evlist__for_each_entry(evlist, evsel) {
if (!evsel->priv)
break;
perf_evsel_script__delete(evsel->priv);
evsel_script__delete(evsel->priv);
evsel->priv = NULL;
}
}
@@ -2488,14 +2486,14 @@ static int perf_script__fopen_per_event_dump(struct perf_script *script)
/*
* Already setup? I.e. we may be called twice in cases like
* Intel PT, one for the intel_pt// and dummy events, then
* for the evsels syntheized from the auxtrace info.
* for the evsels synthesized from the auxtrace info.
*
* Ses perf_script__process_auxtrace_info.
*/
if (evsel->priv != NULL)
continue;
evsel->priv = perf_evsel_script__new(evsel, script->session->data);
evsel->priv = evsel_script__new(evsel, script->session->data);
if (evsel->priv == NULL)
goto out_err_fclose;
}
@@ -2530,8 +2528,8 @@ static void perf_script__exit_per_event_dump_stats(struct perf_script *script)
evlist__for_each_entry(script->session->evlist, evsel) {
struct evsel_script *es = evsel->priv;
perf_evsel_script__fprintf(es, stdout);
perf_evsel_script__delete(es);
evsel_script__fprintf(es, stdout);
evsel_script__delete(es);
evsel->priv = NULL;
}
}
@@ -3085,7 +3083,7 @@ static int list_available_scripts(const struct option *opt __maybe_unused,
*
* Fixme: All existing "xxx-record" are all in good formats "-e event ",
* which is covered well now. And new parsing code should be added to
* cover the future complexing formats like event groups etc.
* cover the future complex formats like event groups etc.
*/
static int check_ev_match(char *dir_name, char *scriptname,
struct perf_session *session)


@@ -48,6 +48,7 @@
#include "util/pmu.h"
#include "util/event.h"
#include "util/evlist.h"
#include "util/evlist-hybrid.h"
#include "util/evsel.h"
#include "util/debug.h"
#include "util/color.h"
@@ -68,6 +69,8 @@
#include "util/affinity.h"
#include "util/pfm.h"
#include "util/bpf_counter.h"
#include "util/iostat.h"
#include "util/pmu-hybrid.h"
#include "asm/bug.h"
#include <linux/time64.h>
@@ -160,6 +163,7 @@ static const char *smi_cost_attrs = {
};
static struct evlist *evsel_list;
static bool all_counters_use_bpf = true;
static struct target target = {
.uid = UINT_MAX,
@@ -212,7 +216,8 @@ static struct perf_stat_config stat_config = {
.walltime_nsecs_stats = &walltime_nsecs_stats,
.big_num = true,
.ctl_fd = -1,
.ctl_fd_ack = -1
.ctl_fd_ack = -1,
.iostat_run = false,
};
static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -239,6 +244,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
if (evlist__has_hybrid(evlist))
evlist__warn_hybrid_group(evlist);
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
@@ -399,6 +407,9 @@ static int read_affinity_counters(struct timespec *rs)
struct affinity affinity;
int i, ncpus, cpu;
if (all_counters_use_bpf)
return 0;
if (affinity__setup(&affinity) < 0)
return -1;
@@ -413,6 +424,8 @@ static int read_affinity_counters(struct timespec *rs)
evlist__for_each_entry(evsel_list, counter) {
if (evsel__cpu_iter_skip(counter, cpu))
continue;
if (evsel__is_bpf(counter))
continue;
if (!counter->err) {
counter->err = read_counter_cpu(counter, rs,
counter->cpu_iter - 1);
@@ -429,6 +442,9 @@ static int read_bpf_map_counters(void)
int err;
evlist__for_each_entry(evsel_list, counter) {
if (!evsel__is_bpf(counter))
continue;
err = bpf_counter__read(counter);
if (err)
return err;
@@ -439,14 +455,10 @@ static int read_bpf_map_counters(void)
static void read_counters(struct timespec *rs)
{
struct evsel *counter;
int err;
if (!stat_config.stop_read_counter) {
if (target__has_bpf(&target))
err = read_bpf_map_counters();
else
err = read_affinity_counters(rs);
if (err < 0)
if (read_bpf_map_counters() ||
read_affinity_counters(rs))
return;
}
@@ -535,12 +547,13 @@ static int enable_counters(void)
struct evsel *evsel;
int err;
if (target__has_bpf(&target)) {
evlist__for_each_entry(evsel_list, evsel) {
err = bpf_counter__enable(evsel);
if (err)
return err;
}
evlist__for_each_entry(evsel_list, evsel) {
if (!evsel__is_bpf(evsel))
continue;
err = bpf_counter__enable(evsel);
if (err)
return err;
}
if (stat_config.initial_delay < 0) {
@@ -784,14 +797,20 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (affinity__setup(&affinity) < 0)
return -1;
if (target__has_bpf(&target)) {
evlist__for_each_entry(evsel_list, counter) {
if (bpf_counter__load(counter, &target))
return -1;
}
evlist__for_each_entry(evsel_list, counter) {
if (bpf_counter__load(counter, &target))
return -1;
if (!evsel__is_bpf(counter))
all_counters_use_bpf = false;
}
evlist__for_each_cpu (evsel_list, i, cpu) {
/*
* bperf calls evsel__open_per_cpu() in bperf__load(), so
* no need to call it again here.
*/
if (target.use_bpf)
break;
affinity__set(&affinity, cpu);
evlist__for_each_entry(evsel_list, counter) {
@@ -799,6 +818,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
continue;
if (counter->reset_group || counter->errored)
continue;
if (evsel__is_bpf(counter))
continue;
try_again:
if (create_perf_stat_counter(counter, &stat_config, &target,
counter->cpu_iter - 1) < 0) {
@@ -925,15 +946,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
/*
* Enable counters and exec the command:
*/
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);
if (forks) {
evlist__start_workload(evsel_list);
err = enable_counters();
if (err)
return -1;
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);
if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
status = dispatch_events(forks, timeout, interval, &times);
if (child_pid != -1) {
@@ -954,6 +975,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
err = enable_counters();
if (err)
return -1;
t0 = rdclock();
clock_gettime(CLOCK_MONOTONIC, &ref_time);
status = dispatch_events(forks, timeout, interval, &times);
}
@@ -1083,6 +1108,11 @@ void perf_stat__set_big_num(int set)
stat_config.big_num = (set != 0);
}
void perf_stat__set_no_csv_summary(int set)
{
stat_config.no_csv_summary = (set != 0);
}
static int stat__set_big_num(const struct option *opt __maybe_unused,
const char *s __maybe_unused, int unset)
{
@@ -1146,6 +1176,10 @@ static struct option stat_options[] = {
#ifdef HAVE_BPF_SKEL
OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
"stat events on existing bpf program id"),
OPT_BOOLEAN(0, "bpf-counters", &target.use_bpf,
"use bpf program to count events"),
OPT_STRING(0, "bpf-attr-map", &target.attr_map, "attr-map-path",
"path to perf_event_attr map"),
#endif
OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
"system-wide collection from all CPUs"),
@@ -1235,6 +1269,8 @@ static struct option stat_options[] = {
"threads of same physical core"),
OPT_BOOLEAN(0, "summary", &stat_config.summary,
"print summary for interval mode"),
OPT_BOOLEAN(0, "no-csv-summary", &stat_config.no_csv_summary,
"don't print 'summary' for CSV summary output"),
OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
"don't print output (useful with record)"),
#ifdef HAVE_LIBPFM
@@ -1247,6 +1283,9 @@ static struct option stat_options[] = {
"\t\t\t Optionally send control command completion ('ack\\n') to ack-fd descriptor.\n"
"\t\t\t Alternatively, ctl-fifo / ack-fifo will be opened and used as ctl-fd / ack-fd.",
parse_control_option),
OPT_CALLBACK_OPTARG(0, "iostat", &evsel_list, &stat_config, "default",
"measure I/O performance metrics provided by arch/platform",
iostat_parse),
OPT_END()
};
@@ -1604,6 +1643,12 @@ static int add_default_attributes(void)
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
struct perf_event_attr default_sw_attrs[] = {
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
};
/*
@@ -1705,7 +1750,7 @@ static int add_default_attributes(void)
bzero(&errinfo, sizeof(errinfo));
if (transaction_run) {
/* Handle -T as -M transaction. Once platform specific metrics
* support has been added to the json files, all archictures
* support has been added to the json files, all architectures
* will use this approach. To determine transaction support
* on an architecture test for such a metric name.
*/
@@ -1841,6 +1886,28 @@ static int add_default_attributes(void)
}
if (!evsel_list->core.nr_entries) {
if (perf_pmu__has_hybrid()) {
const char *hybrid_str = "cycles,instructions,branches,branch-misses";
if (target__has_cpu(&target))
default_sw_attrs[0].config = PERF_COUNT_SW_CPU_CLOCK;
if (evlist__add_default_attrs(evsel_list,
default_sw_attrs) < 0) {
return -1;
}
err = parse_events(evsel_list, hybrid_str, &errinfo);
if (err) {
fprintf(stderr,
"Cannot set up hybrid events %s: %d\n",
hybrid_str, err);
parse_events_print_error(&errinfo, hybrid_str);
return -1;
}
return err;
}
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
@@ -2320,6 +2387,17 @@ int cmd_stat(int argc, const char **argv)
goto out;
}
if (stat_config.iostat_run) {
status = iostat_prepare(evsel_list, &stat_config);
if (status)
goto out;
if (iostat_mode == IOSTAT_LIST) {
iostat_list(evsel_list, &stat_config);
goto out;
} else if (verbose)
iostat_list(evsel_list, &stat_config);
}
if (add_default_attributes())
goto out;
@@ -2357,6 +2435,9 @@ int cmd_stat(int argc, const char **argv)
evlist__check_cpu_maps(evsel_list);
if (perf_pmu__has_hybrid())
stat_config.no_merge = true;
/*
* Initialize thread_map with comm names,
* so we could print it out on output.
@@ -2459,7 +2540,7 @@ int cmd_stat(int argc, const char **argv)
/*
* We synthesize the kernel mmap record just so that older tools
* don't emit warnings about not being able to resolve symbols
* due to /proc/sys/kernel/kptr_restrict settings and instear provide
* due to /proc/sys/kernel/kptr_restrict settings and instead provide
* a saner message about no samples being in the perf.data file.
*
* This also serves to suppress a warning about f_header.data.size == 0
@@ -2495,6 +2576,9 @@ int cmd_stat(int argc, const char **argv)
perf_stat__exit_aggr_mode();
evlist__free_stats(evsel_list);
out:
if (stat_config.iostat_run)
iostat_release(evsel_list);
zfree(&stat_config.walltime_run);
if (smi_cost && smi_reset)


@@ -328,13 +328,13 @@ static void perf_top__print_sym_table(struct perf_top *top)
printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
if (!top->record_opts.overwrite &&
(hists->stats.nr_lost_warned !=
hists->stats.nr_events[PERF_RECORD_LOST])) {
hists->stats.nr_lost_warned =
hists->stats.nr_events[PERF_RECORD_LOST];
(top->evlist->stats.nr_lost_warned !=
top->evlist->stats.nr_events[PERF_RECORD_LOST])) {
top->evlist->stats.nr_lost_warned =
top->evlist->stats.nr_events[PERF_RECORD_LOST];
color_fprintf(stdout, PERF_COLOR_RED,
"WARNING: LOST %d chunks, Check IO/CPU overload",
hists->stats.nr_lost_warned);
top->evlist->stats.nr_lost_warned);
++printed;
}
@@ -852,11 +852,9 @@ static void
perf_top__process_lost(struct perf_top *top, union perf_event *event,
struct evsel *evsel)
{
struct hists *hists = evsel__hists(evsel);
top->lost += event->lost.lost;
top->lost_total += event->lost.lost;
hists->stats.total_lost += event->lost.lost;
evsel->evlist->stats.total_lost += event->lost.lost;
}
static void
@@ -864,11 +862,9 @@ perf_top__process_lost_samples(struct perf_top *top,
union perf_event *event,
struct evsel *evsel)
{
struct hists *hists = evsel__hists(evsel);
top->lost += event->lost_samples.lost;
top->lost_total += event->lost_samples.lost;
hists->stats.total_lost_samples += event->lost_samples.lost;
evsel->evlist->stats.total_lost_samples += event->lost_samples.lost;
}
static u64 last_timestamp;
@@ -1205,7 +1201,7 @@ static int deliver_event(struct ordered_events *qe,
} else if (event->header.type == PERF_RECORD_LOST_SAMPLES) {
perf_top__process_lost_samples(top, event, evsel);
} else if (event->header.type < PERF_RECORD_MAX) {
hists__inc_nr_events(evsel__hists(evsel), event->header.type);
events_stats__inc(&session->evlist->stats, event->header.type);
machine__process_event(machine, event, &sample);
} else
++session->evlist->stats.nr_unknown_events;
@@ -1607,7 +1603,7 @@ int cmd_top(int argc, const char **argv)
if (status) {
/*
* Some arches do not provide a get_cpuid(), so just use pr_debug, otherwise
* warn the user explicitely.
* warn the user explicitly.
*/
eprintf(status == ENOSYS ? 1 : 0, verbose,
"Couldn't read the cpuid for this machine: %s\n",


@@ -153,6 +153,7 @@ check lib/ctype.c '-I "^EXPORT_SYMBOL" -I "^#include <linux/export.h>" -B
check_2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
check_2 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
check_2 tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
check_2 tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl
for i in $BEAUTY_FILES; do
beauty_check $i -B


@@ -14,6 +14,7 @@ perf-config mainporcelain common
perf-evlist mainporcelain common
perf-ftrace mainporcelain common
perf-inject mainporcelain common
perf-iostat mainporcelain common
perf-kallsyms mainporcelain common
perf-kmem mainporcelain common
perf-kvm mainporcelain common


@@ -262,7 +262,7 @@ int sys_enter(struct syscall_enter_args *args)
/*
* Jump to syscall specific augmenter, even if the default one,
* "!raw_syscalls:unaugmented" that will just return 1 to return the
* unagmented tracepoint payload.
* unaugmented tracepoint payload.
*/
bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
@@ -282,7 +282,7 @@ int sys_exit(struct syscall_exit_args *args)
/*
* Jump to syscall specific return augmenter, even if the default one,
* "!raw_syscalls:unaugmented" that will just return 1 to return the
* unagmented tracepoint payload.
* unaugmented tracepoint payload.
*/
bpf_tail_call(args, &syscalls_sys_exit, exit_args.syscall_nr);
/*


@@ -390,7 +390,7 @@ jvmti_write_code(void *agent, char const *sym,
rec.p.total_size += size;
/*
* If JVM is multi-threaded, nultiple concurrent calls to agent
* If JVM is multi-threaded, multiple concurrent calls to agent
* may be possible, so protect file writes
*/
flockfile(fp);
@@ -457,7 +457,7 @@ jvmti_write_debug_info(void *agent, uint64_t code,
rec.p.total_size = size;
/*
* If JVM is multi-threaded, nultiple concurrent calls to agent
* If JVM is multi-threaded, multiple concurrent calls to agent
* may be possible, so protect file writes
*/
flockfile(fp);

tools/perf/perf-iostat.sh (new file)

@@ -0,0 +1,12 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# perf iostat
# Alexander Antonov <alexander.antonov@linux.intel.com>
if [[ "$1" == "list" ]] || [[ "$1" =~ ([a-f0-9A-F]{1,}):([a-f0-9A-F]{1,2})(,)? ]]; then
DELIMITER="="
else
DELIMITER=" "
fi
perf stat --iostat$DELIMITER$*
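The wrapper's one subtlety is how it joins its argument to --iostat: the "list" subcommand or a root-port filter such as "0000:80" must be attached as "--iostat=<arg>", while anything else (typically the workload command) is passed after a space. That choice can be factored out and exercised on its own; pick_delim below is a hypothetical helper written for illustration, not part of the script.

```shell
#!/bin/bash
# Re-derive the delimiter selection used by perf-iostat.sh above.
pick_delim() {
	if [[ "$1" == "list" ]] || [[ "$1" =~ ([a-f0-9A-F]{1,}):([a-f0-9A-F]{1,2})(,)? ]]; then
		printf '='   # attach: "perf stat --iostat=0000:80"
	else
		printf ' '   # separate: "perf stat --iostat sleep 1"
	fi
}

pick_delim list      # prints '='
```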


@@ -209,12 +209,24 @@
"EventName": "L2D_TLB_REFILL",
"BriefDescription": "Attributable Level 2 data TLB refill"
},
{
"PublicDescription": "Attributable Level 2 instruction TLB refill.",
"EventCode": "0x2E",
"EventName": "L2I_TLB_REFILL",
"BriefDescription": "Attributable Level 2 instruction TLB refill."
},
{
"PublicDescription": "Attributable Level 2 data or unified TLB access",
"EventCode": "0x2F",
"EventName": "L2D_TLB",
"BriefDescription": "Attributable Level 2 data or unified TLB access"
},
{
"PublicDescription": "Attributable Level 2 instruction TLB access.",
"EventCode": "0x30",
"EventName": "L2I_TLB",
"BriefDescription": "Attributable Level 2 instruction TLB access."
},
{
"PublicDescription": "Access to another socket in a multi-socket system",
"EventCode": "0x31",
@@ -244,5 +256,221 @@
"EventCode": "0x37",
"EventName": "LL_CACHE_MISS_RD",
"BriefDescription": "Last level cache miss, read"
},
{
"PublicDescription": "SIMD Instruction architecturally executed.",
"EventCode": "0x8000",
"EventName": "SIMD_INST_RETIRED",
"BriefDescription": "SIMD Instruction architecturally executed."
},
{
"PublicDescription": "Instruction architecturally executed, SVE.",
"EventCode": "0x8002",
"EventName": "SVE_INST_RETIRED",
"BriefDescription": "Instruction architecturally executed, SVE."
},
{
"PublicDescription": "Microarchitectural operation, Operations speculatively executed.",
"EventCode": "0x8008",
"EventName": "UOP_SPEC",
"BriefDescription": "Microarchitectural operation, Operations speculatively executed."
},
{
"PublicDescription": "SVE Math accelerator Operations speculatively executed.",
"EventCode": "0x800E",
"EventName": "SVE_MATH_SPEC",
"BriefDescription": "SVE Math accelerator Operations speculatively executed."
},
{
"PublicDescription": "Floating-point Operations speculatively executed.",
"EventCode": "0x8010",
"EventName": "FP_SPEC",
"BriefDescription": "Floating-point Operations speculatively executed."
},
{
"PublicDescription": "Floating-point FMA Operations speculatively executed.",
"EventCode": "0x8028",
"EventName": "FP_FMA_SPEC",
"BriefDescription": "Floating-point FMA Operations speculatively executed."
},
{
"PublicDescription": "Floating-point reciprocal estimate Operations speculatively executed.",
"EventCode": "0x8034",
"EventName": "FP_RECPE_SPEC",
"BriefDescription": "Floating-point reciprocal estimate Operations speculatively executed."
},
{
"PublicDescription": "floating-point convert Operations speculatively executed.",
"EventCode": "0x8038",
"EventName": "FP_CVT_SPEC",
"BriefDescription": "floating-point convert Operations speculatively executed."
},
{
"PublicDescription": "Advanced SIMD and SVE integer Operations speculatively executed.",
"EventCode": "0x8043",
"EventName": "ASE_SVE_INT_SPEC",
"BriefDescription": "Advanced SIMD and SVE integer Operations speculatively executed."
},
{
"PublicDescription": "SVE predicated Operations speculatively executed.",
"EventCode": "0x8074",
"EventName": "SVE_PRED_SPEC",
"BriefDescription": "SVE predicated Operations speculatively executed."
},
{
"PublicDescription": "SVE MOVPRFX Operations speculatively executed.",
"EventCode": "0x807C",
"EventName": "SVE_MOVPRFX_SPEC",
"BriefDescription": "SVE MOVPRFX Operations speculatively executed."
},
{
"PublicDescription": "SVE MOVPRFX unfused Operations speculatively executed.",
"EventCode": "0x807F",
"EventName": "SVE_MOVPRFX_U_SPEC",
"BriefDescription": "SVE MOVPRFX unfused Operations speculatively executed."
},
{
"PublicDescription": "Advanced SIMD and SVE load Operations speculatively executed.",
"EventCode": "0x8085",
"EventName": "ASE_SVE_LD_SPEC",
"BriefDescription": "Advanced SIMD and SVE load Operations speculatively executed."
},
{
"PublicDescription": "Advanced SIMD and SVE store Operations speculatively executed.",
"EventCode": "0x8086",
"EventName": "ASE_SVE_ST_SPEC",
"BriefDescription": "Advanced SIMD and SVE store Operations speculatively executed."
},
{
"PublicDescription": "Prefetch Operations speculatively executed.",
"EventCode": "0x8087",
"EventName": "PRF_SPEC",
"BriefDescription": "Prefetch Operations speculatively executed."
},
{
"PublicDescription": "General-purpose register load Operations speculatively executed.",
"EventCode": "0x8089",
"EventName": "BASE_LD_REG_SPEC",
"BriefDescription": "General-purpose register load Operations speculatively executed."
},
{
"PublicDescription": "General-purpose register store Operations speculatively executed.",
"EventCode": "0x808A",
"EventName": "BASE_ST_REG_SPEC",
"BriefDescription": "General-purpose register store Operations speculatively executed."
},
{
"PublicDescription": "SVE unpredicated load register Operations speculatively executed.",
"EventCode": "0x8091",
"EventName": "SVE_LDR_REG_SPEC",
"BriefDescription": "SVE unpredicated load register Operations speculatively executed."
},
{
"PublicDescription": "SVE unpredicated store register Operations speculatively executed.",
"EventCode": "0x8092",
"EventName": "SVE_STR_REG_SPEC",
"BriefDescription": "SVE unpredicated store register Operations speculatively executed."
},
{
"PublicDescription": "SVE load predicate register Operations speculatively executed.",
"EventCode": "0x8095",
"EventName": "SVE_LDR_PREG_SPEC",
"BriefDescription": "SVE load predicate register Operations speculatively executed."
},
{
"PublicDescription": "SVE store predicate register Operations speculatively executed.",
"EventCode": "0x8096",
"EventName": "SVE_STR_PREG_SPEC",
"BriefDescription": "SVE store predicate register Operations speculatively executed."
},
{
"PublicDescription": "SVE contiguous prefetch element Operations speculatively executed.",
"EventCode": "0x809F",
"EventName": "SVE_PRF_CONTIG_SPEC",
"BriefDescription": "SVE contiguous prefetch element Operations speculatively executed."
},
{
"PublicDescription": "Advanced SIMD and SVE contiguous load multiple vector Operations speculatively executed.",
"EventCode": "0x80A5",
"EventName": "ASE_SVE_LD_MULTI_SPEC",
"BriefDescription": "Advanced SIMD and SVE contiguous load multiple vector Operations speculatively executed."
},
{
"PublicDescription": "Advanced SIMD and SVE contiguous store multiple vector Operations speculatively executed.",
"EventCode": "0x80A6",
"EventName": "ASE_SVE_ST_MULTI_SPEC",
"BriefDescription": "Advanced SIMD and SVE contiguous store multiple vector Operations speculatively executed."
},
{
"PublicDescription": "SVE gather-load Operations speculatively executed.",
"EventCode": "0x80AD",
"EventName": "SVE_LD_GATHER_SPEC",
"BriefDescription": "SVE gather-load Operations speculatively executed."
},
{
"PublicDescription": "SVE scatter-store Operations speculatively executed.",
"EventCode": "0x80AE",
"EventName": "SVE_ST_SCATTER_SPEC",
"BriefDescription": "SVE scatter-store Operations speculatively executed."
},
{
"PublicDescription": "SVE gather-prefetch Operations speculatively executed.",
"EventCode": "0x80AF",
"EventName": "SVE_PRF_GATHER_SPEC",
"BriefDescription": "SVE gather-prefetch Operations speculatively executed."
},
{
"PublicDescription": "SVE First-fault load Operations speculatively executed.",
"EventCode": "0x80BC",
"EventName": "SVE_LDFF_SPEC",
"BriefDescription": "SVE First-fault load Operations speculatively executed."
},
{
"PublicDescription": "Scalable floating-point element Operations speculatively executed.",
"EventCode": "0x80C0",
"EventName": "FP_SCALE_OPS_SPEC",
"BriefDescription": "Scalable floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Non-scalable floating-point element Operations speculatively executed.",
"EventCode": "0x80C1",
"EventName": "FP_FIXED_OPS_SPEC",
"BriefDescription": "Non-scalable floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Scalable half-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C2",
"EventName": "FP_HP_SCALE_OPS_SPEC",
"BriefDescription": "Scalable half-precision floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Non-scalable half-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C3",
"EventName": "FP_HP_FIXED_OPS_SPEC",
"BriefDescription": "Non-scalable half-precision floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Scalable single-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C4",
"EventName": "FP_SP_SCALE_OPS_SPEC",
"BriefDescription": "Scalable single-precision floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Non-scalable single-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C5",
"EventName": "FP_SP_FIXED_OPS_SPEC",
"BriefDescription": "Non-scalable single-precision floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Scalable double-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C6",
"EventName": "FP_DP_SCALE_OPS_SPEC",
"BriefDescription": "Scalable double-precision floating-point element Operations speculatively executed."
},
{
"PublicDescription": "Non-scalable double-precision floating-point element Operations speculatively executed.",
"EventCode": "0x80C7",
"EventName": "FP_DP_FIXED_OPS_SPEC",
"BriefDescription": "Non-scalable double-precision floating-point element Operations speculatively executed."
}
]


@@ -0,0 +1,8 @@
[
{
"ArchStdEvent": "BR_MIS_PRED"
},
{
"ArchStdEvent": "BR_PRED"
}
]


@@ -0,0 +1,62 @@
[
{
"PublicDescription": "This event counts read transactions from tofu controller to measured CMG.",
"EventCode": "0x314",
"EventName": "BUS_READ_TOTAL_TOFU",
"BriefDescription": "This event counts read transactions from tofu controller to measured CMG."
},
{
"PublicDescription": "This event counts read transactions from PCI controller to measured CMG.",
"EventCode": "0x315",
"EventName": "BUS_READ_TOTAL_PCI",
"BriefDescription": "This event counts read transactions from PCI controller to measured CMG."
},
{
"PublicDescription": "This event counts read transactions from measured CMG local memory to measured CMG.",
"EventCode": "0x316",
"EventName": "BUS_READ_TOTAL_MEM",
"BriefDescription": "This event counts read transactions from measured CMG local memory to measured CMG."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to CMG0, if measured CMG is not CMG0.",
"EventCode": "0x318",
"EventName": "BUS_WRITE_TOTAL_CMG0",
"BriefDescription": "This event counts write transactions from measured CMG to CMG0, if measured CMG is not CMG0."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to CMG1, if measured CMG is not CMG1.",
"EventCode": "0x319",
"EventName": "BUS_WRITE_TOTAL_CMG1",
"BriefDescription": "This event counts write transactions from measured CMG to CMG1, if measured CMG is not CMG1."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to CMG2, if measured CMG is not CMG2.",
"EventCode": "0x31A",
"EventName": "BUS_WRITE_TOTAL_CMG2",
"BriefDescription": "This event counts write transactions from measured CMG to CMG2, if measured CMG is not CMG2."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to CMG3, if measured CMG is not CMG3.",
"EventCode": "0x31B",
"EventName": "BUS_WRITE_TOTAL_CMG3",
"BriefDescription": "This event counts write transactions from measured CMG to CMG3, if measured CMG is not CMG3."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to the Tofu controller.",
"EventCode": "0x31C",
"EventName": "BUS_WRITE_TOTAL_TOFU",
"BriefDescription": "This event counts write transactions from measured CMG to the Tofu controller."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to PCI controller.",
"EventCode": "0x31D",
"EventName": "BUS_WRITE_TOTAL_PCI",
"BriefDescription": "This event counts write transactions from measured CMG to PCI controller."
},
{
"PublicDescription": "This event counts write transactions from measured CMG to measured CMG local memory.",
"EventCode": "0x31E",
"EventName": "BUS_WRITE_TOTAL_MEM",
"BriefDescription": "This event counts write transactions from measured CMG to measured CMG local memory."
}
]
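The events in this file are keyed by `EventCode`, so on a kernel that does not yet ship these aliases they can still be requested through perf's standard raw-event syntax. A minimal sketch of that mapping (Python; the two-entry table is copied from the JSON above, and the `r<hex>` translation is perf's usual raw hardware event form):

```python
import json

# Two entries copied verbatim from the event table above.
events = json.loads("""
[
    {"EventCode": "0x314", "EventName": "BUS_READ_TOTAL_TOFU"},
    {"EventCode": "0x31E", "EventName": "BUS_WRITE_TOTAL_MEM"}
]
""")

def raw_event(name, table):
    """Translate an EventName into perf's r<hex> raw-event syntax."""
    for ev in table:
        if ev["EventName"] == name:
            # perf accepts raw hardware events as r followed by the hex code.
            return "r%x" % int(ev["EventCode"], 16)
    raise KeyError(name)

print(raw_event("BUS_WRITE_TOTAL_MEM", events))  # r31e
```

So `perf stat -e r31e` would count BUS_WRITE_TOTAL_MEM even without the alias tables installed.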

@@ -0,0 +1,128 @@
[
{
"ArchStdEvent": "L1I_CACHE_REFILL"
},
{
"ArchStdEvent": "L1I_TLB_REFILL"
},
{
"ArchStdEvent": "L1D_CACHE_REFILL"
},
{
"ArchStdEvent": "L1D_CACHE"
},
{
"ArchStdEvent": "L1D_TLB_REFILL"
},
{
"ArchStdEvent": "L1I_CACHE"
},
{
"ArchStdEvent": "L1D_CACHE_WB"
},
{
"ArchStdEvent": "L2D_CACHE"
},
{
"ArchStdEvent": "L2D_CACHE_REFILL"
},
{
"ArchStdEvent": "L2D_CACHE_WB"
},
{
"ArchStdEvent": "L2D_TLB_REFILL"
},
{
"ArchStdEvent": "L2I_TLB_REFILL"
},
{
"ArchStdEvent": "L2D_TLB"
},
{
"ArchStdEvent": "L2I_TLB"
},
{
"PublicDescription": "This event counts L1D_CACHE_REFILL caused by software or hardware prefetch.",
"EventCode": "0x49",
"EventName": "L1D_CACHE_REFILL_PRF",
"BriefDescription": "This event counts L1D_CACHE_REFILL caused by software or hardware prefetch."
},
{
"PublicDescription": "This event counts L2D_CACHE_REFILL caused by software or hardware prefetch.",
"EventCode": "0x59",
"EventName": "L2D_CACHE_REFILL_PRF",
"BriefDescription": "This event counts L2D_CACHE_REFILL caused by software or hardware prefetch."
},
{
"PublicDescription": "This event counts L1D_CACHE_REFILL caused by demand access.",
"EventCode": "0x200",
"EventName": "L1D_CACHE_REFILL_DM",
"BriefDescription": "This event counts L1D_CACHE_REFILL caused by demand access."
},
{
"PublicDescription": "This event counts L1D_CACHE_REFILL caused by hardware prefetch.",
"EventCode": "0x202",
"EventName": "L1D_CACHE_REFILL_HWPRF",
"BriefDescription": "This event counts L1D_CACHE_REFILL caused by hardware prefetch."
},
{
"PublicDescription": "This event counts outstanding L1D cache miss requests per cycle.",
"EventCode": "0x208",
"EventName": "L1_MISS_WAIT",
"BriefDescription": "This event counts outstanding L1D cache miss requests per cycle."
},
{
"PublicDescription": "This event counts outstanding L1I cache miss requests per cycle.",
"EventCode": "0x209",
"EventName": "L1I_MISS_WAIT",
"BriefDescription": "This event counts outstanding L1I cache miss requests per cycle."
},
{
"PublicDescription": "This event counts L2D_CACHE_REFILL caused by demand access.",
"EventCode": "0x300",
"EventName": "L2D_CACHE_REFILL_DM",
"BriefDescription": "This event counts L2D_CACHE_REFILL caused by demand access."
},
{
"PublicDescription": "This event counts L2D_CACHE_REFILL caused by hardware prefetch.",
"EventCode": "0x302",
"EventName": "L2D_CACHE_REFILL_HWPRF",
"BriefDescription": "This event counts L2D_CACHE_REFILL caused by hardware prefetch."
},
{
"PublicDescription": "This event counts outstanding L2 cache miss requests per cycle.",
"EventCode": "0x308",
"EventName": "L2_MISS_WAIT",
"BriefDescription": "This event counts outstanding L2 cache miss requests per cycle."
},
{
"PublicDescription": "This event counts the number of times of L2 cache miss.",
"EventCode": "0x309",
"EventName": "L2_MISS_COUNT",
"BriefDescription": "This event counts the number of times of L2 cache miss."
},
{
"PublicDescription": "This event counts operations where demand access hits an L2 cache refill buffer allocated by software or hardware prefetch.",
"EventCode": "0x325",
"EventName": "L2D_SWAP_DM",
"BriefDescription": "This event counts operations where demand access hits an L2 cache refill buffer allocated by software or hardware prefetch."
},
{
"PublicDescription": "This event counts operations where software or hardware prefetch hits an L2 cache refill buffer allocated by demand access.",
"EventCode": "0x326",
"EventName": "L2D_CACHE_MIBMCH_PRF",
"BriefDescription": "This event counts operations where software or hardware prefetch hits an L2 cache refill buffer allocated by demand access."
},
{
"PublicDescription": "This event counts operations where demand access hits an L2 cache refill buffer allocated by software or hardware prefetch.",
"EventCode": "0x396",
"EventName": "L2D_CACHE_SWAP_LOCAL",
"BriefDescription": "This event counts operations where demand access hits an L2 cache refill buffer allocated by software or hardware prefetch."
},
{
"PublicDescription": "This event counts energy consumption per cycle of L2 cache.",
"EventCode": "0x3E0",
"EventName": "EA_L2",
"BriefDescription": "This event counts energy consumption per cycle of L2 cache."
}
]
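The refill events above are split by cause, which makes a quick miss-rate breakdown possible. A small sketch with hypothetical counter values (the numbers are invented for illustration; that the _DM and _PRF variants partition L1D_CACHE_REFILL is suggested by the descriptions but not stated explicitly):

```python
# Hypothetical readings from 'perf stat' on an A64FX core (invented numbers).
counts = {
    "L1D_CACHE": 1_000_000,         # total L1D accesses
    "L1D_CACHE_REFILL": 40_000,     # total refills (misses)
    "L1D_CACHE_REFILL_DM": 25_000,  # refills caused by demand access
    "L1D_CACHE_REFILL_PRF": 15_000, # refills caused by sw/hw prefetch
}

# Overall miss ratio and the share of refills triggered by prefetching.
miss_ratio = counts["L1D_CACHE_REFILL"] / counts["L1D_CACHE"]
prefetch_share = counts["L1D_CACHE_REFILL_PRF"] / counts["L1D_CACHE_REFILL"]

print(f"miss ratio {miss_ratio:.1%}, prefetch share {prefetch_share:.1%}")
```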

@@ -0,0 +1,5 @@
[
{
"ArchStdEvent": "CPU_CYCLES"
}
]

@@ -0,0 +1,29 @@
[
{
"ArchStdEvent": "EXC_TAKEN"
},
{
"ArchStdEvent": "EXC_UNDEF"
},
{
"ArchStdEvent": "EXC_SVC"
},
{
"ArchStdEvent": "EXC_PABORT"
},
{
"ArchStdEvent": "EXC_DABORT"
},
{
"ArchStdEvent": "EXC_IRQ"
},
{
"ArchStdEvent": "EXC_FIQ"
},
{
"ArchStdEvent": "EXC_SMC"
},
{
"ArchStdEvent": "EXC_HVC"
}
]

@@ -0,0 +1,131 @@
[
{
"ArchStdEvent": "SW_INCR"
},
{
"ArchStdEvent": "INST_RETIRED"
},
{
"ArchStdEvent": "EXC_RETURN"
},
{
"ArchStdEvent": "CID_WRITE_RETIRED"
},
{
"ArchStdEvent": "INST_SPEC"
},
{
"ArchStdEvent": "LDREX_SPEC"
},
{
"ArchStdEvent": "STREX_SPEC"
},
{
"ArchStdEvent": "LD_SPEC"
},
{
"ArchStdEvent": "ST_SPEC"
},
{
"ArchStdEvent": "LDST_SPEC"
},
{
"ArchStdEvent": "DP_SPEC"
},
{
"ArchStdEvent": "ASE_SPEC"
},
{
"ArchStdEvent": "VFP_SPEC"
},
{
"ArchStdEvent": "PC_WRITE_SPEC"
},
{
"ArchStdEvent": "CRYPTO_SPEC"
},
{
"ArchStdEvent": "BR_IMMED_SPEC"
},
{
"ArchStdEvent": "BR_RETURN_SPEC"
},
{
"ArchStdEvent": "BR_INDIRECT_SPEC"
},
{
"ArchStdEvent": "ISB_SPEC"
},
{
"ArchStdEvent": "DSB_SPEC"
},
{
"ArchStdEvent": "DMB_SPEC"
},
{
"PublicDescription": "This event counts architecturally executed zero blocking operations due to the 'DC ZVA' instruction.",
"EventCode": "0x9F",
"EventName": "DCZVA_SPEC",
"BriefDescription": "This event counts architecturally executed zero blocking operations due to the 'DC ZVA' instruction."
},
{
"PublicDescription": "This event counts architecturally executed floating-point move operations.",
"EventCode": "0x105",
"EventName": "FP_MV_SPEC",
"BriefDescription": "This event counts architecturally executed floating-point move operations."
},
{
"PublicDescription": "This event counts architecturally executed operations that use the predicate registers.",
"EventCode": "0x108",
"EventName": "PRD_SPEC",
"BriefDescription": "This event counts architecturally executed operations that use the predicate registers."
},
{
"PublicDescription": "This event counts architecturally executed inter-element manipulation operations.",
"EventCode": "0x109",
"EventName": "IEL_SPEC",
"BriefDescription": "This event counts architecturally executed inter-element manipulation operations."
},
{
"PublicDescription": "This event counts architecturally executed inter-register manipulation operations.",
"EventCode": "0x10A",
"EventName": "IREG_SPEC",
"BriefDescription": "This event counts architecturally executed inter-register manipulation operations."
},
{
"PublicDescription": "This event counts architecturally executed NOSIMD load operations that use SIMD&FP registers.",
"EventCode": "0x112",
"EventName": "FP_LD_SPEC",
"BriefDescription": "This event counts architecturally executed NOSIMD load operations that use SIMD&FP registers."
},
{
"PublicDescription": "This event counts architecturally executed NOSIMD store operations that use SIMD&FP registers.",
"EventCode": "0x113",
"EventName": "FP_ST_SPEC",
"BriefDescription": "This event counts architecturally executed NOSIMD store operations that use SIMD&FP registers."
},
{
"PublicDescription": "This event counts architecturally executed SIMD broadcast floating-point load operations.",
"EventCode": "0x11A",
"EventName": "BC_LD_SPEC",
"BriefDescription": "This event counts architecturally executed SIMD broadcast floating-point load operations."
},
{
"PublicDescription": "This event counts architecturally executed instructions, excluding the MOVPRFX instruction.",
"EventCode": "0x121",
"EventName": "EFFECTIVE_INST_SPEC",
"BriefDescription": "This event counts architecturally executed instructions, excluding the MOVPRFX instruction."
},
{
"PublicDescription": "This event counts architecturally executed operations that use 'pre-index' as their addressing mode.",
"EventCode": "0x123",
"EventName": "PRE_INDEX_SPEC",
"BriefDescription": "This event counts architecturally executed operations that use 'pre-index' as their addressing mode."
},
{
"PublicDescription": "This event counts architecturally executed operations that use 'post-index' as their addressing mode.",
"EventCode": "0x124",
"EventName": "POST_INDEX_SPEC",
"BriefDescription": "This event counts architecturally executed operations that use 'post-index' as their addressing mode."
}
]

@@ -0,0 +1,8 @@
[
{
"PublicDescription": "This event counts energy consumption per cycle of CMG local memory.",
"EventCode": "0x3E8",
"EventName": "EA_MEMORY",
"BriefDescription": "This event counts energy consumption per cycle of CMG local memory."
}
]

@@ -0,0 +1,188 @@
[
{
"PublicDescription": "This event counts the number of times a micro-operation is split.",
"EventCode": "0x139",
"EventName": "UOP_SPLIT",
"BriefDescription": "This event counts the number of times a micro-operation is split."
},
{
"PublicDescription": "This event counts every cycle that no operation was committed because the oldest and uncommitted load/store/prefetch operation waits for memory access.",
"EventCode": "0x180",
"EventName": "LD_COMP_WAIT_L2_MISS",
"BriefDescription": "This event counts every cycle that no operation was committed because the oldest and uncommitted load/store/prefetch operation waits for memory access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for memory access.",
"EventCode": "0x181",
"EventName": "LD_COMP_WAIT_L2_MISS_EX",
"BriefDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for memory access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted load/store/prefetch operation waits for L2 cache access.",
"EventCode": "0x182",
"EventName": "LD_COMP_WAIT_L1_MISS",
"BriefDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted load/store/prefetch operation waits for L2 cache access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for L2 cache access.",
"EventCode": "0x183",
"EventName": "LD_COMP_WAIT_L1_MISS_EX",
"BriefDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for L2 cache access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted load/store/prefetch operation waits for L1D cache, L2 cache and memory access.",
"EventCode": "0x184",
"EventName": "LD_COMP_WAIT",
"BriefDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted load/store/prefetch operation waits for L1D cache, L2 cache and memory access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for L1D cache, L2 cache and memory access.",
"EventCode": "0x185",
"EventName": "LD_COMP_WAIT_EX",
"BriefDescription": "This event counts every cycle that no instruction was committed because the oldest and uncommitted integer load operation waits for L1D cache, L2 cache and memory access."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed due to the lack of an available prefetch port.",
"EventCode": "0x186",
"EventName": "LD_COMP_WAIT_PFP_BUSY",
"BriefDescription": "This event counts every cycle that no instruction was committed due to the lack of an available prefetch port."
},
{
"PublicDescription": "This event counts the LD_COMP_WAIT_PFP_BUSY caused by an integer load operation.",
"EventCode": "0x187",
"EventName": "LD_COMP_WAIT_PFP_BUSY_EX",
"BriefDescription": "This event counts the LD_COMP_WAIT_PFP_BUSY caused by an integer load operation."
},
{
"PublicDescription": "This event counts the LD_COMP_WAIT_PFP_BUSY caused by a software prefetch instruction.",
"EventCode": "0x188",
"EventName": "LD_COMP_WAIT_PFP_BUSY_SWPF",
"BriefDescription": "This event counts the LD_COMP_WAIT_PFP_BUSY caused by a software prefetch instruction."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is an integer or floating-point/SIMD instruction.",
"EventCode": "0x189",
"EventName": "EU_COMP_WAIT",
"BriefDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is an integer or floating-point/SIMD instruction."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is a floating-point/SIMD instruction.",
"EventCode": "0x18A",
"EventName": "FL_COMP_WAIT",
"BriefDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is a floating-point/SIMD instruction."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is a branch instruction.",
"EventCode": "0x18B",
"EventName": "BR_COMP_WAIT",
"BriefDescription": "This event counts every cycle that no instruction was committed and the oldest and uncommitted instruction is a branch instruction."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the CSE is empty.",
"EventCode": "0x18C",
"EventName": "ROB_EMPTY",
"BriefDescription": "This event counts every cycle that no instruction was committed because the CSE is empty."
},
{
"PublicDescription": "This event counts every cycle that no instruction was committed because the CSE is empty and the store port (SP) is full.",
"EventCode": "0x18D",
"EventName": "ROB_EMPTY_STQ_BUSY",
"BriefDescription": "This event counts every cycle that no instruction was committed because the CSE is empty and the store port (SP) is full."
},
{
"PublicDescription": "This event counts every cycle that the instruction unit is halted by the WFE/WFI instruction.",
"EventCode": "0x18E",
"EventName": "WFE_WFI_CYCLE",
"BriefDescription": "This event counts every cycle that the instruction unit is halted by the WFE/WFI instruction."
},
{
"PublicDescription": "This event counts every cycle in which no instruction was committed, including cycles in which only the MOVPRFX instruction was committed.",
"EventCode": "0x190",
"EventName": "_0INST_COMMIT",
"BriefDescription": "This event counts every cycle in which no instruction was committed, including cycles in which only the MOVPRFX instruction was committed."
},
{
"PublicDescription": "This event counts every cycle that one instruction is committed.",
"EventCode": "0x191",
"EventName": "_1INST_COMMIT",
"BriefDescription": "This event counts every cycle that one instruction is committed."
},
{
"PublicDescription": "This event counts every cycle that two instructions are committed.",
"EventCode": "0x192",
"EventName": "_2INST_COMMIT",
"BriefDescription": "This event counts every cycle that two instructions are committed."
},
{
"PublicDescription": "This event counts every cycle that three instructions are committed.",
"EventCode": "0x193",
"EventName": "_3INST_COMMIT",
"BriefDescription": "This event counts every cycle that three instructions are committed."
},
{
"PublicDescription": "This event counts every cycle that four instructions are committed.",
"EventCode": "0x194",
"EventName": "_4INST_COMMIT",
"BriefDescription": "This event counts every cycle that four instructions are committed."
},
{
"PublicDescription": "This event counts every cycle in which only micro-operations, and no complete instructions, are committed.",
"EventCode": "0x198",
"EventName": "UOP_ONLY_COMMIT",
"BriefDescription": "This event counts every cycle in which only micro-operations, and no complete instructions, are committed."
},
{
"PublicDescription": "This event counts every cycle that only the MOVPRFX instruction is committed.",
"EventCode": "0x199",
"EventName": "SINGLE_MOVPRFX_COMMIT",
"BriefDescription": "This event counts every cycle that only the MOVPRFX instruction is committed."
},
{
"PublicDescription": "This event counts energy consumption per cycle of core.",
"EventCode": "0x1E0",
"EventName": "EA_CORE",
"BriefDescription": "This event counts energy consumption per cycle of core."
},
{
"PublicDescription": "This event counts streaming prefetch requests to L1D cache generated by hardware prefetcher.",
"EventCode": "0x230",
"EventName": "L1HWPF_STREAM_PF",
"BriefDescription": "This event counts streaming prefetch requests to L1D cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts allocation type prefetch injection requests to L1D cache generated by hardware prefetcher.",
"EventCode": "0x231",
"EventName": "L1HWPF_INJ_ALLOC_PF",
"BriefDescription": "This event counts allocation type prefetch injection requests to L1D cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts non-allocation type prefetch injection requests to L1D cache generated by hardware prefetcher.",
"EventCode": "0x232",
"EventName": "L1HWPF_INJ_NOALLOC_PF",
"BriefDescription": "This event counts non-allocation type prefetch injection requests to L1D cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts streaming prefetch requests to L2 cache generated by hardware prefetcher.",
"EventCode": "0x233",
"EventName": "L2HWPF_STREAM_PF",
"BriefDescription": "This event counts streaming prefetch requests to L2 cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts allocation type prefetch injection requests to L2 cache generated by hardware prefetcher.",
"EventCode": "0x234",
"EventName": "L2HWPF_INJ_ALLOC_PF",
"BriefDescription": "This event counts allocation type prefetch injection requests to L2 cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts non-allocation type prefetch injection requests to L2 cache generated by hardware prefetcher.",
"EventCode": "0x235",
"EventName": "L2HWPF_INJ_NOALLOC_PF",
"BriefDescription": "This event counts non-allocation type prefetch injection requests to L2 cache generated by hardware prefetcher."
},
{
"PublicDescription": "This event counts prefetch requests to L2 cache generated by other causes.",
"EventCode": "0x236",
"EventName": "L2HWPF_OTHER",
"BriefDescription": "This event counts prefetch requests to L2 cache generated by other causes."
}
]

@@ -0,0 +1,194 @@
[
{
"ArchStdEvent": "STALL_FRONTEND"
},
{
"ArchStdEvent": "STALL_BACKEND"
},
{
"PublicDescription": "This event counts valid cycles of EAGA pipeline.",
"EventCode": "0x1A0",
"EventName": "EAGA_VAL",
"BriefDescription": "This event counts valid cycles of EAGA pipeline."
},
{
"PublicDescription": "This event counts valid cycles of EAGB pipeline.",
"EventCode": "0x1A1",
"EventName": "EAGB_VAL",
"BriefDescription": "This event counts valid cycles of EAGB pipeline."
},
{
"PublicDescription": "This event counts valid cycles of EXA pipeline.",
"EventCode": "0x1A2",
"EventName": "EXA_VAL",
"BriefDescription": "This event counts valid cycles of EXA pipeline."
},
{
"PublicDescription": "This event counts valid cycles of EXB pipeline.",
"EventCode": "0x1A3",
"EventName": "EXB_VAL",
"BriefDescription": "This event counts valid cycles of EXB pipeline."
},
{
"PublicDescription": "This event counts valid cycles of FLA pipeline.",
"EventCode": "0x1A4",
"EventName": "FLA_VAL",
"BriefDescription": "This event counts valid cycles of FLA pipeline."
},
{
"PublicDescription": "This event counts valid cycles of FLB pipeline.",
"EventCode": "0x1A5",
"EventName": "FLB_VAL",
"BriefDescription": "This event counts valid cycles of FLB pipeline."
},
{
"PublicDescription": "This event counts valid cycles of PRX pipeline.",
"EventCode": "0x1A6",
"EventName": "PRX_VAL",
"BriefDescription": "This event counts valid cycles of PRX pipeline."
},
{
"PublicDescription": "This event counts the number of 1 bits in the predicate of each request in the FLA pipeline, scaled so that it equals 16 when all bits are 1.",
"EventCode": "0x1B4",
"EventName": "FLA_VAL_PRD_CNT",
"BriefDescription": "This event counts the number of 1 bits in the predicate of each request in the FLA pipeline, scaled so that it equals 16 when all bits are 1."
},
{
"PublicDescription": "This event counts the number of 1 bits in the predicate of each request in the FLB pipeline, scaled so that it equals 16 when all bits are 1.",
"EventCode": "0x1B5",
"EventName": "FLB_VAL_PRD_CNT",
"BriefDescription": "This event counts the number of 1 bits in the predicate of each request in the FLB pipeline, scaled so that it equals 16 when all bits are 1."
},
{
"PublicDescription": "This event counts valid cycles of L1D cache pipeline#0.",
"EventCode": "0x240",
"EventName": "L1_PIPE0_VAL",
"BriefDescription": "This event counts valid cycles of L1D cache pipeline#0."
},
{
"PublicDescription": "This event counts valid cycles of L1D cache pipeline#1.",
"EventCode": "0x241",
"EventName": "L1_PIPE1_VAL",
"BriefDescription": "This event counts valid cycles of L1D cache pipeline#1."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#0 whose sce bit of the tagged address is 1.",
"EventCode": "0x250",
"EventName": "L1_PIPE0_VAL_IU_TAG_ADRS_SCE",
"BriefDescription": "This event counts requests in L1D cache pipeline#0 whose sce bit of the tagged address is 1."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#0 whose pfe bit of the tagged address is 1.",
"EventCode": "0x251",
"EventName": "L1_PIPE0_VAL_IU_TAG_ADRS_PFE",
"BriefDescription": "This event counts requests in L1D cache pipeline#0 whose pfe bit of the tagged address is 1."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#1 whose sce bit of the tagged address is 1.",
"EventCode": "0x252",
"EventName": "L1_PIPE1_VAL_IU_TAG_ADRS_SCE",
"BriefDescription": "This event counts requests in L1D cache pipeline#1 whose sce bit of the tagged address is 1."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#1 whose pfe bit of the tagged address is 1.",
"EventCode": "0x253",
"EventName": "L1_PIPE1_VAL_IU_TAG_ADRS_PFE",
"BriefDescription": "This event counts requests in L1D cache pipeline#1 whose pfe bit of the tagged address is 1."
},
{
"PublicDescription": "This event counts completed requests in L1D cache pipeline#0.",
"EventCode": "0x260",
"EventName": "L1_PIPE0_COMP",
"BriefDescription": "This event counts completed requests in L1D cache pipeline#0."
},
{
"PublicDescription": "This event counts completed requests in L1D cache pipeline#1.",
"EventCode": "0x261",
"EventName": "L1_PIPE1_COMP",
"BriefDescription": "This event counts completed requests in L1D cache pipeline#1."
},
{
"PublicDescription": "This event counts completed requests in L1I cache pipeline.",
"EventCode": "0x268",
"EventName": "L1I_PIPE_COMP",
"BriefDescription": "This event counts completed requests in L1I cache pipeline."
},
{
"PublicDescription": "This event counts valid cycles of L1I cache pipeline.",
"EventCode": "0x269",
"EventName": "L1I_PIPE_VAL",
"BriefDescription": "This event counts valid cycles of L1I cache pipeline."
},
{
"PublicDescription": "This event counts requests in the L1D cache pipelines that were aborted due to a store-load interlock.",
"EventCode": "0x274",
"EventName": "L1_PIPE_ABORT_STLD_INTLK",
"BriefDescription": "This event counts requests in the L1D cache pipelines that were aborted due to a store-load interlock."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#0 whose sector cache ID is not 0.",
"EventCode": "0x2A0",
"EventName": "L1_PIPE0_VAL_IU_NOT_SEC0",
"BriefDescription": "This event counts requests in L1D cache pipeline#0 whose sector cache ID is not 0."
},
{
"PublicDescription": "This event counts requests in L1D cache pipeline#1 whose sector cache ID is not 0.",
"EventCode": "0x2A1",
"EventName": "L1_PIPE1_VAL_IU_NOT_SEC0",
"BriefDescription": "This event counts requests in L1D cache pipeline#1 whose sector cache ID is not 0."
},
{
"PublicDescription": "This event counts the number of times where 2 elements of the gather instructions became 2 flows because 2 elements could not be combined.",
"EventCode": "0x2B0",
"EventName": "L1_PIPE_COMP_GATHER_2FLOW",
"BriefDescription": "This event counts the number of times where 2 elements of the gather instructions became 2 flows because 2 elements could not be combined."
},
{
"PublicDescription": "This event counts the number of times where 2 elements of the gather instructions became 1 flow because 2 elements could be combined.",
"EventCode": "0x2B1",
"EventName": "L1_PIPE_COMP_GATHER_1FLOW",
"BriefDescription": "This event counts the number of times where 2 elements of the gather instructions became 1 flow because 2 elements could be combined."
},
{
"PublicDescription": "This event counts the number of times where 2 elements of the gather instructions became 0 flow because both predicate values are 0.",
"EventCode": "0x2B2",
"EventName": "L1_PIPE_COMP_GATHER_0FLOW",
"BriefDescription": "This event counts the number of times where 2 elements of the gather instructions became 0 flow because both predicate values are 0."
},
{
"PublicDescription": "This event counts the number of flows of the scatter instructions.",
"EventCode": "0x2B3",
"EventName": "L1_PIPE_COMP_SCATTER_1FLOW",
"BriefDescription": "This event counts the number of flows of the scatter instructions."
},
{
"PublicDescription": "This event counts the number of 1 bits in the predicate of each request in L1D cache pipeline#0, scaled so that it equals 16 when all bits are 1.",
"EventCode": "0x2B8",
"EventName": "L1_PIPE0_COMP_PRD_CNT",
"BriefDescription": "This event counts the number of 1 bits in the predicate of each request in L1D cache pipeline#0, scaled so that it equals 16 when all bits are 1."
},
{
"PublicDescription": "This event counts the number of 1 bits in the predicate of each request in L1D cache pipeline#1, scaled so that it equals 16 when all bits are 1.",
"EventCode": "0x2B9",
"EventName": "L1_PIPE1_COMP_PRD_CNT",
"BriefDescription": "This event counts the number of 1 bits in the predicate of each request in L1D cache pipeline#1, scaled so that it equals 16 when all bits are 1."
},
{
"PublicDescription": "This event counts valid cycles of L2 cache pipeline.",
"EventCode": "0x330",
"EventName": "L2_PIPE_VAL",
"BriefDescription": "This event counts valid cycles of L2 cache pipeline."
},
{
"PublicDescription": "This event counts completed requests in L2 cache pipeline.",
"EventCode": "0x350",
"EventName": "L2_PIPE_COMP_ALL",
"BriefDescription": "This event counts completed requests in L2 cache pipeline."
},
{
"PublicDescription": "This event counts operations where software or hardware prefetch hits an L2 cache refill buffer allocated by demand access.",
"EventCode": "0x370",
"EventName": "L2_PIPE_COMP_PF_L2MIB_MCH",
"BriefDescription": "This event counts operations where software or hardware prefetch hits an L2 cache refill buffer allocated by demand access."
}
]

@@ -0,0 +1,110 @@
[
{
"ArchStdEvent": "SIMD_INST_RETIRED"
},
{
"ArchStdEvent": "SVE_INST_RETIRED"
},
{
"ArchStdEvent": "UOP_SPEC"
},
{
"ArchStdEvent": "SVE_MATH_SPEC"
},
{
"ArchStdEvent": "FP_SPEC"
},
{
"ArchStdEvent": "FP_FMA_SPEC"
},
{
"ArchStdEvent": "FP_RECPE_SPEC"
},
{
"ArchStdEvent": "FP_CVT_SPEC"
},
{
"ArchStdEvent": "ASE_SVE_INT_SPEC"
},
{
"ArchStdEvent": "SVE_PRED_SPEC"
},
{
"ArchStdEvent": "SVE_MOVPRFX_SPEC"
},
{
"ArchStdEvent": "SVE_MOVPRFX_U_SPEC"
},
{
"ArchStdEvent": "ASE_SVE_LD_SPEC"
},
{
"ArchStdEvent": "ASE_SVE_ST_SPEC"
},
{
"ArchStdEvent": "PRF_SPEC"
},
{
"ArchStdEvent": "BASE_LD_REG_SPEC"
},
{
"ArchStdEvent": "BASE_ST_REG_SPEC"
},
{
"ArchStdEvent": "SVE_LDR_REG_SPEC"
},
{
"ArchStdEvent": "SVE_STR_REG_SPEC"
},
{
"ArchStdEvent": "SVE_LDR_PREG_SPEC"
},
{
"ArchStdEvent": "SVE_STR_PREG_SPEC"
},
{
"ArchStdEvent": "SVE_PRF_CONTIG_SPEC"
},
{
"ArchStdEvent": "ASE_SVE_LD_MULTI_SPEC"
},
{
"ArchStdEvent": "ASE_SVE_ST_MULTI_SPEC"
},
{
"ArchStdEvent": "SVE_LD_GATHER_SPEC"
},
{
"ArchStdEvent": "SVE_ST_SCATTER_SPEC"
},
{
"ArchStdEvent": "SVE_PRF_GATHER_SPEC"
},
{
"ArchStdEvent": "SVE_LDFF_SPEC"
},
{
"ArchStdEvent": "FP_SCALE_OPS_SPEC"
},
{
"ArchStdEvent": "FP_FIXED_OPS_SPEC"
},
{
"ArchStdEvent": "FP_HP_SCALE_OPS_SPEC"
},
{
"ArchStdEvent": "FP_HP_FIXED_OPS_SPEC"
},
{
"ArchStdEvent": "FP_SP_SCALE_OPS_SPEC"
},
{
"ArchStdEvent": "FP_SP_FIXED_OPS_SPEC"
},
{
"ArchStdEvent": "FP_DP_SCALE_OPS_SPEC"
},
{
"ArchStdEvent": "FP_DP_FIXED_OPS_SPEC"
}
]

@@ -0,0 +1,233 @@
[
{
"MetricExpr": "FETCH_BUBBLE / (4 * CPU_CYCLES)",
"PublicDescription": "Frontend bound L1 topdown metric",
"BriefDescription": "Frontend bound L1 topdown metric",
"MetricGroup": "TopDownL1",
"MetricName": "frontend_bound"
},
{
"MetricExpr": "(INST_SPEC - INST_RETIRED) / (4 * CPU_CYCLES)",
"PublicDescription": "Bad Speculation L1 topdown metric",
"BriefDescription": "Bad Speculation L1 topdown metric",
"MetricGroup": "TopDownL1",
"MetricName": "bad_speculation"
},
{
"MetricExpr": "INST_RETIRED / (CPU_CYCLES * 4)",
"PublicDescription": "Retiring L1 topdown metric",
"BriefDescription": "Retiring L1 topdown metric",
"MetricGroup": "TopDownL1",
"MetricName": "retiring"
},
{
"MetricExpr": "1 - (frontend_bound + bad_speculation + retiring)",
"PublicDescription": "Backend Bound L1 topdown metric",
"BriefDescription": "Backend Bound L1 topdown metric",
"MetricGroup": "TopDownL1",
"MetricName": "backend_bound"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x201d@ / CPU_CYCLES",
"PublicDescription": "Fetch latency bound L2 topdown metric",
"BriefDescription": "Fetch latency bound L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "fetch_latency_bound"
},
{
"MetricExpr": "frontend_bound - fetch_latency_bound",
"PublicDescription": "Fetch bandwidth bound L2 topdown metric",
"BriefDescription": "Fetch bandwidth bound L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "fetch_bandwidth_bound"
},
{
"MetricExpr": "(bad_speculation * BR_MIS_PRED) / (BR_MIS_PRED + armv8_pmuv3_0@event\\=0x2013@)",
"PublicDescription": "Branch mispredicts L2 topdown metric",
"BriefDescription": "Branch mispredicts L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "branch_mispredicts"
},
{
"MetricExpr": "bad_speculation - branch_mispredicts",
"PublicDescription": "Machine clears L2 topdown metric",
"BriefDescription": "Machine clears L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "machine_clears"
},
{
"MetricExpr": "(EXE_STALL_CYCLE - (MEM_STALL_ANYLOAD + armv8_pmuv3_0@event\\=0x7005@)) / CPU_CYCLES",
"PublicDescription": "Core bound L2 topdown metric",
"BriefDescription": "Core bound L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "core_bound"
},
{
"MetricExpr": "(MEM_STALL_ANYLOAD + armv8_pmuv3_0@event\\=0x7005@) / CPU_CYCLES",
"PublicDescription": "Memory bound L2 topdown metric",
"BriefDescription": "Memory bound L2 topdown metric",
"MetricGroup": "TopDownL2",
"MetricName": "memory_bound"
},
{
"MetricExpr": "(((L2I_TLB - L2I_TLB_REFILL) * 15) + (L2I_TLB_REFILL * 100)) / CPU_CYCLES",
"PublicDescription": "Idle by itlb miss L3 topdown metric",
"BriefDescription": "Idle by itlb miss L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "idle_by_itlb_miss"
},
{
"MetricExpr": "(((L2I_CACHE - L2I_CACHE_REFILL) * 15) + (L2I_CACHE_REFILL * 100)) / CPU_CYCLES",
"PublicDescription": "Idle by icache miss L3 topdown metric",
"BriefDescription": "Idle by icache miss L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "idle_by_icache_miss"
},
{
"MetricExpr": "(BR_MIS_PRED * 5) / CPU_CYCLES",
"PublicDescription": "BP misp flush L3 topdown metric",
"BriefDescription": "BP misp flush L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "bp_misp_flush"
},
{
"MetricExpr": "(armv8_pmuv3_0@event\\=0x2013@ * 5) / CPU_CYCLES",
"PublicDescription": "OOO flush L3 topdown metric",
"BriefDescription": "OOO flush L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "ooo_flush"
},
{
"MetricExpr": "(armv8_pmuv3_0@event\\=0x1001@ * 5) / CPU_CYCLES",
"PublicDescription": "Static predictor flush L3 topdown metric",
"BriefDescription": "Static predictor flush L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "sp_flush"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x1010@ / BR_MIS_PRED",
"PublicDescription": "Indirect branch L3 topdown metric",
"BriefDescription": "Indirect branch L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "indirect_branch"
},
{
"MetricExpr": "(armv8_pmuv3_0@event\\=0x1014@ + armv8_pmuv3_0@event\\=0x1018@) / BR_MIS_PRED",
"PublicDescription": "Push branch L3 topdown metric",
"BriefDescription": "Push branch L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "push_branch"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x100c@ / BR_MIS_PRED",
"PublicDescription": "Pop branch L3 topdown metric",
"BriefDescription": "Pop branch L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "pop_branch"
},
{
"MetricExpr": "(BR_MIS_PRED - armv8_pmuv3_0@event\\=0x1010@ - armv8_pmuv3_0@event\\=0x1014@ - armv8_pmuv3_0@event\\=0x1018@ - armv8_pmuv3_0@event\\=0x100c@) / BR_MIS_PRED",
"PublicDescription": "Other branch L3 topdown metric",
"BriefDescription": "Other branch L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "other_branch"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x2012@ / armv8_pmuv3_0@event\\=0x2013@",
"PublicDescription": "Nuke flush L3 topdown metric",
"BriefDescription": "Nuke flush L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "nuke_flush"
},
{
"MetricExpr": "1 - nuke_flush",
"PublicDescription": "Other flush L3 topdown metric",
"BriefDescription": "Other flush L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "other_flush"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x2010@ / CPU_CYCLES",
"PublicDescription": "Sync stall L3 topdown metric",
"BriefDescription": "Sync stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "sync_stall"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x2004@ / CPU_CYCLES",
"PublicDescription": "Rob stall L3 topdown metric",
"BriefDescription": "Rob stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "rob_stall"
},
{
"MetricExpr": "(armv8_pmuv3_0@event\\=0x2006@ + armv8_pmuv3_0@event\\=0x2007@ + armv8_pmuv3_0@event\\=0x2008@) / CPU_CYCLES",
"PublicDescription": "Ptag stall L3 topdown metric",
"BriefDescription": "Ptag stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "ptag_stall"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x201e@ / CPU_CYCLES",
"PublicDescription": "SaveOpQ stall L3 topdown metric",
"BriefDescription": "SaveOpQ stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "saveopq_stall"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x2005@ / CPU_CYCLES",
"PublicDescription": "PC buffer stall L3 topdown metric",
"BriefDescription": "PC buffer stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "pc_buffer_stall"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x7002@ / CPU_CYCLES",
"PublicDescription": "Divider L3 topdown metric",
"BriefDescription": "Divider L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "divider"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x7003@ / CPU_CYCLES",
"PublicDescription": "FSU stall L3 topdown metric",
"BriefDescription": "FSU stall L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "fsu_stall"
},
{
"MetricExpr": "core_bound - divider - fsu_stall",
"PublicDescription": "EXE ports util L3 topdown metric",
"BriefDescription": "EXE ports util L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "exe_ports_util"
},
{
"MetricExpr": "(MEM_STALL_ANYLOAD - MEM_STALL_L1MISS) / CPU_CYCLES",
"PublicDescription": "L1 bound L3 topdown metric",
"BriefDescription": "L1 bound L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "l1_bound"
},
{
"MetricExpr": "(MEM_STALL_L1MISS - MEM_STALL_L2MISS) / CPU_CYCLES",
"PublicDescription": "L2 bound L3 topdown metric",
"BriefDescription": "L2 bound L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "l2_bound"
},
{
"MetricExpr": "MEM_STALL_L2MISS / CPU_CYCLES",
"PublicDescription": "Mem bound L3 topdown metric",
"BriefDescription": "Mem bound L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "mem_bound"
},
{
"MetricExpr": "armv8_pmuv3_0@event\\=0x7005@ / CPU_CYCLES",
"PublicDescription": "Store bound L3 topdown metric",
"BriefDescription": "Store bound L3 topdown metric",
"MetricGroup": "TopDownL3",
"MetricName": "store_bound"
}
]
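For reference, the arithmetic encoded in the TopDownL1 MetricExpr fields above can be sketched as follows. This is a minimal illustration, not perf's implementation: the counter values are made up, and frontend_bound is taken as an input because its expression falls outside the excerpt shown here.

```python
# Hypothetical counter readings -- made-up numbers for illustration only.
counters = {"INST_SPEC": 1200, "INST_RETIRED": 1000, "CPU_CYCLES": 500}

def topdown_l1(c, frontend_bound):
    """Evaluate the L1 topdown fractions per the MetricExpr fields above.

    Assumes a 4-slot machine: bad_speculation and retiring are fractions
    of 4 * CPU_CYCLES issue slots, and backend_bound is the remainder.
    """
    slots = 4 * c["CPU_CYCLES"]
    bad_speculation = (c["INST_SPEC"] - c["INST_RETIRED"]) / slots
    retiring = c["INST_RETIRED"] / slots
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }

metrics = topdown_l1(counters, frontend_bound=0.2)
```

In practice these metrics are consumed through `perf stat -M TopDownL1`; the sketch only shows the arithmetic the JSON expressions describe.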


@@ -20,5 +20,6 @@
0x00000000410fd0c0,v1,arm/cortex-a76-n1,core
0x00000000420f5160,v1,cavium/thunderx2,core
0x00000000430f0af0,v1,cavium/thunderx2,core
0x00000000460f0010,v1,fujitsu/a64fx,core
0x00000000480fd010,v1,hisilicon/hip08,core
0x00000000500f0000,v1,ampere/emag,core



@@ -15,3 +15,4 @@
# Power8 entries
004[bcd][[:xdigit:]]{4},1,power8,core
004e[[:xdigit:]]{4},1,power9,core
0080[[:xdigit:]]{4},1,power10,core
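Each mapfile entry pairs a CPU-id pattern (a POSIX regex for the PVR here) with the JSON event directory perf loads for that CPU. A rough sketch of the lookup, assuming a hand-rolled translation of the POSIX `[:xdigit:]` class into Python regex syntax (perf itself uses POSIX regcomp):

```python
import re

# Entries taken from the power mapfile hunk above.
MAPFILE = """\
004[bcd][[:xdigit:]]{4},1,power8,core
004e[[:xdigit:]]{4},1,power9,core
0080[[:xdigit:]]{4},1,power10,core
"""

def lookup(pvr):
    """Return the JSON event directory matching a PVR string, else None."""
    for line in MAPFILE.splitlines():
        pattern, _version, jsondir, _type = line.split(",")
        # Translate the POSIX character class for Python's re module.
        pattern = pattern.replace("[:xdigit:]", "0-9a-fA-F")
        if re.fullmatch(pattern, pvr):
            return jsondir
    return None
```

For example, `lookup("004e0100")` resolves to the power9 directory.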



@@ -0,0 +1,47 @@
[
{
"EventCode": "1003C",
"EventName": "PM_EXEC_STALL_DMISS_L2L3",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3."
},
{
"EventCode": "34056",
"EventName": "PM_EXEC_STALL_LOAD_FINISH",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the NTF instruction merged with another load in the LMQ."
},
{
"EventCode": "3006C",
"EventName": "PM_RUN_CYC_SMT2_MODE",
"BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode."
},
{
"EventCode": "300F4",
"EventName": "PM_RUN_INST_CMPL_CONC",
"BriefDescription": "PowerPC instructions completed by this thread when all threads in the core had the run-latch set."
},
{
"EventCode": "4C016",
"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict."
},
{
"EventCode": "4D014",
"EventName": "PM_EXEC_STALL_LOAD",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit."
},
{
"EventCode": "4D016",
"EventName": "PM_EXEC_STALL_PTESYNC",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit."
},
{
"EventCode": "401EA",
"EventName": "PM_THRESH_EXC_128",
"BriefDescription": "Threshold counter exceeded a value of 128."
},
{
"EventCode": "400F6",
"EventName": "PM_BR_MPRED_CMPL",
"BriefDescription": "A mispredicted branch completed. Includes direction and target."
}
]


@@ -0,0 +1,7 @@
[
{
"EventCode": "4016E",
"EventName": "PM_THRESH_NOT_MET",
"BriefDescription": "Threshold counter did not meet threshold."
}
]


@@ -0,0 +1,217 @@
[
{
"EventCode": "10004",
"EventName": "PM_EXEC_STALL_TRANSLATION",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss or ERAT miss and waited for it to resolve."
},
{
"EventCode": "10010",
"EventName": "PM_PMC4_OVERFLOW",
"BriefDescription": "The event selected for PMC4 caused the event counter to overflow."
},
{
"EventCode": "10020",
"EventName": "PM_PMC4_REWIND",
"BriefDescription": "The speculative event selected for PMC4 rewinds and the counter for PMC4 is not charged."
},
{
"EventCode": "10038",
"EventName": "PM_DISP_STALL_TRANSLATION",
"BriefDescription": "Cycles when dispatch was stalled for this thread because the MMU was handling a translation miss."
},
{
"EventCode": "1003A",
"EventName": "PM_DISP_STALL_BR_MPRED_IC_L2",
"BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L2 after suffering a branch mispredict."
},
{
"EventCode": "1E050",
"EventName": "PM_DISP_STALL_HELD_STF_MAPPER_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the STF mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR."
},
{
"EventCode": "1F054",
"EventName": "PM_DTLB_HIT",
"BriefDescription": "The PTE required by the instruction was resident in the TLB (data TLB access). When MMCR1[16]=0 this event counts only demand hits. When MMCR1[16]=1 this event includes demand and prefetch. Applies to both HPT and RPT."
},
{
"EventCode": "101E8",
"EventName": "PM_THRESH_EXC_256",
"BriefDescription": "Threshold counter exceeded a count of 256."
},
{
"EventCode": "101EC",
"EventName": "PM_THRESH_MET",
"BriefDescription": "Threshold exceeded."
},
{
"EventCode": "100F2",
"EventName": "PM_1PLUS_PPC_CMPL",
"BriefDescription": "Cycles in which at least one instruction is completed by this thread."
},
{
"EventCode": "100F6",
"EventName": "PM_IERAT_MISS",
"BriefDescription": "IERAT Reloaded to satisfy an IERAT miss. All page sizes are counted by this event."
},
{
"EventCode": "100F8",
"EventName": "PM_DISP_STALL_CYC",
"BriefDescription": "Cycles the ICT has no itags assigned to this thread (no instructions were dispatched during these cycles)."
},
{
"EventCode": "20114",
"EventName": "PM_MRK_L2_RC_DISP",
"BriefDescription": "Marked instruction RC dispatched in L2."
},
{
"EventCode": "2C010",
"EventName": "PM_EXEC_STALL_LSU",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the Load Store Unit. This does not include simple fixed point instructions."
},
{
"EventCode": "2C016",
"EventName": "PM_DISP_STALL_IERAT_ONLY_MISS",
"BriefDescription": "Cycles when dispatch was stalled while waiting to resolve an instruction ERAT miss."
},
{
"EventCode": "2C01E",
"EventName": "PM_DISP_STALL_BR_MPRED_IC_L3",
"BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3 after suffering a branch mispredict."
},
{
"EventCode": "2D01A",
"EventName": "PM_DISP_STALL_IC_MISS",
"BriefDescription": "Cycles when dispatch was stalled for this thread due to an Icache Miss."
},
{
"EventCode": "2D01C",
"EventName": "PM_CMPL_STALL_STCX",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a stcx waiting for resolution from the nest before completing."
},
{
"EventCode": "2E018",
"EventName": "PM_DISP_STALL_FETCH",
"BriefDescription": "Cycles when dispatch was stalled for this thread because Fetch was being held."
},
{
"EventCode": "2E01A",
"EventName": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the XVFC mapper/SRB was full."
},
{
"EventCode": "2C142",
"EventName": "PM_MRK_XFER_FROM_SRC_PMC2",
"BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[15:27]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "24050",
"EventName": "PM_IOPS_DISP",
"BriefDescription": "Internal Operations dispatched. PM_IOPS_DISP / PM_INST_DISP will show the average number of internal operations per PowerPC instruction."
},
{
"EventCode": "2405E",
"EventName": "PM_ISSUE_CANCEL",
"BriefDescription": "An instruction issued and the issue was later cancelled. Only one cancel per PowerPC instruction."
},
{
"EventCode": "200FA",
"EventName": "PM_BR_TAKEN_CMPL",
"BriefDescription": "Branch Taken instruction completed."
},
{
"EventCode": "30012",
"EventName": "PM_FLUSH_COMPLETION",
"BriefDescription": "The instruction that was next to complete (oldest in the pipeline) did not complete because it suffered a flush."
},
{
"EventCode": "30014",
"EventName": "PM_EXEC_STALL_STORE",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store instruction executing in the Load Store Unit."
},
{
"EventCode": "30018",
"EventName": "PM_DISP_STALL_HELD_SCOREBOARD_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch while waiting on the Scoreboard. This event combines VSCR and FPSCR together."
},
{
"EventCode": "30026",
"EventName": "PM_EXEC_STALL_STORE_MISS",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a store whose cache line was not resident in the L1 and was waiting for allocation of the missing line into the L1."
},
{
"EventCode": "3012A",
"EventName": "PM_MRK_L2_RC_DONE",
"BriefDescription": "L2 RC machine completed the transaction for the marked instruction."
},
{
"EventCode": "3F046",
"EventName": "PM_ITLB_HIT_1G",
"BriefDescription": "Instruction TLB hit (IERAT reload) page size 1G, which implies Radix Page Table translation is in use. When MMCR1[17]=0 this event counts only for demand misses. When MMCR1[17]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "34058",
"EventName": "PM_DISP_STALL_BR_MPRED_ICMISS",
"BriefDescription": "Cycles when dispatch was stalled after a mispredicted branch resulted in an instruction cache miss."
},
{
"EventCode": "3D05C",
"EventName": "PM_DISP_STALL_HELD_RENAME_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch because the mapper/SRB was full. Includes GPR (count, link, tar), VSR, VMR, FPR and XVFC."
},
{
"EventCode": "3E052",
"EventName": "PM_DISP_STALL_IC_L3",
"BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from the local L3."
},
{
"EventCode": "3E054",
"EventName": "PM_LD_MISS_L1",
"BriefDescription": "Load Missed L1, counted at execution time (can be greater than loads finished). LMQ merges are not included in this count. i.e. if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load."
},
{
"EventCode": "301EA",
"EventName": "PM_THRESH_EXC_1024",
"BriefDescription": "Threshold counter exceeded a value of 1024."
},
{
"EventCode": "300FA",
"EventName": "PM_INST_FROM_L3MISS",
"BriefDescription": "The processor's instruction cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss."
},
{
"EventCode": "40006",
"EventName": "PM_ISSUE_KILL",
"BriefDescription": "Cycles in which an instruction or group of instructions were cancelled after being issued. This event increments once per occurrence, regardless of how many instructions are included in the issue group."
},
{
"EventCode": "40116",
"EventName": "PM_MRK_LARX_FIN",
"BriefDescription": "Marked load and reserve instruction (LARX) finished. LARX and STCX are instructions used to acquire a lock."
},
{
"EventCode": "4C010",
"EventName": "PM_DISP_STALL_BR_MPRED_IC_L3MISS",
"BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from sources beyond the local L3 after suffering a mispredicted branch."
},
{
"EventCode": "4D01E",
"EventName": "PM_DISP_STALL_BR_MPRED",
"BriefDescription": "Cycles when dispatch was stalled for this thread due to a mispredicted branch."
},
{
"EventCode": "4E010",
"EventName": "PM_DISP_STALL_IC_L3MISS",
"BriefDescription": "Cycles when dispatch was stalled while the instruction was fetched from any source beyond the local L3."
},
{
"EventCode": "4E01A",
"EventName": "PM_DISP_STALL_HELD_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch for any reason."
},
{
"EventCode": "44056",
"EventName": "PM_VECTOR_ST_CMPL",
"BriefDescription": "Vector store instructions completed."
}
]


@@ -0,0 +1,12 @@
[
{
"EventCode": "1E058",
"EventName": "PM_STCX_FAIL_FIN",
"BriefDescription": "Conditional store instruction (STCX) failed. LARX and STCX are instructions used to acquire a lock."
},
{
"EventCode": "4E050",
"EventName": "PM_STCX_PASS_FIN",
"BriefDescription": "Conditional store instruction (STCX) passed. LARX and STCX are instructions used to acquire a lock."
}
]


@@ -0,0 +1,147 @@
[
{
"EventCode": "1002C",
"EventName": "PM_LD_PREFETCH_CACHE_LINE_MISS",
"BriefDescription": "The L1 cache was reloaded with a line that fulfills a prefetch request."
},
{
"EventCode": "10132",
"EventName": "PM_MRK_INST_ISSUED",
"BriefDescription": "Marked instruction issued. Note that stores always get issued twice, the address gets issued to the LSU and the data gets issued to the VSU. Also, issues can sometimes get killed/cancelled and cause multiple sequential issues for the same instruction."
},
{
"EventCode": "101E0",
"EventName": "PM_MRK_INST_DISP",
"BriefDescription": "The thread has dispatched a randomly sampled marked instruction."
},
{
"EventCode": "101E2",
"EventName": "PM_MRK_BR_TAKEN_CMPL",
"BriefDescription": "Marked Branch Taken instruction completed."
},
{
"EventCode": "20112",
"EventName": "PM_MRK_NTF_FIN",
"BriefDescription": "The marked instruction became the oldest in the pipeline before it finished. It excludes instructions that finish at dispatch."
},
{
"EventCode": "2C01C",
"EventName": "PM_EXEC_STALL_DMISS_OFF_CHIP",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from a remote chip."
},
{
"EventCode": "20138",
"EventName": "PM_MRK_ST_NEST",
"BriefDescription": "A store has been sampled/marked and is at the point of execution where it has completed in the core and can no longer be flushed. At this point the store is sent to the L2."
},
{
"EventCode": "2013A",
"EventName": "PM_MRK_BRU_FIN",
"BriefDescription": "Marked Branch instruction finished."
},
{
"EventCode": "2C144",
"EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC2",
"BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[15:27]."
},
{
"EventCode": "24156",
"EventName": "PM_MRK_STCX_FIN",
"BriefDescription": "Marked conditional store instruction (STCX) finished. LARX and STCX are instructions used to acquire a lock."
},
{
"EventCode": "24158",
"EventName": "PM_MRK_INST",
"BriefDescription": "An instruction was marked. Includes both Random Instruction Sampling (RIS) at decode time and Random Event Sampling (RES) at the time the configured event happens."
},
{
"EventCode": "2415C",
"EventName": "PM_MRK_BR_CMPL",
"BriefDescription": "A marked branch completed. All branches are included."
},
{
"EventCode": "200FD",
"EventName": "PM_L1_ICACHE_MISS",
"BriefDescription": "Demand iCache Miss."
},
{
"EventCode": "30130",
"EventName": "PM_MRK_INST_FIN",
"BriefDescription": "marked instruction finished. Excludes instructions that finish at dispatch. Note that stores always finish twice since the address gets issued to the LSU and the data gets issued to the VSU."
},
{
"EventCode": "34146",
"EventName": "PM_MRK_LD_CMPL",
"BriefDescription": "Marked loads completed."
},
{
"EventCode": "3E158",
"EventName": "PM_MRK_STCX_FAIL",
"BriefDescription": "Marked conditional store instruction (STCX) failed. LARX and STCX are instructions used to acquire a lock."
},
{
"EventCode": "3E15A",
"EventName": "PM_MRK_ST_FIN",
"BriefDescription": "The marked instruction was a store of any kind."
},
{
"EventCode": "30068",
"EventName": "PM_L1_ICACHE_RELOADED_PREF",
"BriefDescription": "Counts all Icache prefetch reloads ( includes demand turned into prefetch)."
},
{
"EventCode": "301E4",
"EventName": "PM_MRK_BR_MPRED_CMPL",
"BriefDescription": "Marked Branch Mispredicted. Includes direction and target."
},
{
"EventCode": "300F6",
"EventName": "PM_LD_DEMAND_MISS_L1",
"BriefDescription": "The L1 cache was reloaded with a line that fulfills a demand miss request. Counted at reload time, before finish."
},
{
"EventCode": "300FE",
"EventName": "PM_DATA_FROM_L3MISS",
"BriefDescription": "The processor's data cache was reloaded from a source other than the local core's L1, L2, or L3 due to a demand miss."
},
{
"EventCode": "40012",
"EventName": "PM_L1_ICACHE_RELOADED_ALL",
"BriefDescription": "Counts all Icache reloads includes demand, prefetch, prefetch turned into demand and demand turned into prefetch."
},
{
"EventCode": "40134",
"EventName": "PM_MRK_INST_TIMEO",
"BriefDescription": "Marked instruction finish timeout (instruction was lost)."
},
{
"EventCode": "4003C",
"EventName": "PM_DISP_STALL_HELD_SYNC_CYC",
"BriefDescription": "Cycles in which the NTC instruction is held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch."
},
{
"EventCode": "4505A",
"EventName": "PM_SP_FLOP_CMPL",
"BriefDescription": "Single Precision floating point instructions completed."
},
{
"EventCode": "4D058",
"EventName": "PM_VECTOR_FLOP_CMPL",
"BriefDescription": "Vector floating point instructions completed."
},
{
"EventCode": "4D05A",
"EventName": "PM_NON_MATH_FLOP_CMPL",
"BriefDescription": "Non Math instructions completed."
},
{
"EventCode": "401E0",
"EventName": "PM_MRK_INST_CMPL",
"BriefDescription": "marked instruction completed."
},
{
"EventCode": "400FE",
"EventName": "PM_DATA_FROM_MEMORY",
"BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss."
}
]


@@ -0,0 +1,192 @@
[
{
"EventCode": "1000A",
"EventName": "PM_PMC3_REWIND",
"BriefDescription": "The speculative event selected for PMC3 rewinds and the counter for PMC3 is not charged."
},
{
"EventCode": "1C040",
"EventName": "PM_XFER_FROM_SRC_PMC1",
"BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[0:12]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "1C142",
"EventName": "PM_MRK_XFER_FROM_SRC_PMC1",
"BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[0:12]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "1C144",
"EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC1",
"BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[0:12]."
},
{
"EventCode": "1C056",
"EventName": "PM_DERAT_MISS_4K",
"BriefDescription": "Data ERAT Miss (Data TLB Access) page size 4K. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "1C058",
"EventName": "PM_DTLB_MISS_16G",
"BriefDescription": "Data TLB reload (after a miss) page size 16G. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "1C05C",
"EventName": "PM_DTLB_MISS_2M",
"BriefDescription": "Data TLB reload (after a miss) page size 2M. Implies radix translation was used. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "1E056",
"EventName": "PM_EXEC_STALL_STORE_PIPE",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline was executing in the store unit. This does not include cycles spent handling store misses, PTESYNC instructions or TLBIE instructions."
},
{
"EventCode": "1F150",
"EventName": "PM_MRK_ST_L2_CYC",
"BriefDescription": "Cycles from L2 RC dispatch to L2 RC completion."
},
{
"EventCode": "10062",
"EventName": "PM_LD_L3MISS_PEND_CYC",
"BriefDescription": "Cycles L3 miss was pending for this thread."
},
{
"EventCode": "20010",
"EventName": "PM_PMC1_OVERFLOW",
"BriefDescription": "The event selected for PMC1 caused the event counter to overflow."
},
{
"EventCode": "2001A",
"EventName": "PM_ITLB_HIT",
"BriefDescription": "The PTE required to translate the instruction address was resident in the TLB (instruction TLB access/IERAT reload). Applies to both HPT and RPT. When MMCR1[17]=0 this event counts only for demand misses. When MMCR1[17]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "2003E",
"EventName": "PM_PTESYNC_FIN",
"BriefDescription": "Ptesync instruction finished in the store unit. Only one ptesync can finish at a time."
},
{
"EventCode": "2C040",
"EventName": "PM_XFER_FROM_SRC_PMC2",
"BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[15:27]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "2C054",
"EventName": "PM_DERAT_MISS_64K",
"BriefDescription": "Data ERAT Miss (Data TLB Access) page size 64K. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "2C056",
"EventName": "PM_DTLB_MISS_4K",
"BriefDescription": "Data TLB reload (after a miss) page size 4K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "2D154",
"EventName": "PM_MRK_DERAT_MISS_64K",
"BriefDescription": "Data ERAT Miss (Data TLB Access) page size 64K for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "200F6",
"EventName": "PM_DERAT_MISS",
"BriefDescription": "DERAT Reloaded to satisfy a DERAT miss. All page sizes are counted by this event. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "3000A",
"EventName": "PM_DISP_STALL_ITLB_MISS",
"BriefDescription": "Cycles when dispatch was stalled while waiting to resolve an instruction TLB miss."
},
{
"EventCode": "30016",
"EventName": "PM_EXEC_STALL_DERAT_DTLB_MISS",
"BriefDescription": "Cycles in which the oldest instruction in the pipeline suffered a TLB miss and waited for it resolve."
},
{
"EventCode": "3C040",
"EventName": "PM_XFER_FROM_SRC_PMC3",
"BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[30:42]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "3C142",
"EventName": "PM_MRK_XFER_FROM_SRC_PMC3",
"BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[30:42]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "3C144",
"EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC3",
"BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[30:42]."
},
{
"EventCode": "3C054",
"EventName": "PM_DERAT_MISS_16M",
"BriefDescription": "Data ERAT Miss (Data TLB Access) page size 16M. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "3C056",
"EventName": "PM_DTLB_MISS_64K",
"BriefDescription": "Data TLB reload (after a miss) page size 64K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "3C058",
"EventName": "PM_LARX_FIN",
"BriefDescription": "Load and reserve instruction (LARX) finished. LARX and STCX are instructions used to acquire a lock."
},
{
"EventCode": "301E2",
"EventName": "PM_MRK_ST_CMPL",
"BriefDescription": "Marked store completed and sent to nest. Note that this count excludes cache-inhibited stores."
},
{
"EventCode": "300FC",
"EventName": "PM_DTLB_MISS",
"BriefDescription": "The DPTEG required for the load/store instruction in execution was missing from the TLB. It includes pages of all sizes for demand and prefetch activity."
},
{
"EventCode": "4D02C",
"EventName": "PM_PMC1_REWIND",
"BriefDescription": "The speculative event selected for PMC1 rewinds and the counter for PMC1 is not charged."
},
{
"EventCode": "4003E",
"EventName": "PM_LD_CMPL",
"BriefDescription": "Loads completed."
},
{
"EventCode": "4C040",
"EventName": "PM_XFER_FROM_SRC_PMC4",
"BriefDescription": "The processor's L1 data cache was reloaded from the source specified in MMCR3[45:57]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "4C142",
"EventName": "PM_MRK_XFER_FROM_SRC_PMC4",
"BriefDescription": "For a marked data transfer instruction, the processor's L1 data cache was reloaded from the source specified in MMCR3[45:57]. If MMCR1[16|17] is 0 (default), this count includes only lines that were reloaded to satisfy a demand miss. If MMCR1[16|17] is 1, this count includes both demand misses and prefetch reloads."
},
{
"EventCode": "4C144",
"EventName": "PM_MRK_XFER_FROM_SRC_CYC_PMC4",
"BriefDescription": "Cycles taken for a marked demand miss to reload a line from the source specified in MMCR3[45:57]."
},
{
"EventCode": "4C056",
"EventName": "PM_DTLB_MISS_16M",
"BriefDescription": "Data TLB reload (after a miss) page size 16M. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "4C05A",
"EventName": "PM_DTLB_MISS_1G",
"BriefDescription": "Data TLB reload (after a miss) page size 1G. Implies radix translation was used. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "4C15E",
"EventName": "PM_MRK_DTLB_MISS_64K",
"BriefDescription": "Marked Data TLB reload (after a miss) page size 64K. When MMCR1[16]=0 this event counts only for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
},
{
"EventCode": "4D056",
"EventName": "PM_NON_FMA_FLOP_CMPL",
"BriefDescription": "Non FMA instruction completed."
},
{
"EventCode": "40164",
"EventName": "PM_MRK_DERAT_MISS_2M",
"BriefDescription": "Data ERAT Miss (Data TLB Access) page size 2M for a marked instruction. When MMCR1[16]=0 this event counts only DERAT reloads for demand misses. When MMCR1[16]=1 this event includes demand misses and prefetches."
}
]
