Rename struct thread_map to struct perf_thread_map, so it could be part
of libperf.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename struct cpu_map to struct perf_cpu_map, so it could be part of
libperf.
Committer notes:
Added fixes for arm64, provided by Jiri.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Because we will make struct perf_counts_values public in following
patches and 'loaded' is implementation related.
No functional change is expected.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were looking in tracefs for:
/sys/kernel/debug/tracing/events/syscalls/sys_enter_sendfile/format when
what is there is just
/sys/kernel/debug/tracing/events/syscalls/sys_enter_sendfile/format
Its the same id, 40 in x86_64, so just add an alias and let the existing
logic take care of that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-km2hmg7hru6u4pawi5fi903q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have an augmenter for the "open" syscall, which has just one pointer,
in the first argument, a "const char *", so any other syscall that has
just one pointer and that is the first can reuse the "open" BPF
augmenter program.
Even more, syscalls that get two pointers with the first being a string
can reuse "open"'s BPF augmenter till we have an augmenter that better
matches that syscall with two pointers.
With this the few augmenters we have, for open (first arg is a string),
openat (2nd arg is a string), renameat (2nd and 4th are strings) can be
reused by a lot of syscalls, ditto for "bind" reusing "connect" because
both have the 2nd argument as a sockaddr and the 3rd as its len.
Lets see how this makes the "bind" syscall reuse the "connect" BPF prog
augmenter found in tools/perf/examples/bpf/augmented_raw_syscalls.c:
# perf trace -e bind,connect systemctl restart sshd
connect(3, { .family: PF_LOCAL, path: /run/systemd/private }, 23) = 0
#
Oh, it just connects to some daemon, so we better do it system wide and then
stop/start sshd:
# perf trace -e bind,connect
systemctl/10124 connect(3, { .family: PF_LOCAL, path: /run/systemd/private }, 23) = 0
sshd/10102 connect(7, { .family: PF_LOCAL, path: /dev/log }, 110) = 0
systemctl/10126 connect(3, { .family: PF_LOCAL, path: /run/systemd/private }, 23) = 0
systemd/10128 ... [continued]: connect()) = 0
(sshd)/10128 connect(3, { .family: PF_LOCAL, path: /run/systemd/journal/stdout }, 30) ...
sshd/10128 bind(3, { .family: PF_NETLINK }, 12) = 0
sshd/10128 connect(4, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
sshd/10128 connect(3, { .family: PF_INET6, port: 22, addr: :: }, 28) = 0
sshd/10128 connect(3, { .family: PF_UNSPEC }, 16) = 0
sshd/10128 connect(3, { .family: PF_INET, port: 22, addr: 0.0.0.0 }, 16) = 0
sshd/10128 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
sshd/10128 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
sshd/10128 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
sshd/10128 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
sshd/10128 bind(4, { .family: PF_INET, port: 22, addr: 0.0.0.0 }, 16) = 0
sshd/10128 connect(6, { .family: PF_LOCAL, path: /dev/log }, 110) = 0
sshd/10128 bind(6, { .family: PF_INET6, port: 22, addr: :: }, 28) = 0
sshd/10128 connect(7, { .family: PF_LOCAL, path: /dev/log }, 110) = 0
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-zfley2ghs4nim1uq4nu6ed3l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll continue reading its details from tracefs as we need it, but
preallocate the whole thing otherwise we may realloc and end up with
pointers to the previous buffer.
I.e. in an upcoming algorithm we'll look for syscalls that have function
signatures that are similar to a given syscall to see if we can reuse
its BPF augmenter, so we may be at syscall 42, having a 'struct syscall'
pointing to that slot in trace->syscalls.table[] and try to read the
slot for an yet unread syscall, which would realloc that table to read
the info for syscall 43, say, which would trigger a realoc of
trace->syscalls.table[], and then the pointer we had for syscall 42
would be pointing to the previous block of memory. b00m.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-m3cjzzifibs13imafhkk77a0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are holes in syscall tables with IDs not associated with any
syscall, mark those when trying to read information for syscalls, which
could happen when iterating thru all syscalls from 0 to the highest
numbered syscall id.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-cku9mpcrcsqaiq0jepu86r68@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We iterate thru the syscall table produced from the kernel syscall
tables reading info, propagate the error and add to the debug message.
This helps in fixing further bugs, such as failing to read the
"sendfile" syscall info when it really should try the aliasm
"sendfile64".
Problems reading syscall 40: 2 (No such file or directory)(sendfile) information
# grep sendfile /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
[40] = "sendfile",
#
I.e. in the tracefs format file for the syscall tracepoints we have it
as sendfile64:
# find /sys -type f -name format | grep sendfile
/sys/kernel/debug/tracing/events/syscalls/sys_enter_sendfile64/format
/sys/kernel/debug/tracing/events/syscalls/sys_exit_sendfile64/format
#
But as "sendfile" in the file used to build the syscall table used in
perf:
$ grep sendfile arch/x86/entry/syscalls/syscall_64.tbl
40 common sendfile __x64_sys_sendfile64
$
So we need to add, in followup patches, aliases in 'perf trace' syscall
data structures to cope with thie.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-w3eluap63x9je0bb8o3t79tz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As we invalidate the fd->pathname table in the SCA_CLOSE_FD beautifier,
if we don't have it we may end up keeping an fd->pathname association
that then gets misprinted.
The previous behaviour continues when the close() syscall is enabled,
which may still be a a problem if we lose records (i.e. we may lose a
'close' record and then get that fd reused by socket()) but then the
tool will notify that records are being lost and the user will be warned
that some of the heuristics will fall apart.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-b7t6h8sq9lebemvfy2zh3qq1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We already had a beautifier for an augmented sockaddr payload, but that
was when we were hooking on each syscalls:sys_enter_foo tracepoints,
since now we're almost doing that by doing a tail call from
raw_syscalls:sys_enter, its almost the same, we can reuse it straight
away.
# perf trace -e connec* ssh www.bla.com
connect(3</var/lib/sss/mc/passwd>, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(3</var/lib/sss/mc/passwd>, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(4<socket:[16604782]>, { .family: PF_LOCAL, path: /var/lib/sss/pipes/nss }, 0x6e) = 0
connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(5</etc/hosts>, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(5</etc/hosts>, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 0x6e) = -1 ENOENT (No such file or directory)
connect(5</etc/hosts>, { .family: PF_INET, port: 53, addr: 192.168.44.1 }, 0x10) = 0
connect(5</etc/hosts>, { .family: PF_INET, port: 22, addr: 146.112.61.108 }, 0x10) = 0
connect(5</etc/hosts>, { .family: PF_INET6, port: 22, addr: ::ffff:146.112.61.108 }, 0x1c) = 0
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-5xkrbcpjsgnr3zt1aqdd7nvc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'll get other stuff in there than just filenames, starting with
sockaddr for 'connect' and 'bind'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-bsexidtsn91ehdpzcd6n5fm9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I.e. just look for "!syscalls:sys_enter_" or "exit_" plus the syscall
name, that way we need just to add entries to the
augmented_raw_syscalls.c BPF source to add handlers.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6xavwddruokp6ohs7tf4qilb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Starting with the renameat and renameat2 syscall, that both receive as
second and fourth parameters a pathname:
# perf trace -e rename* mv one ANOTHER
LLVM: dumping /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
mv: cannot stat 'one': No such file or directory
renameat2(AT_FDCWD, "one", AT_FDCWD, "ANOTHER", RENAME_NOREPLACE) = -1 ENOENT (No such file or directory)
#
Since the per CPU scratch buffer map has space for two maximum sized
pathnames, the verifier is satisfied that there will be no overrun.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x2uboyg5kx2wqeru288209b6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Trying to control what arguments to copy, which ones were strings, etc
all from userspace via maps went nowhere, lots of difficulties to get
the verifier satisfied, so use what the fine BPF guys designed for such
a syscall handling mechanism: bpf_tail_call + BPF_MAP_TYPE_PROG_ARRAY.
The series leading to this should have explained it thoroughly, but the
end result, explained via gdb should help understand this:
Breakpoint 1, syscall_arg__scnprintf_filename (bf=0xc002b1 "", size=2031, arg=0x7fffffff7970) at builtin-trace.c:1268
1268 {
(gdb) n
1269 unsigned long ptr = arg->val;
(gdb) n
1271 if (arg->augmented.args)
(gdb) n
1272 return syscall_arg__scnprintf_augmented_string(arg, bf, size);
(gdb) s
syscall_arg__scnprintf_augmented_string (arg=0x7fffffff7970, bf=0xc002b1 "", size=2031) at builtin-trace.c:1251
1251 {
(gdb) n
1252 struct augmented_arg *augmented_arg = arg->augmented.args;
(gdb) n
1253 size_t printed = scnprintf(bf, size, "\"%.*s\"", augmented_arg->size, augmented_arg->value);
(gdb) n
1258 int consumed = sizeof(*augmented_arg) + augmented_arg->size;
(gdb) p bf
$1 = 0xc002b1 "\"/etc/ld.so.cache\""
(gdb) bt
#0 syscall_arg__scnprintf_augmented_string (arg=0x7fffffff7970, bf=0xc002b1 "\"/etc/ld.so.cache\"", size=2031) at builtin-trace.c:1258
#1 0x0000000000492634 in syscall_arg__scnprintf_filename (bf=0xc002b1 "\"/etc/ld.so.cache\"", size=2031, arg=0x7fffffff7970) at builtin-trace.c:1272
#2 0x0000000000493cd7 in syscall__scnprintf_val (sc=0xc0de68, bf=0xc002b1 "\"/etc/ld.so.cache\"", size=2031, arg=0x7fffffff7970, val=140737354091036) at builtin-trace.c:1689
#3 0x000000000049404f in syscall__scnprintf_args (sc=0xc0de68, bf=0xc002a7 "AT_FDCWD, \"/etc/ld.so.cache\"", size=2041, args=0x7ffff6cbf1ec "\234\377\377\377", augmented_args=0x7ffff6cbf21c, augmented_args_size=28, trace=0x7fffffffa170,
thread=0xbff940) at builtin-trace.c:1756
#4 0x0000000000494a97 in trace__sys_enter (trace=0x7fffffffa170, evsel=0xbe1900, event=0x7ffff6cbf1a0, sample=0x7fffffff7b00) at builtin-trace.c:1975
#5 0x0000000000496ff1 in trace__handle_event (trace=0x7fffffffa170, event=0x7ffff6cbf1a0, sample=0x7fffffff7b00) at builtin-trace.c:2685
#6 0x0000000000497edb in __trace__deliver_event (trace=0x7fffffffa170, event=0x7ffff6cbf1a0) at builtin-trace.c:3029
#7 0x000000000049801e in trace__deliver_event (trace=0x7fffffffa170, event=0x7ffff6cbf1a0) at builtin-trace.c:3056
#8 0x00000000004988de in trace__run (trace=0x7fffffffa170, argc=2, argv=0x7fffffffd660) at builtin-trace.c:3258
#9 0x000000000049c2d3 in cmd_trace (argc=2, argv=0x7fffffffd660) at builtin-trace.c:4220
#10 0x00000000004dcb6c in run_builtin (p=0xa18e00 <commands+576>, argc=5, argv=0x7fffffffd660) at perf.c:304
#11 0x00000000004dcdd9 in handle_internal_command (argc=5, argv=0x7fffffffd660) at perf.c:356
#12 0x00000000004dcf20 in run_argv (argcp=0x7fffffffd4bc, argv=0x7fffffffd4b0) at perf.c:400
#13 0x00000000004dd28c in main (argc=5, argv=0x7fffffffd660) at perf.c:522
(gdb)
(gdb) continue
Continuing.
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
Now its a matter of automagically assigning the BPF programs copying
syscall arg pointers to functions that are "open"-like (i.e. that need
only the first syscall arg copied as a string), or "openat"-like (2nd
arg, etc).
End result in tool output:
# perf trace -e open* ls /tmp/notthere
LLVM: dumping /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = ls: cannot access '/tmp/notthere'-1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY: No such file or directory) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-snc7ry99cl6r0pqaspjim98x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I.e. for a syscall that has its second argument being a string, its
difficult these days to find 'open' being used in the wild :-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-yf3kbzirqrukd3fb2sp5qx4p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So, we use a PERF_COUNT_SW_BPF_OUTPUT to output the augmented sys_enter
payload, i.e. to output more than just the raw syscall args, and if
something goes wrong when handling an unfiltered syscall, we bail out
and just return 1 in the bpf program associated with
raw_syscalls:sys_enter, meaning, don't filter that tracepoint, in which
case what will appear in the perf ring buffer isn't the BPF_OUTPUT
event, but the original raw_syscalls:sys_enter event with its normal
payload.
Now that we're switching to using a bpf_tail_call +
BPF_MAP_TYPE_PROG_ARRAY we're going to use this in the common case, so a
bug where raw_syscalls:sys_enter wasn't being handled by
trace__sys_enter() surfaced and for that case, instead of using the
strace-like augmenter (trace__sys_enter()), we continued to use the
normal generic tracepoint handler:
(gdb) p evsel
$2 = (struct perf_evsel *) 0xc03e40
(gdb) p evsel->name
$3 = 0xbc56c0 "raw_syscalls:sys_enter"
(gdb) p ((struct perf_evsel *) 0xc03e40)->name
$4 = 0xbc56c0 "raw_syscalls:sys_enter"
(gdb) p ((struct perf_evsel *) 0xc03e40)->handler
$5 = (void *) 0x495eb3 <trace__event_handler>
This resulted in this:
0.027 raw_syscalls:sys_enter:NR 12 (0, 7fcfcac64c9b, 4d, 7fcfcac64c9b, 7fcfcac6ce00, 19)
... [continued]: brk()) = 0x563b88677000
I.e. only the sys_exit tracepoint was being properly handled, but since
the sys_enter went to the generic trace__event_handler() we printed it
using libtraceevent's formatter instead of 'perf trace's strace-like
one.
Fix it by setting trace__sys_enter() as the handler for
raw_syscalls:sys_enter and setup the tp_field tracepoint field
accessors.
Now, to test it we just make raw_syscalls:sys_enter return 1 right after
checking if the pid is filtered, making it not use
bpf_perf_output_event() but rather ask for the tracepoint not to be
filtered and the result is the expected one:
brk(NULL) = 0x556f42d6e000
I.e. raw_syscalls:sys_enter returns 1, gets handled by
trace__sys_enter() and gets it combined with the raw_syscalls:sys_exit
in a strace-like way.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-0mkocgk31nmy0odknegcby4z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I.e. look for "syscalls_sys_enter" and "syscalls_sys_exit" BPF maps of
type PROG_ARRAY and populate it with the handlers as specified per
syscall, for now only 'open' is wiring it to something, in time all
syscalls that need to copy arguments entering a syscall or returning
from one will set these to the right handlers, reusing when possible
pre-existing ones.
Next step is to use bpf_tail_call() into that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-t0p4u43i9vbpzs1xtowna3gb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a step in the direction of being able to use a
BPF_MAP_TYPE_PROG_ARRAY to handle syscalls that need to copy pointer
payloads in addition to the raw tracepoint syscall args.
There is a first example in
tools/perf/examples/bpf/augmented_raw_syscalls.c for the 'open' syscall.
Next step is to introduce the prog array map and use this 'open'
augmenter, then use that augmenter in other syscalls that also only copy
the first arg as a string, and then show how to use with a syscall that
reads more than one filename, like 'rename', etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-pys4v57x5qqrybb4cery2mc8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Will be used to assign to syscalls that don't need augmentation, i.e.
those with just integer args.
All syscalls will be in a BPF_MAP_TYPE_PROG_ARRAY, and the
bpf_tail_call() keyed by the syscall id will either find nothing in
place, which means the syscall is being filtered, or a function that
will either add things like filenames to the ring buffer, right after
the raw syscall args, or be this unaugmented handler that will just
return 1, meaning don't filter the original
raw_syscalls:sys_{enter,exit} tracepoint.
For now it is not really being used, this is just leg work to break the
patch into smaller pieces.
It introduces a trace__find_bpf_program_by_title() helper that in turn
uses libbpf's bpf_object__find_program_by_title() on the BPF object with
the __augmented_syscalls__ map. "title" is how libbpf calls the SEC()
argument for functions, i.e. the ELF section that follows a convention
to specify what BPF program (a function with this SEC() marking) should
be connected to which tracepoint, kprobes, etc.
In perf anything that is of the form SEC("sys:event_name") will be
connected to that tracepoint by perf's BPF loader.
In this case its something that will be bpf_tail_call()ed from either
the "raw_syscalls:sys_enter" or "raw_syscall:sys_exit" tracepoints, so
its named "!raw_syscalls:unaugmented" to convey that idea, i.e. its not
going to be directly attached to a tracepoint, thus it starts with a
"!".
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-meucpjx2u0slpkayx56lxqq6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The ev_qualifier is an array with the syscall ids passed via -e on the
command line, sort it as we'll search it when setting up the
BPF_MAP_TYPE_PROG_ARRAY.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-c8hprylp3ai6e0z9burn2r3s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We can conceivably have multiple BPF object files for other purposes, so
better look just on the BPF object containing the __augmented_syscalls__
map for all things augmented_syscalls related.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3jt8knkuae9lt705r1lns202@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can use it when looking for other components of that object
file, such as other programs to add to the BPF_MAP_TYPE_PROG_ARRAY and
use with bpf_tail_call().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-1ibmz7ouv6llqxajy7m8igtd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We may want to get to this bpf_object, to search for other BPF programs,
etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3y8hrb6lszjfi23vjlic3cib@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With BPF_MAP_TYPE_PROG_ARRAY + bpf_tail_call() we want to have BPF
programs, i.e. functions in a object file that perf's BPF loader
shouldn't try to attach to anything, i.e. "!syscalls:sys_enter_open"
should just stay there, not be attached to a tracepoint with that name,
it'll be used by, for instance, 'perf trace' to associate with syscalls
that copy, in addition to the syscall raw args, a filename pointed by
the first arg, i.e. multiple syscalls that need copying the same pointer
arg in the same way, as a filename, for instance, will share the same
BPF program/function.
Right now when perf's BPF loader sees a function with a name
"sys:name" it'll look for a tracepoint and will associate that BPF
program with it, say:
SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
//SNIP
}
Will crate a perf_evsel tracepoint event and then associate with it that
BPF program.
This convention at some point will switch to the one used by the BPF
loader in libbpf, but to experiment with BPF_MAP_TYPE_PROG_ARRAY in
'perf trace' lets do this, that will not require changing too much
stuff.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lk6dasjr1yf9rtvl292b2hpc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Will be used together with BPF_MAP_TYPE_PROG_ARRAY in
tools/perf/examples/bpf/augmented_raw_syscalls.c.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-pd1bpy8i31nta6jqwdex871g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf header:
Vince Weaver:
- Fix divide by zero error if f_header.attr_size==0, found using a perf tool fuzzer.
Numfor Mbiziwo-Tiapo:
- Silence use of uninitialized value warning pointed out by clang's MSAN tool.
libbpf:
Andrii Nakryiko:
- Fix missing __WORDSIZE definition in some systems, such as musl libc (Alpine Linux).
tools header UAPI:
Arnaldo Carvalho de Melo:
- Sync headers to address perf build warnings:
- syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open.
- With new ioctls: kvm.h, drm.h and usbdevice_fs.h.
- No tooling change: mman.h, sched.h and if_link.h.
Documentation:
Vince Weaver:
- Fix perf.data documentation units for memory size, its kB, not bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXT9hNAAKCRCyPKLppCJ+
J6hMAPwIEjHmwu7rCCnaY5NyP7KvV+bzriK/ajxNVXF5dbha9QD/bs785KbBvDvO
1Iz0/JrTT2trXI9aUjD/mrFKMNkFZAU=
=GiQA
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-5.3-20190729' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf header:
Vince Weaver:
- Fix divide by zero error if f_header.attr_size==0, found using a perf tool fuzzer.
Numfor Mbiziwo-Tiapo:
- Silence use of uninitialized value warning pointed out by clang's MSAN tool.
libbpf:
Andrii Nakryiko:
- Fix missing __WORDSIZE definition in some systems, such as musl libc (Alpine Linux).
tools header UAPI:
Arnaldo Carvalho de Melo:
- Sync headers to address perf build warnings:
- syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open.
- With new ioctls: kvm.h, drm.h and usbdevice_fs.h.
- No tooling change: mman.h, sched.h and if_link.h.
Documentation:
Vince Weaver:
- Fix perf.data documentation units for memory size, its kB, not bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hashmap.h depends on __WORDSIZE being defined. It is defined by
glibc/musl in different headers. It's an explicit goal for musl to be
"non-detectable" at compilation time, so instead include glibc header if
glibc is explicitly detected and fall back to musl header otherwise.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Fixes: e3b9242240 ("libbpf: add resizable non-thread safe internal hashmap")
Link: https://lkml.kernel.org/r/20190718173021.2418606-1-andriin@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf.data-file-format documentation incorrectly says the
HEADER_TOTAL_MEM results are in bytes. The results are in kilobytes
(perf reads the value from /proc/meminfo)
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907251155500.22624@macbook-air
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When building our local version of perf with MSAN (Memory Sanitizer) and
running the perf record command, MSAN throws a use of uninitialized
value warning in "tools/perf/util/util.c:333:6".
This warning stems from the "buf" variable being passed into "write".
It originated as the variable "ev" with the type union perf_event*
defined in the "perf_event__synthesize_attr" function in
"tools/perf/util/header.c".
In the "perf_event__synthesize_attr" function they allocate space with a malloc
call using ev, then go on to only assign some of the member variables before
passing "ev" on as a parameter to the "process" function therefore "ev"
contains uninitialized memory. Changing the malloc call to zalloc to initialize
all the members of "ev" which gets rid of the warning.
To reproduce this warning, build perf by running:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory\
-fsanitize-memory-track-origins"
(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)
then running:
tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate\
-i - --stdio
Please see the cover letter for why false positive warnings may be
generated.
Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190724234500.253358-2-nums@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So I have been having lots of trouble with hand-crafted perf.data files
causing segfaults and the like, so I have started fuzzing the perf tool.
First issue found:
If f_header.attr_size is 0 in the perf.data file, then perf will crash
with a divide-by-zero error.
Committer note:
Added a pr_err() to tell the user why the command failed.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907231100440.14532@macbook-air
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
07a4ddec3c ("bonding: add an option to specify a delay between peer notifications")
And silence this build warning:
Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vincent Bernat <vincent@bernat.ch>
Link: https://lkml.kernel.org/n/tip-3liw4exxh8goc0rq9xryl2kv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
1d6362fa0c ("sched/core: Allow sched_setattr() to use the current policy")
7f192e3cd3 ("fork: add clone3")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h'
diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h
No changes in tools/ due to the above.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Link: https://lkml.kernel.org/n/tip-mtrpsjrux5hgyr5uf8l1aa46@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
6d101f24f1 ("USB: add usbfs ioctl to retrieve the connection parameters")
And address this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
Which ends up autogenerating a ioctl_cmd->string table used by 'perf
trace':
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > before
$ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
$ tools/perf/trace/beauty/usbdevfs_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 15:26:55.513636844 -0300
+++ after 2019-07-26 15:29:11.650518677 -0300
@@ -23,6 +23,7 @@
[2] = "BULK",
[30] = "DROP_PRIVILEGES",
[31] = "GET_SPEED",
+ [32] = "CONNINFO_EX",
[3] = "RESETEP",
[4] = "SETINTERFACE",
[5] = "SETCONFIGURATION",
$
Now 'perf trace' ioctl beautifier will translate this new ioctl to a
string and at some point will allow filtering the 'ioctl' syscall with
something like this in a system wide strace-like sessin:
# perf trace -e ioctl/cmd=USBDEVFS_CONNINFO_EX/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-tkdfbgzqypwco96b309c0ovd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In addition to _IOW() and _IOR(), to handle this case:
#define USBDEVFS_CONNINFO_EX(len) _IOC(_IOC_READ, 'U', 32, len)
That will happen in the next sync of this header file.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3br5e4t64e4lp0goo84che3s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Picking the changes from:
c5d3e39caa ("drm/i915: Engine discovery query")
a88b6e4cba ("drm/i915: Allow specification of parallel execbuf")
ee1136908e ("drm/i915/execlists: Virtual engine bonding")
6d06779e86 ("drm/i915: Load balancing across a virtual engine")
b81dde7194 ("drm/i915: Allow userspace to clone contexts on creation")
8319f44c05 ("drm/i915: Re-expose SINGLE_TIMELINE flags for context creation")
e620f7b3a2 ("drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]")
976b55f0e1 ("drm/i915: Allow a context to define its set of engines")
7f3f317a66 ("drm/i915: Restore control over ppgtt for context creation ABI")
75b3f1cb50 ("drm: Fix drm.h uapi header for GNU/kFreeBSD")
Silencing these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated
tables will be able to translate this new ioctl code into a string:
$ tools/perf/trace/beauty/drm_ioctl.sh > before
$ cp include/uapi/drm/i915_drm.h tools/include/uapi/drm/i915_drm.h
$ tools/perf/trace/beauty/drm_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 13:02:22.052723640 -0300
+++ after 2019-07-26 13:02:35.354906036 -0300
@@ -163,4 +163,6 @@
[DRM_COMMAND_BASE + 0x37] = "I915_PERF_ADD_CONFIG",
[DRM_COMMAND_BASE + 0x38] = "I915_PERF_REMOVE_CONFIG",
[DRM_COMMAND_BASE + 0x39] = "I915_QUERY",
+ [DRM_COMMAND_BASE + 0x3a] = "I915_GEM_VM_CREATE",
+ [DRM_COMMAND_BASE + 0x3b] = "I915_GEM_VM_DESTROY",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Cc: James Clarke <jrtc27@jrtc27.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lkml.kernel.org/n/tip-a9173whgu3h1vo24jgdg5do8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Thomas Gleixner:
"A pile of perf related fixes:
Kernel:
- Fix SLOTS PEBS event constraints for Icelake CPUs
- Add the missing mask bit to allow counting hardware generated
prefetches on L3 for Icelake CPUs
- Make the test for hypervisor platforms more accurate (as far as
possible)
- Handle PMUs correctly which override event->cpu
- Yet another missing fallthrough annotation
Tools:
perf.data:
- Fix loading of compressed data split across adjacent records
- Fix buffer size setting for processing CPU topology perf.data
header.
perf stat:
- Fix segfault for event group in repeat mode
- Always separate "stalled cycles per insn" line, it was being
appended to the "instructions" line.
perf script:
- Fix --max-blocks man page description.
- Improve man page description of metrics.
- Fix off by one in brstackinsn IPC computation.
perf probe:
- Avoid calling freeing routine multiple times for same pointer.
perf build:
- Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings
treated as errors, breaking the build"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Mark expected switch fall-throughs
perf/core: Fix creating kernel counters for PMUs that override event->cpu
perf/x86: Apply more accurate check on hypervisor platform
perf/x86/intel: Fix invalid Bit 13 for Icelake MSR_OFFCORE_RSP_x register
perf/x86/intel: Fix SLOTS PEBS event constraint
perf build: Do not use -Wshadow on gcc < 4.8
perf probe: Avoid calling freeing routine multiple times for same pointer
perf probe: Set pev->nargs to zero after freeing pev->args entries
perf session: Fix loading of compressed data split across adjacent records
perf stat: Always separate stalled cycles per insn
perf stat: Fix segfault for event group in repeat mode
perf tools: Fix proper buffer size for feature processing
perf script: Fix off by one in brstackinsn IPC computation
perf script: Improve man page description of metrics
perf script: Fix --max-blocks man page description
Pull objtool fix from Thomas Gleixner:
"A single robustness fix for objtool to handle unbalanced CLAC
invocations under all circumstances"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Improve UACCESS coverage
Picking the changes from:
66bb8a065f ("KVM: x86: PMU Event Filter")
f087a02941 ("KVM: nVMX: Stash L1's CR3 in vmcs01.GUEST_CR3 on nested entry w/o EPT")
99adb56763 ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Silencing this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated
tables will be able to translate this new ioctl code into a string:
$ tools/perf/trace/beauty/kvm_ioctl.sh > before
$
$ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
$ tools/perf/trace/beauty/kvm_ioctl.sh > after
$ diff -u before after
--- before 2019-07-26 12:32:47.959220236 -0300
+++ after 2019-07-26 12:33:05.766464871 -0300
@@ -79,6 +79,7 @@
[0xac] = "SET_ONE_REG",
[0xad] = "KVMCLOCK_CTRL",
[0xb0] = "GET_REG_LIST",
+ [0xb2] = "SET_PMU_EVENT_FILTER",
[0xb7] = "SMI",
[0xba] = "MEMORY_ENCRYPT_OP",
[0xbb] = "MEMORY_ENCRYPT_REG_REGION",
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Eric Hankland <ehankland@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/n/tip-py1gcmt6rboehlwg6zvagfg2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXTi03RQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qsToAQDvxZbyr7MSuDIJdeUlvZ4+d3vf8XVw
kUG5Kio6+m0rQwEA55lFhMfixGF2x7KnFnO2CXSu5GBVSdT0w+2vvZP+OAU=
=59XY
-----END PGP SIGNATURE-----
Merge tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest fixlets from Steven Rostedt:
"This contains only simple spelling fixes"
* tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix some typos in config-bisect.pl
A clang build reported an (obvious) double CLAC while a GCC build did not;
it turns out that objtool only re-visits instructions if the first visit
was with AC=0. If OTOH the first visit was with AC=1, it completely ignores
any subsequent visit, even when it has AC=0.
Fix this by using a visited mask instead of a boolean, and (explicitly)
mark the AC state.
$ ./objtool check -b --no-fp --retpoline --uaccess drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x22: redundant UACCESS disable
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xea: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0xffffffffffffffff: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xd9: (alt)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0xb2: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x39: (branch)
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations.isra.34()+0x0: <=== (func)
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/617
Link: https://lkml.kernel.org/r/5359166aad2d53f3145cd442d83d0e5115e0cd17.1564007838.git.jpoimboe@redhat.com
This patch fixes some spelling typos in config-bisect.pl
Link: http://lkml.kernel.org/r/20190723032445.14220-1-standby24x7@gmail.com
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdOEuhAAoJEL/70l94x66DX/IH/3c6ADaZkuwzUMtJZgib/slX
V7h4ljoW33M85z3nCF5+kY3CNl8c9F2xKGcAIUlJF8MIsZW+zB3HjuU1LC4fCzuk
TqpBf74DpQsKCsv1ngiV02lefPVQ7/VT/QFY7EXNuAqNRfgsBRNoi50244a0ZKpD
KydzKTDKMD5HjE4lHb+bNr+guqkisPx0b0mZtsb4R9uuUSwXEa8DLmWQF2Do7zBj
6G9UD6a1AP5XQBwRRbo5a78b5NZQcF5R9wVEzsmK7OGUw/yC4Em4HVt46z+oT5cm
JK9m59XDqJaL6HMAWC2P/mXUj6o+PP+uBE2uuvkGCNcTLQZwWf+dq9961tWg81E=
=DD/Z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bugfixes, a pvspinlock optimization, and documentation moving"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
Documentation: move Documentation/virtual to Documentation/virt
KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
KVM: X86: Dynamically allocate user_fpu
KVM: X86: Fix fpu state crash in kvm guest
Revert "kvm: x86: Use task structs fpu field for user"
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories. We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory. Fix up the documentation
to match that.
Fixes: ed16648eb5 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
perf.data:
Alexey Budankov:
- Fix loading of compressed data split across adjacent records
Jiri Olsa:
- Fix buffer size setting for processing CPU topology perf.data header.
perf stat:
Jiri Olsa:
- Fix segfault for event group in repeat mode
Cong Wang:
- Always separate "stalled cycles per insn" line, it was being appended to
the "instructions" line.
perf script:
Andi Kleen:
- Fix --max-blocks man page description.
- Improve man page description of metrics.
- Fix off by one in brstackinsn IPC computation.
perf probe:
Arnaldo Carvalho de Melo:
- Avoid calling freeing routine multiple times for same pointer.
perf build:
- Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings
treated as errors, breaking the build.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXTcpSQAKCRCyPKLppCJ+
J0s1AQCY4uEiw7ZDUMPkztqG/9nder8M4ncd2FYwsQObmjxhBQEA+u/jvJ9UcUKk
X9BpjDE+1Pi3LrMaFjDQMKgpSutzXgg=
=rAom
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-5.3-20190723' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf.data:
Alexey Budankov:
- Fix loading of compressed data split across adjacent records
Jiri Olsa:
- Fix buffer size setting for processing CPU topology perf.data header.
perf stat:
Jiri Olsa:
- Fix segfault for event group in repeat mode
Cong Wang:
- Always separate "stalled cycles per insn" line, it was being appended to
the "instructions" line.
perf script:
Andi Kleen:
- Fix --max-blocks man page description.
- Improve man page description of metrics.
- Fix off by one in brstackinsn IPC computation.
perf probe:
Arnaldo Carvalho de Melo:
- Avoid calling freeing routine multiple times for same pointer.
perf build:
- Do not use -Wshadow on gcc < 4.8, avoiding too strict warnings
treated as errors, breaking the build.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When perf_add_probe_events() we call cleanup_perf_probe_events() for the
pev pointer it receives, then, as part of handling this failure the main
'perf probe' goes on and calls cleanup_params() and that will again call
cleanup_perf_probe_events()for the same pointer, so just set nevents to
zero when handling the failure of perf_add_probe_events() to avoid the
double free.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-x8qgma4g813z96dvtw9w219q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that, when perf_add_probe_events() fails, like in:
# perf probe icmp_rcv:64 "type=icmph->type"
Failed to find 'icmph' in this function.
Error: Failed to add events.
Segmentation fault (core dumped)
#
We don't segfault.
clear_perf_probe_event() was zeroing the whole pev, and since the switch
to zfree() for the members in the pev, that memset() was removed, which
left nargs with its original value, in the above case 1.
With the memset the same pev could be passed to clear_perf_probe_event()
multiple times, since all it would have would be zeroes, and free()
accepts zero, the loop would not happen and we would just memset it
again to zeroes.
Without it we got that segfault, so zero nargs to keep it like it was,
next cset will avoid calling clear_perf_probe_event() for the same pevs
in case of failure.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: d8f9da2404 ("perf tools: Use zfree() where applicable")
Link: https://lkml.kernel.org/n/tip-802f2jypnwqsvyavvivs8464@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix decompression failure found during the loading of compressed trace
collected on larger scale systems (>48 cores).
The error happened due to lack of decompression space for a mmaped
buffer data chunk split across adjacent PERF_RECORD_COMPRESSED records.
$ perf report -i bt.16384.data --stats
failed to decompress (B): 63869 -> 0 : Destination buffer is too small
user stack dump failure
Can't parse sample, err = -14
0x2637e436 [0x4080]: failed to process type: 9
Error:
failed to process sample
$ perf test 71
71: Zstd perf.data compression/decompression : Ok
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4d839e1b-9c48-89c4-9702-a12217420611@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "stalled cycles per insn" is appended to "instructions" when the CPU
has this hardware counter directly. We should always make it a separate
line, which also aligns to the output when we hit the "if (total &&
avg)" branch.
Before:
$ sudo perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
4565048704,,instructions,64114578096,100.00,1.34,insn per cycle,,
3396325133,,cycles,64146628546,100.00,,
After:
$ sudo ./tools/perf/perf stat --all-cpus --field-separator , --log-fd 1 -einstructions,cycles -- sleep 1
6721924,,instructions,24026790339,100.00,0.22,insn per cycle
,,,,,0.00,stalled cycles per insn
30939953,,cycles,24025512526,100.00,,
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20190517221039.8975-1-xiyou.wangcong@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Numfor Mbiziwo-Tiapo reported segfault on stat of event group in repeat
mode:
# perf stat -e '{cycles,instructions}' -r 10 ls
It's caused by memory corruption due to not cleaned evsel's id array and
index, which needs to be rebuilt in every stat iteration. Currently the
ids index grows, while the array (which is also not freed) has the same
size.
Fixing this by releasing id array and zeroing ids index in
perf_evsel__close function.
We also need to keep the evsel_list alive for stat record (which is
disabled in repeat mode).
Reported-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190715142121.GC6032@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After Song Liu's segfault fix for pipe mode, Arnaldo reported following
error:
# perf record -o - | perf script
0x514 [0x1ac]: failed to process type: 80
It's caused by wrong buffer size setup in feature processing, which
makes cpu topology feature fail, because it's using buffer size to
recognize its header version.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: e9def1b2e7 ("perf tools: Add feature header record to pipe-mode")
Link: http://lkml.kernel.org/r/20190715140426.32509-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Clarify that a metric is based on events, not referring to itself. Also
some improvements with the sentences.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190711181922.18765-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --max-blocks description was using the old name brstackasm. Use
brstackinsn instead.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190711181922.18765-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
to test and use this feature in the NTB transport layer. Also, bug
fixes for the AMD and Switchtec drivers, as well as some general
patches.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoE9b9c3U2JxX98mqbmZLrHqL0iMFAl0w0EwACgkQbmZLrHqL
0iNt4RAAiw2Cb2y5D36jI+lPl8xN2REk4x7b/FKFfLSzM/7Bu1okprrElKO6735r
0wRpyZS5lbYYr0QybzMrNfDMLYywFmA+C7GY7gQpR7GLYLb9chmvOmubfEbakHYI
4qKAWgaZt6PaiLQp76y3bFjUiBrd8KzVi5J891lDKvjzsGUjdNeFSlPlDbzTP8H9
+pBxnYn4G7MxVTebxjpOs6bQPO1sJZl4cyW40avTC1x+6bMlU1B361IhwtuabQH8
6rq6MQbaDzHTpuBZqZR8O+JxFzpffnGRu+smWPNQS2p95AewGIaDUsjZ6ph81rF1
mUxt/UJYP0RlIY+RQQ0Ie3cBUZ/I7wQzpFAXnqb8Nzba4tpc9ITX04hfnWKg6Kr0
sgj2Ro6I4TS3TSw6vquwvR2hn6+RZi06z/D0AXdUl+R3QxJXMyZygDDPDpTImZNy
cnu+1Y7pdgjASXBvHKhM39+ynGLGze7xBIs6GO6TjJoVgZdMaW1/N/wsldYRc3Uf
K0VSHowGHOyIV8d33oYWK2oSgWFrRCCanrCXIltNwl6NM05/b474L/8fiSnJ37PZ
dJspdPdrut9W3018cRVHBhAX9csucbSa46FKz1Z0FEFjDJJzqy0isKDsoExBZqBH
AqCnP548sbJF9+yOC475W36KMHBhyNvLS8XxD2J3DQpPDMRj//Y=
=P1lR
-----END PGP SIGNATURE-----
Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"New feature to add support for NTB virtual MSI interrupts, the ability
to test and use this feature in the NTB transport layer.
Also, bug fixes for the AMD and Switchtec drivers, as well as some
general patches"
* tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
NTB: Describe the ntb_msi_test client in the documentation.
NTB: Add MSI interrupt support to ntb_transport
NTB: Add ntb_msi_test support to ntb_test
NTB: Introduce NTB MSI Test Client
NTB: Introduce MSI library
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce functions to calculate multi-port resource index
NTB: Introduce helper functions to calculate logical port number
PCI/switchtec: Add module parameter to request more interrupts
PCI/MSI: Support allocating virtual MSI interrupts
ntb_hw_switchtec: Fix setup MW with failure bug
ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
NTB: correct ntb_dev_ops and ntb_dev comment typos
NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
NTB: ntb_hw_amd: set peer limit register
NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
...
Pull x86 fixes from Thomas Gleixner:
"A set of x86 specific fixes and updates:
- The CR2 corruption fixes which store CR2 early in the entry code
and hand the stored address to the fault handlers.
- Revert a forgotten leftover of the dropped FSGSBASE series.
- Plug a memory leak in the boot code.
- Make the Hyper-V assist functionality robust by zeroing the shadow
page.
- Remove a useless check for dead processes with LDT
- Update paravirt and VMware maintainers entries.
- A few cleanup patches addressing various compiler warnings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Prevent clobbering of saved CR2 value
x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
x86, boot: Remove multiple copy of static function sanitize_boot_params()
x86/boot/compressed/64: Remove unused variable
x86/boot/efi: Remove unused variables
x86/mm, tracing: Fix CR2 corruption
x86/entry/64: Update comments and sanity tests for create_gap
x86/entry/64: Simplify idtentry a little
x86/entry/32: Simplify common_exception
x86/paravirt: Make read_cr2() CALLEE_SAVE
MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
x86/process: Delete useless check for dead process with LDT
x86: math-emu: Hide clang warnings for 16-bit overflow
x86/e820: Use proper booleans instead of 0/1
x86/apic: Silence -Wtype-limits compiler warnings
x86/mm: Free sme_early_buffer after init
x86/boot: Fix memory leak in default_get_smp_config()
Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
Pull perf tooling updates from Thomas Gleixner:
"A set of perf improvements and fixes:
perf db-export:
- Improvements in how COMM details are exported to databases for post
processing and use in the sql-viewer.py UI.
- Export switch events to the database.
BPF:
- Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just
like selftests/bpf/bpf_rlimit.h do, which makes errors due to
exhaustion of this limit, which are kinda cryptic (EPERM sometimes)
less frequent.
perf version:
- Fix segfault due to missing OPT_END(), noticed on PowerPC.
perf vendor events:
- Add JSON files for IBM s/390 machine type 8561.
perf cs-etm (ARM):
- Fix two cases of error returns not bing done properly: Invalid
ERR_PTR() use and loss of propagation error codes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
perf version: Fix segfault due to missing OPT_END()
perf vendor events s390: Add JSON files for machine type 8561
perf cs-etm: Return errcode in cs_etm__process_auxtrace_info()
perf cs-etm: Remove errnoeous ERR_PTR() usage in cs_etm__process_auxtrace_info
perf scripts python: export-to-postgresql.py: Export switch events
perf scripts python: export-to-sqlite.py: Export switch events
perf db-export: Export switch events
perf db-export: Factor out db_export__threads()
perf script: Add scripting operation process_switch()
perf scripts python: exported-sql-viewer.py: Use new 'has_calls' column
perf scripts python: exported-sql-viewer.py: Remove redundant semi-colons
perf scripts python: export-to-postgresql.py: Add has_calls column to comms table
perf scripts python: export-to-sqlite.py: Add has_calls column to comms table
perf db-export: Also export thread's current comm
perf db-export: Factor out db_export__comm()
perf scripts python: export-to-postgresql.py: Export comm details
perf scripts python: export-to-sqlite.py: Export comm details
perf db-export: Export comm details
perf db-export: Fix a white space issue in db_export__sample()
perf db-export: Move export__comm_thread into db_export__sample()
...
Pull core fixes from Thomas Gleixner:
- A collection of objtool fixes which address recent fallout partially
exposed by newer toolchains, clang, BPF and general code changes.
- Force USER_DS for user stack traces
[ Note: the "objtool fixes" are not all to objtool itself, but for
kernel code that triggers objtool warnings.
Things like missing function size annotations, or code that confuses
the unwinder etc. - Linus]
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
objtool: Support conditional retpolines
objtool: Convert insn type to enum
objtool: Fix seg fault on bad switch table entry
objtool: Support repeated uses of the same C jump table
objtool: Refactor jump table code
objtool: Refactor sibling call detection logic
objtool: Do frame pointer check before dead end check
objtool: Change dead_end_function() to return boolean
objtool: Warn on zero-length functions
objtool: Refactor function alias logic
objtool: Track original function across branches
objtool: Add mcsafe_handle_tail() to the uaccess safe list
bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
x86/head/64: Annotate start_cpu0() as non-callable
x86/entry: Fix thunk function ELF sizes
x86/kvm: Don't call kvm_spurious_fault() from .fixup
x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
...
- s390 support for KVM selftests
- LAPIC timer offloading to housekeeping CPUs
- Extend an s390 optimization for overcommitted hosts to all architectures
- Debugging cleanups and improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdMr1FAAoJEL/70l94x66DvIkH/iVuUX9jO1NoQ7qhxeo04MnT
GP9mX3XnWoI/iN0zAIRfQSP2/9a6+KblgdiziABhju58j5dCfAZGb5793TQppweb
3ubl11vy7YkzaXJ0b35K7CFhOU9oSlHHGyi5Uh+yyje5qWNxwmHpizxjynbFTKb6
+/S7O2Ua1VrAVvx0i0IRtwanIK/jF4dStVButgVaVdUva3zLaQmeI71iaJl9ddXY
bh50xoYua5Ek6+ENi+nwCNVy4OF152AwDbXlxrU0QbeA1B888Qio7nIqb3bwwPpZ
/8wMVvPzQgL7RmgtY5E5Z4cCYuu7mK8wgGxhuk3oszlVwZJ5rmnaYwGEl4x1s7o=
=giag
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"Mostly bugfixes, but also:
- s390 support for KVM selftests
- LAPIC timer offloading to housekeeping CPUs
- Extend an s390 optimization for overcommitted hosts to all
architectures
- Debugging cleanups and improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: x86: Add fixed counters to PMU filter
KVM: nVMX: do not use dangling shadow VMCS after guest reset
KVM: VMX: dump VMCS on failed entry
KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup
KVM: Boost vCPUs that are delivering interrupts
KVM: selftests: Remove superfluous define from vmx.c
KVM: SVM: Fix detection of AMD Errata 1096
KVM: LAPIC: Inject timer interrupt via posted interrupt
KVM: LAPIC: Make lapic timer unpinned
KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters
KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS
kvm: x86: ioapic and apic debug macros cleanup
kvm: x86: some tsc debug cleanup
kvm: vmx: fix coccinelle warnings
x86: kvm: avoid constant-conversion warning
x86: kvm: avoid -Wsometimes-uninitized warning
KVM: x86: expose AVX512_BF16 feature to guest
KVM: selftests: enable pgste option for the linker on s390
KVM: selftests: Move kvm_create_max_vcpus test to generic code
...
- match the directory structure of the linux-libc-dev package to that of
Debian-based distributions
- fix incorrect include/config/auto.conf generation when Kconfig creates
it along with the .config file
- remove misleading $(AS) from documents
- clean up precious tag files by distclean instead of mrproper
- add a new coccinelle patch for devm_platform_ioremap_resource migration
- refactor module-related scripts to read modules.order instead of
$(MODVERDIR)/*.mod files to get the list of created modules
- remove MODVERDIR
- update list of header compile-test
- add -fcf-protection=none flag to avoid conflict with the retpoline
flags when CONFIG_RETPOLINE=y
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl0ye0MeHHlhbWFkYS5t
YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGfzgQAKtqa3I6avRrT9Nl
ggYU08z6bqxVBRucpiQq5QhQ0YLf7XQ9tSGO6z0wyzqPHqHRZALg5lHp+x6JUuTe
yhE5AYufHfA86XHD+udOkPuTHEkMCtHZn3qHns39qCsJ5sgnQ5OkjE4xHrMYmV+G
FHoWlqYGCSMsr2SGQ8twffyqlZ3LvOW1XzZAlG53ooBUJsLs1CO9eWYzoksrb6O8
yjPwieKnryVwdzVcyR9gFvoXfgC7JBRuug0vYstQaXceJV88v0BCsWLVWylGGqtO
EdGqi05xMqtkKSuPP4WQVlgv8prull57yOHLkdn/ImQic/JUo8BNAaXnr95vFy6y
/QVCMajCakJDV2WNoSRl/4QK+FYBv1nNSEVT/qGtiC4UXBQZf1BaujrY2CvkQA8x
nfj8Z0ckdv5hfNvTxqPHtwzGJUmO9O8r3Jv69oJ0XnsK2ki2mJB0yjl00o7ZQDg9
NLJ+ovgqRnYDqbJcRe/d0of51NuRwlHmV+h9GDX9FH/7ghHwyMVuxC/k6+a/BZ1h
H8NYOevlqb8eAkXVjz2AoyTCL2SkW4oHdQ+vboEgQcl2jQK0kb3XhtALci91wGzE
aoWEBPZ+5O4wK4RE/z7V6yXvuqq/CcU32YRKJKsccWvEx8AMKLXa0G6NgfTZeZTy
WatLqE6jtTw5yPNNVVPnMZXN4c7C
=D36u
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- match the directory structure of the linux-libc-dev package to that
of Debian-based distributions
- fix incorrect include/config/auto.conf generation when Kconfig
creates it along with the .config file
- remove misleading $(AS) from documents
- clean up precious tag files by distclean instead of mrproper
- add a new coccinelle patch for devm_platform_ioremap_resource
migration
- refactor module-related scripts to read modules.order instead of
$(MODVERDIR)/*.mod files to get the list of created modules
- remove MODVERDIR
- update list of header compile-test
- add -fcf-protection=none flag to avoid conflict with the retpoline
flags when CONFIG_RETPOLINE=y
- misc cleanups
* tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
kbuild: add -fcf-protection=none when using retpoline flags
kbuild: update compile-test header list for v5.3-rc1
kbuild: split out *.mod out of {single,multi}-used-m rules
kbuild: remove 'prepare1' target
kbuild: remove the first line of *.mod files
kbuild: create *.mod with full directory path and remove MODVERDIR
kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver
kbuild: remove duplication from modules.order in sub-directories
kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
kbuild: do not create empty modules.order in the prepare stage
coccinelle: api: add devm_platform_ioremap_resource script
kbuild: compile-test headers listed in header-test-m as well
kbuild: remove unused hostcc-option
kbuild: remove tag files by distclean instead of mrproper
kbuild: add --hash-style= and --build-id unconditionally
kbuild: get rid of misleading $(AS) from documents
...
The code in vmx.c does not use "program_invocation_name", so there
is no need to "#define _GNU_SOURCE" here.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from David Miller:
1) Fix AF_XDP cq entry leak, from Ilya Maximets.
2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit.
3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika.
4) Fix handling of neigh timers wrt. entries added by userspace, from
Lorenzo Bianconi.
5) Various cases of missing of_node_put(), from Nishka Dasgupta.
6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing.
7) Various RDS layer fixes, from Gerd Rausch.
8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from
Cong Wang.
9) Fix FIB source validation checks over loopback, also from Cong Wang.
10) Use promisc for unsupported number of filters, from Justin Chen.
11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
tcp: fix tcp_set_congestion_control() use from bpf hook
ag71xx: fix return value check in ag71xx_probe()
ag71xx: fix error return code in ag71xx_probe()
usb: qmi_wwan: add D-Link DWM-222 A2 device ID
bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
net: dsa: sja1105: Fix missing unlock on error in sk_buff()
gve: replace kfree with kvfree
selftests/bpf: fix test_xdp_noinline on s390
selftests/bpf: fix "valid read map access into a read-only array 1" on s390
net/mlx5: Replace kfree with kvfree
MAINTAINERS: update netsec driver
ipv6: Unlink sibling route in case of failure
liquidio: Replace vmalloc + memset with vzalloc
udp: Fix typo in net/ipv4/udp.c
net: bcmgenet: use promisc for unsupported filters
ipv6: rt6_check should return NULL if 'from' is NULL
tipc: initialize 'validated' field of received packets
selftests: add a test case for rp_filter
fib: relax source validation check for loopback packets
mlxsw: spectrum: Do not process learned records with a dummy FID
...
Alexei Starovoitov says:
====================
pull-request: bpf 2019-07-18
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) verifier precision propagation fix, from Andrii.
2) BTF size fix for typedefs, from Andrii.
3) a bunch of big endian fixes, from Ilya.
4) wide load from bpf_sock_addr fixes, from Stanislav.
5) a bunch of misc fixes from a number of developers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_xdp_noinline fails on s390 due to a handful of endianness issues.
Use ntohs for parsing eth_proto.
Replace bswaps with ntohs/htons.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This test looks up a 32-bit map element and then loads it using a 64-bit
load. This does not work on s390, which is a big-endian machine.
Since the point of this test doesn't seem to be loading a smaller value
using a larger load, simply use a 32-bit load.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A Clang-built kernel is showing the following warning:
arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction
That corresponds to this code:
7e: 0f 85 00 00 00 00 jne 84 <x86_early_init_platform_quirks+0x84>
80: R_X86_64_PC32 __x86_indirect_thunk_r11-0x4
84: c3 retq
This is a conditional retpoline sibling call, which is now possible
thanks to retpolines. Objtool hasn't seen that before. It's
incorrectly interpreting the conditional jump as an unconditional
dynamic jump.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/30d4c758b267ef487fb97e6ecb2f148ad007b554.1563413318.git.jpoimboe@redhat.com
This makes it easier to add new instruction types. Also it's hopefully
more robust since the compiler should warn about out-of-range enums.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/0740e96af0d40e54cfd6a07bf09db0fbd10793cd.1563413318.git.jpoimboe@redhat.com
This fixes objtool for both a GCC issue and a Clang issue:
1) GCC issue:
kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8d5: sibling call from callable instruction with modified stack frame
With CONFIG_RETPOLINE=n, GCC is doing the following optimization in
___bpf_prog_run().
Before:
select_insn:
jmp *jumptable(,%rax,8)
...
ALU64_ADD_X:
...
jmp select_insn
ALU_ADD_X:
...
jmp select_insn
After:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
This confuses objtool. It has never seen multiple indirect jump
sites which use the same jump table.
For GCC switch tables, the only way of detecting the size of a table
is by continuing to scan for more tables. The size of the previous
table can only be determined after another switch table is found, or
when the scan reaches the end of the function.
That logic was reused for C jump tables, and was based on the
assumption that each jump table only has a single jump site. The
above optimization breaks that assumption.
2) Clang issue:
drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool: sisusb_write_mem_bulk()+0x588: can't find switch jump table
With clang 9, code can be generated where a function contains two
indirect jump instructions which use the same switch table.
The fix is the same for both issues: split the jump table parsing into
two passes.
In the first pass, locate the heads of all switch tables for the
function and mark their locations.
In the second pass, parse the switch tables and add them.
Fixes: e55a73251d ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/e995befaada9d4d8b2cf788ff3f566ba900d2b4d.1563413318.git.jpoimboe@redhat.com
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Now that C jump tables are supported, call them "jump tables" instead of
"switch tables". Also rename some other variables, add comments, and
simplify the code flow a bit.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/cf951b0c0641628e0b9b81f7ceccd9bcabcb4bd8.1563413318.git.jpoimboe@redhat.com
Even calls to __noreturn functions need the frame pointer setup first.
Such functions often dump the stack.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/aed62fbd60e239280218be623f751a433658e896.1563413318.git.jpoimboe@redhat.com
dead_end_function() can no longer return an error. Simplify its
interface by making it return boolean.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e6679610768fb6e6c51dca23f7d4d0c03b0c910.1563413318.git.jpoimboe@redhat.com
- Add an alias check in validate_functions(). With this change, aliases
no longer need uaccess_safe set.
- Add an alias check in decode_instructions(). With this change, the
"if (!insn->func)" check is no longer needed.
- Don't create aliases for zero-length functions, as it can have
unexpected results. The next patch will spit out a warning for
zero-length functions anyway.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/26a99c31426540f19c9a58b9e10727c385a147bc.1563413318.git.jpoimboe@redhat.com
If 'insn->func' is NULL, objtool skips some important checks, including
sibling call validation. So if some .fixup code does an invalid sibling
call, objtool ignores it.
Treat all code branches (including alts) as part of the original
function by keeping track of the original func value from
validate_functions().
This improves the usefulness of some clang function fallthrough
warnings, and exposes some additional kernel bugs in the process.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/505df630f33c9717e1ccde6e4b64c5303135c25f.1563413318.git.jpoimboe@redhat.com
After an objtool improvement, it's reporting that __memcpy_mcsafe() is
calling mcsafe_handle_tail() with AC=1:
arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x13: call to mcsafe_handle_tail() with UACCESS enabled
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x34: (alt)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0xb: (branch)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x0: <=== (func)
mcsafe_handle_tail() is basically an extension of __memcpy_mcsafe(), so
AC=1 is supposed to be set. Add mcsafe_handle_tail() to the uaccess
safe list.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/035c38f7eac845281d3c3d36749144982e06e58c.1563413318.git.jpoimboe@redhat.com
- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXS88txQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhaPAQDHaAmu6wXtZjZE6GU4ZP61UNgDECmZ
4wlGrNc1AAlqAQD/QC8339p37aDCp9n27VY1wmJwF3nca+jAHfQLqWkkYgw=
=n/tz
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The main changes in this release include:
- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes"
* tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Make trace_get_fields() global
tracing: Let filter_assign_type() detect FILTER_PTR_STRING
tracing: Pass type into tracing_generic_entry_update()
ftrace/selftest: Test if set_event/ftrace_pid exists before writing
ftrace/selftests: Return the skip code when tracing directory not configured in kernel
tracing/kprobe: Check registered state using kprobe
tracing/probe: Add trace_event_call accesses APIs
tracing/probe: Add probe event name and group name accesses APIs
tracing/probe: Add trace flag access APIs for trace_probe
tracing/probe: Add trace_event_file access APIs for trace_probe
tracing/probe: Add trace_event_call register API for trace_probe
tracing/probe: Add trace_probe init and free functions
tracing/uprobe: Set print format when parsing command
tracing/kprobe: Set print format right after parsed command
kprobes: Fix to init kprobes in subsys_initcall
tracepoint: Use struct_size() in kmalloc()
ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
ftrace: Enable trampoline when rec count returns back to one
tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
tracing: Make a separate config for trace event self tests
...
The elftoolchain version of libelf has a function named elf_open().
The function name isn't quite accurate anyway, since it also reads all
the ELF data. Rename it to elf_read(), which is more accurate.
[ jpoimboe: rename to elf_read(); write commit description ]
Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7ce2d1b35665edf19fd0eb6fbc0b17b81a48e62f.1562793604.git.jpoimboe@redhat.com
perf db-export:
Adrian Hunter:
- Improvements in how COMM details are exported to databases for
post processing and use in the sql-viewer.py UI.
- Export switch events to the database.
BPF:
Arnaldo Carvalho de Melo:
- Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just like
selftests/bpf/bpf_rlimit.h do, which makes errors due to exhaustion of
this limit, which are kinda cryptic (EPERM sometimes) less frequent.
perf version:
Ravi Bangoria:
- Fix segfault due to missing OPT_END(), noticed on PowerPC.
perf vendor events:
Thomas Richter:
- Add JSON files for IBM s/390 machine type 8561.
perf cs-etm (ARM):
YueHaibing:
- Fix two cases of error returns not bing done properly: Invalid ERR_PTR() use
and loss of propagation error codes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXSzqiQAKCRCyPKLppCJ+
J6E6AQDUJOqIQ9/AU4VnaLejy01N61TswV++EYaHsGlNoau6SQD+L68GifDgGIh1
hbGAg/BannQONcMHk+0qFXD/AonntQA=
=zyDe
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.3-20190715' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
perf db-export:
Adrian Hunter:
- Improvements in how COMM details are exported to databases for
post processing and use in the sql-viewer.py UI.
- Export switch events to the database.
BPF:
Arnaldo Carvalho de Melo:
- Bump rlimit(MEMLOCK) for 'perf test bpf' and 'perf trace', just like
selftests/bpf/bpf_rlimit.h do, which makes errors due to exhaustion of
this limit, which are kinda cryptic (EPERM sometimes) less frequent.
perf version:
Ravi Bangoria:
- Fix segfault due to missing OPT_END(), noticed on PowerPC.
perf vendor events:
Thomas Richter:
- Add JSON files for IBM s/390 machine type 8561.
perf cs-etm (ARM):
YueHaibing:
- Fix two cases of error returns not bing done properly: Invalid ERR_PTR() use
and loss of propagation error codes.
Add a test case to simulate the loopback packet case fixed
in the previous patch.
This test gets passed after the fix:
IPv4 rp_filter tests
TEST: rp_filter passes local packets [ OK ]
TEST: rp_filter passes loopback packets [ OK ]
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While descending directories, Kbuild produces objects for modules,
but do not link final *.ko files; it is done in the modpost.
To keep track of modules, Kbuild creates a *.mod file in $(MODVERDIR)
for every module it is building. Some post-processing steps read the
necessary information from *.mod files. This avoids descending into
directories again. This mechanism was introduced in 2003 or so.
Later, commit 551559e13a ("kbuild: implement modules.order") added
modules.order. So, we can simply read it out to know all the modules
with directory paths. This is easier than parsing the first line of
*.mod files.
$(MODVERDIR) has a flat directory structure, that is, *.mod files
are named only with base names. This is based on the assumption that
the module name is unique across the tree. This assumption is really
fragile.
Stephen Rothwell reported a race condition caused by a module name
conflict:
https://lkml.org/lkml/2019/5/13/991
In parallel building, two different threads could write to the same
$(MODVERDIR)/*.mod simultaneously.
Non-unique module names are the source of all kind of troubles, hence
commit 3a48a91901 ("kbuild: check uniqueness of module names")
introduced a new checker script.
However, it is still fragile in the build system point of view because
this race happens before scripts/modules-check.sh is invoked. If it
happens again, the modpost will emit unclear error messages.
To fix this issue completely, create *.mod with full directory path
so that two threads never attempt to write to the same file.
$(MODVERDIR) is no longer needed.
Since modules with directory paths are listed in modules.order, Kbuild
is still able to find *.mod files without additional descending.
I also killed cmd_secanalysis; scripts/mod/sumversion.c computes MD4 hash
for modules with MODULE_VERSION(). When CONFIG_DEBUG_SECTION_MISMATCH=y,
it occurs not only in the modpost stage, but also during directory
descending, where sumversion.c may parse stale *.mod files. It would emit
'No such file or directory' warning when an object consisting a module is
renamed, or when a single-obj module is turned into a multi-obj module or
vice versa.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Merge more updates from Andrew Morton:
"VM:
- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
- more accurate reclaimed slab caches calculations by Yafang Shao
- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
Christoph Hellwig
- !CONFIG_MMU fixes by Christoph Hellwig
- new novmcoredd parameter to omit device dumps from vmcore, by
Kairui Song
- new test_meminit module for testing heap and pagealloc
initialization, by Alexander Potapenko
- ioremap improvements for huge mappings, by Anshuman Khandual
- generalize kprobe page fault handling, by Anshuman Khandual
- device-dax hotplug fixes and improvements, by Pavel Tatashin
- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
- add pte_devmap() support for arm64, by Robin Murphy
- unify locked_vm accounting with a helper, by Daniel Jordan
- several misc fixes
core/lib:
- new typeof_member() macro including some users, by Alexey Dobriyan
- make BIT() and GENMASK() available in asm, by Masahiro Yamada
- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
code generation, by Alexey Dobriyan
- rbtree code size optimizations, by Michel Lespinasse
- convert struct pid count to refcount_t, by Joel Fernandes
get_maintainer.pl:
- add --no-moderated switch to skip moderated ML's, by Joe Perches
misc:
- ptrace PTRACE_GET_SYSCALL_INFO interface
- coda updates
- gdb scripts, various"
[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits)
fs/select.c: use struct_size() in kmalloc()
mm: add account_locked_vm utility function
arm64: mm: implement pte_devmap support
mm: introduce ARCH_HAS_PTE_DEVMAP
mm: clean up is_device_*_page() definitions
mm/mmap: move common defines to mman-common.h
mm: move MAP_SYNC to asm-generic/mman-common.h
device-dax: "Hotremove" persistent memory that is used like normal RAM
mm/hotplug: make remove_memory() interface usable
device-dax: fix memory and resource leak if hotplug fails
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
ipc/mqueue.c: only perform resource calculation if user valid
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
scripts/gdb: add helpers to find and list devices
scripts/gdb: add lx-genpd-summary command
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
kernel/pid.c: convert struct pid count to refcount_t
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
...
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit 84d77d3f06 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed by "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of data to be written by the kernel exceeds the size
specified by "addr" argument, the output is truncated.
[ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I thought that /proc/sysvipc has the same bug as /proc/net
commit 1fde6f21d9
proc: fix /proc/net/* after setns(2)
However, it doesn't! /proc/sysvipc files do
get_ipc_ns(current->nsproxy->ipc_ns);
in their open() hook and avoid the problem.
Keep the test, maybe /proc/sysvipc will become broken someday :-\
Link: http://lkml.kernel.org/r/20190706180146.GA21015@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Test tries to access vsyscall page and if it doesn't exist gets SIGSEGV
which can spam into dmesg. However the segfault happens by design.
Handle it and carry information via exit code to parent.
Link: http://lkml.kernel.org/r/20190524181256.GA2260@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
perf_buffer test fails for exactly the same reason test_attach_probe
used to fail: different nanosleep syscall kprobe name.
Reuse the test_attach_probe fix.
Fixes: ee5cf82ce0 ("selftests/bpf: test perf buffer API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's easier to follow the logic if it's structured the same.
There is just slight difference between test_progs/test_maps and
test_verifier. test_verifier's verifier/*.c files are not really compilable
C files (they are more of include headers), so they can't be specified as
explicit dependencies of test_verifier.
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
e46fc22e60 ("selftests/bpf: make directory prerequisites order-only")
exposed existing problem in Makefile for test_verifier and test_maps tests:
their dependency on auto-generated header file with a list of all tests wasn't
recorded explicitly. This patch fixes these issues.
Fixes: 51a0e301a5 ("bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps")
Fixes: 6b7b6995c4 ("selftests: bpf: tests.h should depend on .c files, not the output")
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl0tpocACgkQCF8+vY7k
4RWoxA//b/fmDXP3WPzrjjSmpyB9ml0/epKzPbT5S2j0lftqKBmet29k+PCjVrTx
Nq2QauehY9ug5h8UMVUCmzPr95F0tSIGRoqk1vrn7z0K3q6k1SHrtvqbY1Bgb2Uk
Qvh2YFU4fQLJg8WAbExCjxCdbdmBKQVGKTwCtM+tP5OMxwAFOmQrjGaUaKCKIIA2
7Wzrx8CpSji+bJ3uK/d36c+4M9oDly5eaxBhoboL3BI0y+GqwiSASGwTO7BxrPOg
0wq5IZHnqS8+bprT9xQdDOqf+UOY9U1cxE/+sqsHxblfUEx9gfLy/R+FLmJn+SS9
Z3yLy4SqVHQMpWBjEAGodohikF60PAuTdymSC11jqFaKCUxWrIZg5xO+0blMrxPF
7vYIexutCkaBMHBlNaNsHIqB7B/2FGGKoN7QW64hwvwJCGvF7OmJcV+R4bROGvh4
nFuis9/Nm66Fq7I3aw37ThyZ0aWZdaQ0QJTH9ksxU/ZCz2hhMNYu/rXggrDvkS4U
nr77ZT5Gd7nj4b110zf8+99uiGiinY6hTfzPAuTCLBhaxwrv4/xDHAhpwdEB5T4j
8gOkxV8c0XWtL7sKqhGJvs/RRe2za0Y9XH6fyxsYfWcfuLjEvug8ouXMad9gxFWH
DL3WnKJEMGLScei2wux4kGOwEbkR1bUf2cHJfh3GpCB/y8vgLOc=
=smxY
-----END PGP SIGNATURE-----
Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
"As agreed with Jon, I'm sending this big series directly to you, c/c
him, as this series required a special care, in order to avoid
conflicts with other trees"
* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
docs: kbuild: fix build with pdf and fix some minor issues
docs: block: fix pdf output
docs: arm: fix a breakage with pdf output
docs: don't use nested tables
docs: gpio: add sysfs interface to the admin-guide
docs: locking: add it to the main index
docs: add some directories to the main documentation index
docs: add SPDX tags to new index files
docs: add a memory-devices subdir to driver-api
docs: phy: place documentation under driver-api
docs: serial: move it to the driver-api
docs: driver-api: add remaining converted dirs to it
docs: driver-api: add xilinx driver API documentation
docs: driver-api: add a series of orphaned documents
docs: admin-guide: add a series of orphaned documents
docs: cgroup-v1: add it to the admin-guide book
docs: aoe: add it to the driver-api book
docs: add some documentation dirs to the driver-api book
docs: driver-model: move it to the driver-api book
docs: lp855x-driver.rst: add it to the driver-api book
...